Prefect (Strang F — Pipeline Orchestrierung)¶
Minimal Prefect 3 server deployed in the dashi-data namespace. Backs ADR-010.
PoC scope¶
- Single-replica Prefect server (SQLite-backed via
emptyDirvolume — fine for demo, not durable across pod restarts) - No separate worker pod: the ingest flow runs locally against the server's API via a port-forward. Production-grade deployments would register a Kubernetes work pool and let Prefect schedule flow runs as K8s Jobs.
- No scheduler CRDs, no API auth, no TLS. All deferred to Phase 2.
Components¶
| File | Purpose |
|---|---|
namespace.yaml |
dashi-data namespace |
deployment-server.yaml |
Prefect 3.1.15 server on Python 3.12, port 4200, emptyDir volume |
service.yaml |
ClusterIP prefect-server:4200 |
kustomization.yaml |
Apply with kubectl apply -k . |
Apply¶
bash
cd poc
make prefect-up
kubectl -n dashi-data rollout status deployment/prefect-server --timeout=180s
Use¶
```bash
port-forward the API (also serves the UI on the same host:port)¶
kubectl -n dashi-data port-forward svc/prefect-server 4200:4200 &
tell the Prefect client which API to talk to¶
export PREFECT_API_URL=http://localhost:4200/api
run the dashi-ingest flow locally against the server¶
cd poc/ingest .venv/bin/pip install "prefect>=3.1,<4"
cd ../..
point dashi-ingest at its creds¶
export DASHI_S3_ACCESS_KEY=$(kubectl -n dashi-platform get secret rustfs-root -o jsonpath='{.data.access-key}' | base64 -d) export DASHI_S3_SECRET_KEY=$(kubectl -n dashi-platform get secret rustfs-root -o jsonpath='{.data.secret-key}' | base64 -d) export DASHI_S3_ENDPOINT=http://localhost:9000
run a one-shot flow¶
cd poc .venv-flows/bin/python -m flows.ingest poc/sample-data/ ```
Open the UI:
http://localhost:4200/
You'll see the flow run, the per-layer task graph, timing, logs, retries.
Production hardening deferred¶
- Durable Postgres backend (replace
emptyDir+ SQLite with Postgres fromdashi-catalogor a dedicated instance) - K8s work pool + worker Deployment so flows run as isolated Jobs
- AuthN/Z: Prefect server is currently open on the cluster network
- Persistent flow storage (
prefect.yamlcommitted to git, pulled by a runtime image) - Scheduling (cron triggers for periodic landing-zone sweeps)
- Alert integration (Slack / email / Teams)