dashi — a cloud-native spatial data lake, in the open
Apache-2.0 · Kubernetes-native · STAC + GeoParquet + COG + COPC + PMTiles · runs on a laptop · marcosci/dashi
What is dashi
dashi is the Japanese broth that sits under every layered dish — kombu, bonito, water. Unseen but essential. We built a spatial data lake to be exactly that: the foundation everything else rides on.
The goals were boring on purpose:
- Ingest any geodata GDAL/OGR can read — vector, raster, point cloud — without bespoke plumbing per format.
- Standardise it onto a small, well-understood set of cloud-native formats: GeoParquet for vector, COG for raster, COPC for point clouds, PMTiles for tile bundles.
- Catalog it as STAC so any client can discover what's there.
- Serve it through whichever protocol the consumer needs — analytical SQL via DuckDB, raster tiles via TiTiler, vector tiles via Martin, OGC API – Features via TiPG, point clouds via direct COPC streaming.
- Run all of that locally on a laptop in k3d, with the same manifests deployable to any production Kubernetes cluster.
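The standardisation goal above can be sketched as a lookup from source file to target format. This is only an illustration: the suffix table and the `target_format` helper are hypothetical and not taken from dashi-ingest, and a real pipeline would probe the file with GDAL/OGR or PDAL rather than trust the file name.

```python
from pathlib import PurePosixPath

# Hypothetical suffix table for illustration only. The mapping of data kinds
# to formats (vector -> GeoParquet, raster -> COG, point cloud -> COPC)
# follows the goals listed above; the suffixes chosen here are assumptions.
TARGET_FORMAT = {
    ".tif": "COG",
    ".tiff": "COG",
    ".shp": "GeoParquet",
    ".gpkg": "GeoParquet",
    ".kml": "GeoParquet",
    ".las": "COPC",
    ".laz": "COPC",
}


def target_format(path: str) -> str:
    """Return the cloud-native format a source file would be converted to."""
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return TARGET_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"no target format known for {path!r}") from None
```

Keeping the mapping in one table means a new source format is a one-line change rather than a new code path, which is the "no bespoke plumbing per format" idea in miniature.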
Every choice is documented as an ADR. Every decision rejects bigger, older, more enterprise-y options when they don't earn their weight: GeoServer is out, Martin is in. MinIO is out (license drift), RustFS is in. No JVM anywhere.
What's in the repo
docs/ architecture spec (German source-of-truth + English contributor docs)
adr/ 11 architecture decisions
poc/ a working Phase-0 PoC on local k3d
manifests/ Kubernetes manifests per component
ingest/ dashi-ingest — Python pipeline (detect → validate → transform → STAC)
py3dtiles/ COPC → 3D Tiles tileset converter
scripts/ bring up the cluster, deploy each strand, mint presigned URLs
smoke/ end-to-end acceptance checks per strand
docs/viewer/ browser-side LiDAR viewer (maplibre-gl + opengeos/maplibre-gl-lidar)
On a fresh laptop, clone the repo and run cd poc && make k3s-up && make smoke to land at a working data lake with the bundled NZ LiDAR + Dresden OSM sample.
Architecture in one diagram
GeoTIFF, Shapefile,   ┌──────────────────────────────────────────────┐
GPKG, KML, LAZ, …     │  Landing → Processed → Curated → Enrichment  │
        │             │  (zone model, RustFS S3)                     │
        ▼             └──────────────┬───────────────────────────────┘
  Prefect flow                       │
 (dashi-ingest)       pgstac (STAC catalog, Postgres)
        │                            │
        └─────────┬──────────────────┘
                  ▼
Serving: TiTiler · DuckDB · Martin · TiPG · MapLibre+deck.gl
Spatial partitioning via H3. Single processing engine: DuckDB + GDAL/PDAL. Orchestration: Prefect 3 in-cluster. Observability: Prometheus + Grafana.
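As a rough illustration of the zone model in the diagram, the sketch below treats the four zones as an ordered sequence and rewrites an object key from one zone to the next. The key layout (zone name as the first path segment) and the helper names `next_zone` and `promote_key` are assumptions made for this example, not the layout dashi actually uses on RustFS.

```python
from enum import Enum
from typing import Optional


class Zone(Enum):
    # Order matches the diagram: Landing -> Processed -> Curated -> Enrichment.
    LANDING = "landing"
    PROCESSED = "processed"
    CURATED = "curated"
    ENRICHMENT = "enrichment"


_ORDER = list(Zone)


def next_zone(zone: Zone) -> Optional[Zone]:
    """Return the zone that follows `zone`, or None for the last zone."""
    i = _ORDER.index(zone)
    return _ORDER[i + 1] if i + 1 < len(_ORDER) else None


def promote_key(key: str) -> str:
    """Rewrite '<zone>/<rest>' to the same '<rest>' under the following zone.

    Assumes (hypothetically) that the first path segment names the zone.
    """
    prefix, _, rest = key.partition("/")
    target = next_zone(Zone(prefix))
    if target is None:
        raise ValueError(f"{prefix!r} is the last zone; nothing to promote to")
    return f"{target.value}/{rest}"
```

For example, `promote_key("landing/nz/lidar.laz")` yields `"processed/nz/lidar.laz"` under these assumptions.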
Why open-source it
Two reasons.
- The architecture is opinionated and we want to be told if the opinions are wrong. The point cloud serving story (keep COPC in source CRS, let loaders.gl reproject browser-side) was wrong twice before we landed on it. We expect that to keep happening for a while.
- Spatial data infrastructure that can run on a small cluster, be self-hosted, and not bill per request is undersupplied. The reference stack we've assembled (RustFS + pgstac + Martin + TiTiler + TiPG + DuckDB + Prefect) is just correct: every component is best-in-class for its zone, and shipping the wiring saves anyone else weeks.
What works today
- Phase-0 PoC: ingest → catalog → serve, all green
- Vector tiles via Martin (PMTiles backend)
- Raster tiles via TiTiler (custom arm64 build)
- OGC API – Features via TiPG
- Analytical SQL via DuckDB endpoint
- Point clouds: direct COPC streaming + 3D Tiles tileset (py3dtiles)
- Per-zone IAM (ingest / pipeline / serving-reader) on RustFS
- NetworkPolicies (default-deny + scoped allows)
- Prometheus + Grafana dashboards
- CI — lint + manifests + docs + viewer build
- Fully Apache-2.0
What's next
- TiPG promotion flow: Prefect task that auto-promotes curated GeoParquet → PostGIS, so TiPG collections track ingest in real time.
- OIDC / SSO: unify Prefect, Grafana, and the RustFS console under a single sign-in (Authelia + oauth2-proxy).
- Backups / DR: pg_dump CronJobs → off-cluster, restore drill quarterly.
- Loki + promtail: centralised logs as a Grafana datasource.
- CesiumJS / iTowns adapters: consume the existing assets.tileset3d from STAC.
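To make the last item concrete, here is a minimal sketch of how an adapter might find the 3D Tiles entry point in a STAC item. The asset key `tileset3d` comes from the roadmap above; the sample item, its id, and its hrefs are made up for illustration, and `tileset3d_href` is a hypothetical helper.

```python
from typing import Optional


def tileset3d_href(item: dict) -> Optional[str]:
    """Return the href of the item's `tileset3d` asset, or None if absent."""
    asset = item.get("assets", {}).get("tileset3d")
    return asset.get("href") if asset else None


# Hypothetical STAC item, trimmed to the fields used here.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-lidar-tile",
    "assets": {
        "data": {"href": "s3://curated/example/tile.copc.laz"},
        "tileset3d": {
            "href": "s3://curated/example/tileset.json",
            "type": "application/json",
        },
    },
}

print(tileset3d_href(example_item))
```

A viewer adapter would hand the returned URL to its 3D Tiles loader, and skip items for which the helper returns None.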
Backlog: docs/FEATURE-IDEAS.md. Issues + Discussions are open. PRs welcome — see CONTRIBUTING.md.
Try it
```bash
git clone https://github.com/marcosci/dashi
cd dashi/poc
make k3s-up
make storage-deploy catalog-deploy serving-deploy ogc-deploy tipg-deploy
make rbac-bootstrap network-policies-up monitoring-up prefect-up
make ingest-sample
make smoke
```
That's the whole thing. ~15 minutes on a recent laptop, no API keys, no managed services.
— Marco Sciaini + Johannes Schlund