ADR-012 — Local Kubernetes substrate: OrbStack vanilla k8s over k3d-in-docker¶
Status: Accepted (2026-04-27) Supersedes: the implicit "k3d on colima" baseline used through v0.1.0.
Context¶
Through v0.1.0 the local development substrate was k3d (k3s in Docker) on top of colima (Linux VM via lima/QEMU/VZ). That stack gave us:
- A real Kubernetes API on macOS without Docker Desktop's licence.
- The same cluster surface (server + 2 agents + serverlb) as a small production deploy.
k3d image importfor fast local image cycles.
It also gave us a daily failure mode. On macOS Tahoe / Sequoia the lima hostagent's SSH-based port forward — used for the host docker socket and for every published TCP port (k8s API server, ingress, etc.) — disconnects within seconds of any sustained workload. Symptoms observed during the v0.1.1 cycle:
docker psreturns "Cannot connect to the Docker daemon" minutes aftercolima startreports success.kubectlconnection refused on the published API server port, even though the in-VMdockerdis healthy.colima restartsucceeds, then drops again. The VZ disk file acquires a stale kernel-level lock that does not release until the Mac is rebooted.- Errors in
~/.colima/_lima/colima/ha.stderr.log:failed to run [ssh ...]: exit status 255
Workarounds attempted and discarded:
- Manual
ssh -fN -Lagainst lima's published SSH port, with watchdog respawn. Holds for ~30s then dies for the same reason as the built-in forward. - Patching
port-forward-all.shto retry the precheck five times. Mitigates the symptom but does not address the underlying flap. - Killing every
limactl/colimaprocess and clearing~/.colima/_lima/colima/{ssh,ha}.sock. macOS holds the disk via the kernel VZ subsystem; reboot required.
The friction made the substrate fundamentally unreliable for daily work and impossible to recommend to a second adopter.
Decision¶
Adopt OrbStack's built-in vanilla Kubernetes as the default local development substrate on macOS. Drop the k3d-in-docker layer.
```toml
OrbStack settings (already on disk; no extra config)¶
k8s.enable = true ```
Result:
- Single layer of virtualisation (OrbStack VM → kubelet directly), no nested k3d-in-docker.
- Docker socket via
~/.orbstack/run/docker.sock, not SSH-forwarded. - Kubernetes API at
https://localhost:<random>reachable through the same shared docker daemon — no port-forward flapping. - Locally-built
dashi/<svc>:devimages are visible to OrbStack k8s without an explicit image-import step (shared containerd). The k3d-specifick3d image importcalls inserving-deploy.shandweb-ingest-deploy.share now gated behind akubectl config current-context | grep ^k3d-test and skipped on OrbStack.
The k3d path is preserved for anyone running on Linux without
OrbStack, and for the kind-based CI E2E job — both are real
non-OrbStack contexts.
Consequences¶
Positive:
- Cluster stays Running across active workloads. The
port-forward-all.sh"skip — svc/X not in dashi-Y" failure mode observed in v0.1.0 is gone. make redeploy-allreaches a greendashictl doctorend-to-end on a fresh machine without manual intervention.- The substrate matches what an external adopter likely already has installed (OrbStack is the de facto Docker Desktop alternative on macOS in 2026).
Neutral:
- OrbStack is closed-source with a paid commercial tier (free for personal / open-source). Linux developers and CI use kind, so the project is not exclusively dependent on OrbStack.
- Image import semantics differ: OrbStack k8s shares the host docker
daemon, k3d does not. Two scripts (
serving-deploy.sh,web-ingest-deploy.sh) now branch on the active kubectl context.
Negative:
- One extra branch in two deploy scripts. Tested in CI via the
e2e-clusterjob (uses kind — the non-k3d path). - Documentation now distinguishes between three local substrates (OrbStack k8s, kind, k3d). Acceptable cost for the stability gain.
Alternatives considered¶
- Docker Desktop + k3d. Same SSH-forwarding issues are reported by Docker Desktop users on Apple silicon as of 2026. Not a real improvement.
- Rancher Desktop with native k3s. Stable, but adds yet another hypervisor on the developer's machine. OrbStack is already widely installed.
- Stay on colima + reboot daily. Untenable for a project that wants a second adopter. The reboot ritual is exactly the friction v0.1.1 set out to eliminate.
- Lima native (no colima). Would still use the same SSH-forward primitive, same flap.
- Cloud dev environments (GitHub Codespaces, devcontainers in Codespaces with k3d). Out of scope for a local-first PoC; revisit when the project graduates beyond solo development.
References¶
poc/scripts/serving-deploy.sh— context-conditional k3d import.poc/scripts/web-ingest-deploy.sh— same pattern.poc/dashictl/src/commands/doctor.rs— preflight check matrix that catches OrbStack-vs-k3d configuration mismatches..github/workflows/ci.yml—e2e-clusterjob uses kind, exercises the non-OrbStack path.