feat(firecracker): add direct microVM provider #358
Expose the first Firecracker provider contract so Crabbox can advertise the direct microVM surface, configure it consistently, and report actionable host readiness before lifecycle support lands. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Document the current Firecracker provider surface around config, doctor, and host prerequisites while lifecycle work is still pending. Add a read-only readiness smoke helper with tests so Linux and KVM blockers are classified honestly. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Replace the placeholder Firecracker lifecycle with state-backed acquire, resolve, list, release, and cleanup flows behind the existing provider contract. Add rootfs and cloud-init artifact management, Linux-specific VM launchers, and lifecycle regression tests for rollback and retained-artifact behavior. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Build the SDK Firecracker command with the same launch context used for VM startup so timeout cancellation can terminate the child process before PID capture is available. Extend the launch-timeout regression test to verify both machine construction and startup receive the bounded context.
Set the default Firecracker kernel command line to mount the first drive as a writable /dev/vda root filesystem so standard guests can boot and Crabbox can bootstrap them. Document the guest-image expectation and assert the launch config carries the writable root device contract.
Attach the writable rootfs before the cloud-init drive so the default root=/dev/vda kernel args mount the intended filesystem. Also expand user-home paths for Firecracker binary and jailer flags to match the rest of the provider path handling.
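The ordering rule in the two commits above can be sketched in Go. This is a minimal illustration, not the provider's code: the `drive` type, `orderDrives`, `guestDevice`, and `rootDevice` are hypothetical names, and the kernel arguments other than `root=/dev/vda rw` are placeholders. It relies on virtio block devices being named in attach order (`/dev/vda`, then `/dev/vdb`).

```go
package main

import (
	"fmt"
	"strings"
)

// drive is a hypothetical stand-in for a block device attached to the microVM.
type drive struct {
	ID       string
	Path     string
	ReadOnly bool
}

// Illustrative kernel arguments. Only "root=/dev/vda" and "rw" come from the
// commit text; the remaining flags are placeholders.
const defaultKernelArgs = "console=ttyS0 reboot=k panic=1 root=/dev/vda rw"

// orderDrives puts the writable rootfs first so the guest sees it as /dev/vda
// and the cloud-init seed as /dev/vdb.
func orderDrives(rootfs, seed drive) []drive {
	return []drive{rootfs, seed}
}

// guestDevice maps a zero-based attach index (0..25) to a virtio device name.
func guestDevice(index int) string {
	return "/dev/vd" + string(rune('a'+index))
}

// rootDevice extracts the value of root= from a kernel command line.
func rootDevice(args string) string {
	for _, field := range strings.Fields(args) {
		if strings.HasPrefix(field, "root=") {
			return strings.TrimPrefix(field, "root=")
		}
	}
	return ""
}

func main() {
	drives := orderDrives(
		drive{ID: "rootfs", Path: "rootfs.ext4"},
		drive{ID: "cloud-init", Path: "seed.img", ReadOnly: true},
	)
	for i, d := range drives {
		fmt.Printf("%s -> %s (read-only=%v)\n", guestDevice(i), d.ID, d.ReadOnly)
	}
	// The kernel root device should match the first attached drive.
	fmt.Println("root matches first drive:", rootDevice(defaultKernelArgs) == guestDevice(0))
}
```

If the seed drive were attached first, `root=/dev/vda` would point at the cloud-init image instead of the root filesystem, which is the failure the second commit guards against.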
Separate Firecracker's long-lived process context from the launch timeout so successful leases are not canceled as acquire returns. Cancel the process context only on startup failure, launch timeout, or rollback, with regression coverage for both successful acquire and timeout cleanup.
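A rough Go sketch of this separation, using only the standard library, might look like the following. The `lease` type, `acquire`, and `demo` are hypothetical, the real provider goes through the Firecracker SDK rather than `os/exec` directly, and `sleep` (assumed to be on `PATH`) stands in for the VM process.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

var (
	errLaunchTimeout = errors.New("launch timed out")
	errExitedEarly   = errors.New("process exited before it was ready")
)

// lease is a hypothetical handle for a running VM process.
type lease struct {
	cancel context.CancelFunc // ends the long-lived process context
	done   chan struct{}      // closed once the process has exited
}

// Running reports whether the process is still alive.
func (l *lease) Running() bool {
	select {
	case <-l.done:
		return false
	default:
		return true
	}
}

// Release cancels the process context and waits for the process to exit.
func (l *lease) Release() {
	l.cancel()
	<-l.done
}

// acquire starts the process under a long-lived context and waits for the
// ready signal under a separate, bounded launch context.
func acquire(launchTimeout time.Duration, ready <-chan struct{}, name string, args ...string) (*lease, error) {
	procCtx, cancelProc := context.WithCancel(context.Background())
	cmd := exec.CommandContext(procCtx, name, args...)
	if err := cmd.Start(); err != nil {
		cancelProc()
		return nil, err
	}
	done := make(chan struct{})
	go func() { _ = cmd.Wait(); close(done) }()

	launchCtx, cancelLaunch := context.WithTimeout(context.Background(), launchTimeout)
	defer cancelLaunch() // ends only the launch window, never the process

	select {
	case <-ready:
		return &lease{cancel: cancelProc, done: done}, nil
	case <-done:
		cancelProc()
		return nil, errExitedEarly
	case <-launchCtx.Done():
		cancelProc() // kill the child on launch timeout
		<-done
		return nil, errLaunchTimeout
	}
}

// demo runs one successful acquire and one that times out.
func demo() (survived, timedOut bool) {
	ready := make(chan struct{})
	close(ready)
	if l, err := acquire(time.Second, ready, "sleep", "30"); err == nil {
		time.Sleep(100 * time.Millisecond)
		survived = l.Running() // still alive after acquire returned
		l.Release()
	}
	_, err := acquire(100*time.Millisecond, make(chan struct{}), "sleep", "30")
	return survived, errors.Is(err, errLaunchTimeout)
}

func main() {
	survived, timedOut := demo()
	fmt.Println("survived acquire:", survived, "timeout cleaned up:", timedOut)
}
```

Had the command been built from `launchCtx`, the deferred `cancelLaunch` would kill the process the moment `acquire` returned, which is the bug this commit describes.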
Always signal a recorded Firecracker PID during acquire rollback before removing network and state artifacts, even when SDK StopVMM reports success. Extend the SSH-readiness rollback regression to assert the recorded process receives SIGTERM and is no longer considered alive.
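The rollback rule can be illustrated with a small Unix-only Go sketch. The names `rollback`, `terminatePID`, `pidAlive`, and `demo` are hypothetical, the `stopVMM` callback stands in for the SDK stop call, and `sleep` (assumed to be on `PATH`) stands in for the Firecracker process.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// pidAlive reports whether a process with this PID exists, probing with signal 0.
func pidAlive(pid int) bool {
	err := syscall.Kill(pid, 0)
	return err == nil || errors.Is(err, syscall.EPERM)
}

// terminatePID sends SIGTERM to a recorded PID. A process that is already
// gone (ESRCH) counts as success.
func terminatePID(pid int) error {
	if pid <= 0 {
		return nil
	}
	err := syscall.Kill(pid, syscall.SIGTERM)
	if errors.Is(err, syscall.ESRCH) {
		return nil
	}
	return err
}

// rollback signals the recorded PID even when stopVMM reports success, and
// only then runs the network and state cleanup.
func rollback(pid int, stopVMM func() error, cleanup func()) error {
	stopErr := stopVMM()
	termErr := terminatePID(pid)
	cleanup()
	if stopErr != nil {
		return stopErr
	}
	return termErr
}

// demo starts a child process, rolls it back, and reports liveness before and after.
func demo() (before, after, cleaned bool, err error) {
	cmd := exec.Command("sleep", "30")
	if err = cmd.Start(); err != nil {
		return
	}
	pid := cmd.Process.Pid
	before = pidAlive(pid)
	err = rollback(pid, func() error { return nil }, func() { cleaned = true })
	_ = cmd.Wait() // reap the child so it does not linger as a zombie
	after = pidAlive(pid)
	return
}

func main() {
	before, after, cleaned, err := demo()
	fmt.Println("alive before:", before, "alive after:", after, "cleaned:", cleaned, "err:", err)
}
```

The stub `stopVMM` returns `nil` on purpose: the point of the commit is that a successful stop report is not treated as proof that the process is gone.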
Skip network teardown for retained Firecracker records that have already been released and no longer track a live PID, while preserving active-release and rollback cleanup. Extend retained-artifact coverage to prove later cleanup removes state without repeating network cleanup.
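As a sketch of that decision, assuming a hypothetical `record` type with `Released` and `PID` fields (the real state schema is not shown in this PR text), the cleanup path could be guarded like this:

```go
package main

import "fmt"

// record is a hypothetical retained Firecracker state record.
type record struct {
	ID       string
	PID      int  // 0 once no live process is tracked
	Released bool // true after a release that kept artifacts on disk
}

// needsNetworkTeardown is false only for records that were already released
// and no longer track a live PID; their network was torn down at release time.
func needsNetworkTeardown(r record) bool {
	return !(r.Released && r.PID == 0)
}

// cleanup always removes state and tears down networking only when needed.
func cleanup(r record, teardownNetwork, removeState func(id string)) {
	if needsNetworkTeardown(r) {
		teardownNetwork(r.ID)
	}
	removeState(r.ID)
}

func main() {
	records := []record{
		{ID: "active", PID: 4242},
		{ID: "retained", Released: true},
	}
	for _, r := range records {
		cleanup(r,
			func(id string) { fmt.Println("network teardown:", id) },
			func(id string) { fmt.Println("state removed:", id) },
		)
	}
}
```

Active releases and rollbacks still carry a PID or an unreleased record, so they keep the full teardown path.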
Codex review: needs real behavior proof before merge. Reviewed June 15, 2026, 4:04 AM ET / 08:04 UTC.
Summary
Reproducibility: not applicable, as this is a feature PR. The useful validation path is a real Linux/KVM Firecracker warmup/run/stop smoke, and the current PR evidence only reports an environment-blocked host check. Review metrics: 2 noteworthy metrics.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review details
Best possible solution: Land the provider only after maintainers accept the built-in Firecracker scope and the branch shows redacted successful Linux/KVM warmup, run or SSH, stop, and cleanup proof.
Do we have a high-confidence way to reproduce the issue? Not applicable as a feature PR. The useful validation path is a real Linux/KVM Firecracker warmup/run/stop smoke, and the current PR evidence only reports an environment-blocked host check.
Is this the best way to solve the issue? Unclear until real-host proof is added. The implementation follows the existing SSH-lease provider shape and trust-gates host paths, but a first-class local microVM provider needs maintainer scope approval and successful runtime evidence.
AGENTS.md: found and applied where relevant.
Codex review notes: model internal, reasoning high; reviewed against e00450d6e5fb.
Label changes:
Label justifications:
Evidence reviewed
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
How this review workflow works
@clawsweeper re-review
The prior review timed out before producing findings. Please review the current head.
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
Closes #359
Summary
Adds a first-class firecracker provider that provisions direct local/self-hosted Firecracker microVMs as normal Crabbox SSH leases.
The provider includes:
doctor --provider firecracker --json readiness checks
Review Fixes
Structured review found and the branch fixes the following lifecycle and config issues:
/dev/vda rootfs
~ paths expand for Firecracker binary and jailer flags
Verification
gofmt -w $(git ls-files '*.go')
git diff --check
go vet ./...
go test ./internal/providers/firecracker
go test ./internal/cli -run 'Test.*Firecracker|TestProvidersJSONIncludesBuiltIns|TestLoadConfigFirecracker'
go test -race ./...
go build -trimpath -o bin/crabbox ./cmd/crabbox
node scripts/check-provider-matrix.mjs
node scripts/check-docs-links.mjs
node scripts/check-command-docs.mjs
node --test scripts/live-firecracker-smoke.test.js
CRABBOX_BIN=./bin/crabbox scripts/live-firecracker-smoke.sh --dry-run
CRABBOX_BIN=./bin/crabbox scripts/live-firecracker-smoke.sh
~/.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main
The live Firecracker smoke script was executed on a non-Linux/non-KVM host and correctly classified the result as environment_blocked with blocked host, binary, kernel, rootfs, and network checks. The dry-run and readiness-contract smoke tests passed.
Autoreview result: clean, no accepted/actionable findings reported.
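For readers who want a feel for how such a host classification could work, here is a minimal Go sketch. It is not the smoke script itself, which is a shell and Node helper. The `check` type, `hostChecks`, and `classify` are hypothetical, the `kvm` check name and the `ready` status are assumptions, and only `environment_blocked` comes from the PR text.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// check is a hypothetical readiness check result.
type check struct {
	Name    string
	Blocked bool
	Detail  string
}

// hostChecks evaluates two host prerequisites: a Linux host and a KVM device.
func hostChecks(goos string, kvmPresent bool) []check {
	return []check{
		{Name: "host", Blocked: goos != "linux", Detail: "os=" + goos},
		{Name: "kvm", Blocked: !kvmPresent, Detail: "/dev/kvm"},
	}
}

// classify returns "environment_blocked" when any check is blocked.
// The "ready" value is an assumption; the PR only shows the blocked status.
func classify(checks []check) string {
	for _, c := range checks {
		if c.Blocked {
			return "environment_blocked"
		}
	}
	return "ready"
}

func main() {
	// Read-only probe: the sketch only stats the device node and never opens it.
	_, err := os.Stat("/dev/kvm")
	checks := hostChecks(runtime.GOOS, err == nil)
	for _, c := range checks {
		fmt.Printf("%-5s blocked=%v (%s)\n", c.Name, c.Blocked, c.Detail)
	}
	fmt.Println("status:", classify(checks))
}
```

Reporting a blocked environment as its own status, rather than as a test failure, keeps a non-Linux run from being mistaken for a provider bug.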