
feat(firecracker): add direct microVM provider #358

Open
coygeek wants to merge 19 commits into openclaw:main from coygeek:firecracker-provider

Conversation


@coygeek coygeek commented Jun 14, 2026

Contributor

Closes #359

Summary

Adds a first-class firecracker provider that provisions direct local/self-hosted Firecracker microVMs as normal Crabbox SSH leases.

The provider includes:

  • provider registration, metadata, flags, env/config handling, and trust-gating for host execution paths
  • Linux-only Firecracker lifecycle using the Go SDK, per-lease state, SSH keys, rootfs copies, cloud-init metadata, CNI networking, resolve/list/release/cleanup, and rollback cleanup
  • structured doctor --provider firecracker --json readiness checks
  • provider docs, provider matrix updates, command docs, and a host-gated live smoke helper

Review Fixes

Structured review found the following lifecycle and config issues, all of which this branch fixes:

  • acquire rollback now cleans network artifacts and terminates recorded Firecracker processes
  • KVM/CNI/jailer readiness checks are stricter and actionable
  • VM launch startup is bounded without canceling successful leases
  • Firecracker host execution paths are trust-gated for untrusted repo-local config
  • persisted state records use the effective SSH user
  • default kernel args boot from a writable /dev/vda rootfs
  • rootfs is attached before the cloud-init drive
  • ~ paths expand for Firecracker binary and jailer flags
  • retained-release cleanup is idempotent

Verification

  • gofmt -w $(git ls-files '*.go')
  • git diff --check
  • go vet ./...
  • go test ./internal/providers/firecracker
  • go test ./internal/cli -run 'Test.*Firecracker|TestProvidersJSONIncludesBuiltIns|TestLoadConfigFirecracker'
  • go test -race ./...
  • go build -trimpath -o bin/crabbox ./cmd/crabbox
  • node scripts/check-provider-matrix.mjs
  • node scripts/check-docs-links.mjs
  • node scripts/check-command-docs.mjs
  • node --test scripts/live-firecracker-smoke.test.js
  • CRABBOX_BIN=./bin/crabbox scripts/live-firecracker-smoke.sh --dry-run
  • CRABBOX_BIN=./bin/crabbox scripts/live-firecracker-smoke.sh
  • ~/.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main

The live Firecracker smoke script was executed on a non-Linux/non-KVM host and correctly classified the result as environment_blocked with blocked host, binary, kernel, rootfs, and network checks. The dry-run and readiness-contract smoke tests passed.

Autoreview result: clean, no accepted/actionable findings reported.

coygeek and others added 19 commits June 14, 2026 13:49
Expose the first Firecracker provider contract so Crabbox can advertise the direct microVM surface, configure it consistently, and report actionable host readiness before lifecycle support lands.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Document the current Firecracker provider surface around config, doctor, and host prerequisites while lifecycle work is still pending. Add a read-only readiness smoke helper with tests so Linux and KVM blockers are classified honestly.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Replace the placeholder Firecracker lifecycle with state-backed acquire,
resolve, list, release, and cleanup flows behind the existing provider
contract. Add rootfs and cloud-init artifact management, Linux-specific VM
launchers, and lifecycle regression tests for rollback and retained-artifact
behavior.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Build the SDK Firecracker command with the same launch context used for VM startup so timeout cancellation can terminate the child process before PID capture is available.

Extend the launch-timeout regression test to verify both machine construction and startup receive the bounded context.
Set the default Firecracker kernel command line to mount the first drive as a writable /dev/vda root filesystem so standard guests can boot and Crabbox can bootstrap them.

Document the guest-image expectation and assert the launch config carries the writable root device contract.
Attach the writable rootfs before the cloud-init drive so the default root=/dev/vda kernel args mount the intended filesystem.

Also expand user-home paths for Firecracker binary and jailer flags to match the rest of the provider path handling.
Separate Firecracker's long-lived process context from the launch timeout so successful leases are not canceled as acquire returns.

Cancel the process context only on startup failure, launch timeout, or rollback, with regression coverage for both successful acquire and timeout cleanup.
Always signal a recorded Firecracker PID during acquire rollback before removing network and state artifacts, even when SDK StopVMM reports success.

Extend the SSH-readiness rollback regression to assert the recorded process receives SIGTERM and is no longer considered alive.
Skip network teardown for retained Firecracker records that have already been released and no longer track a live PID, while preserving active-release and rollback cleanup.

Extend retained-artifact coverage to prove later cleanup removes state without repeating network cleanup.
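The retained-release behavior in the last commit can be sketched roughly as below. The `Record` fields and the `cleanup` and `needsNetworkTeardown` helpers are hypothetical; the provider's actual state record layout is not shown on this page. The sketch assumes a record that is marked released and no longer tracks a PID has already had its network torn down.

```go
package main

import "fmt"

// Record is a hypothetical per-lease state record for this sketch.
type Record struct {
	LeaseID  string
	PID      int  // 0 once no live Firecracker process is tracked
	Released bool // set when a release retained the artifacts
}

// needsNetworkTeardown reports whether cleanup still owes a network teardown.
// A released record with no tracked PID is assumed to be torn down already.
func needsNetworkTeardown(r Record) bool {
	return !(r.Released && r.PID == 0)
}

// cleanup tears down the network only when needed, then removes state,
// so repeating cleanup on a retained record does not repeat the teardown.
func cleanup(r Record, teardown, removeState func(leaseID string) error) error {
	if needsNetworkTeardown(r) {
		if err := teardown(r.LeaseID); err != nil {
			return fmt.Errorf("network teardown for %s: %w", r.LeaseID, err)
		}
	}
	return removeState(r.LeaseID)
}

func main() {
	teardowns := 0
	teardown := func(string) error { teardowns++; return nil }
	remove := func(string) error { return nil }

	// Active release: the record still tracks a PID, so the network is torn down.
	_ = cleanup(Record{LeaseID: "lease-a", PID: 4242}, teardown, remove)
	// Later cleanup of the retained record: state removal only.
	_ = cleanup(Record{LeaseID: "lease-a", Released: true}, teardown, remove)

	fmt.Println("network teardowns:", teardowns)
}
```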

clawsweeper Bot commented Jun 14, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed June 15, 2026, 4:04 AM ET / 08:04 UTC.

Summary
Adds a built-in Firecracker SSH-lease provider with CLI config and flags, doctor checks, Linux microVM lifecycle code, docs, provider metadata, tests, and a host-gated smoke helper.

Reproducibility: not applicable, as this is a feature PR. The useful validation path is a real Linux/KVM Firecracker warmup/run/stop smoke, and the current PR evidence only reports an environment-blocked host check.

Review metrics: 2 noteworthy metrics.

  • Diff size: 35 files, +5375/-7. The PR is a broad provider addition, so maintainers should expect provider, config, docs, dependency, and lifecycle review rather than a narrow fix.
  • Runtime dependencies: 3 direct Go modules added. The new Firecracker, CNI, and logging dependencies expand the runtime and supply-chain surface for the CLI.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🐚 platinum hermit
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted terminal output, logs, or a recording from a Linux/KVM Firecracker host showing warmup, run or SSH, stop, and cleanup.
  • [P1] Redact private IPs, non-public endpoints, usernames where needed, API keys, and local paths before posting proof; updating the PR body should trigger a fresh ClawSweeper review.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: The PR body lists tests and a non-KVM environment_blocked smoke result, but it does not show a successful Firecracker lease in a real Linux/KVM setup. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] The PR adds trusted local host execution paths, KVM access, CNI plugin execution, and local process signaling, which green unit tests do not fully settle.
  • [P1] The only reported live smoke is an honest environment_blocked result on a non-KVM host, so successful lease creation, SSH readiness, release, and cleanup remain unproven in a real Firecracker environment.
  • [P1] The new provider is a first-class built-in config and docs surface, so maintainers need to accept that product scope before merge.

Maintainer options:

  1. Require successful Firecracker host proof (recommended)
    Ask for redacted terminal output, logs, or a recording from a real Linux/KVM Firecracker host showing lease creation, SSH/run readiness, release, and cleanup before merge.
  2. Accept the experimental provider risk
    Maintainers may intentionally merge with the environment-blocked proof if they are comfortable owning first real-host validation after merge.
  3. Pause until a KVM host is available
    If no reviewer or contributor can provide a suitable Firecracker host, pause this PR rather than landing a direct lifecycle provider without successful runtime evidence.

Next step before merge

  • [P1] The blocker is contributor proof and maintainer approval for a new built-in local virtualization provider, not a narrow automated repair.

Security
Cleared: No concrete security or supply-chain defect was found in the diff, but the local host execution and CNI surface remains a merge-risk area for maintainer review.

Review details

Best possible solution:

Land the provider only after maintainers accept the built-in Firecracker scope and the branch shows redacted successful Linux/KVM warmup, run or SSH, stop, and cleanup proof.

Do we have a high-confidence way to reproduce the issue?

Not applicable as a feature PR. The useful validation path is a real Linux/KVM Firecracker warmup/run/stop smoke, and the current PR evidence only reports an environment-blocked host check.

Is this the best way to solve the issue?

Unclear until real-host proof is added. The implementation follows the existing SSH-lease provider shape and trust-gates host paths, but a first-class local microVM provider needs maintainer scope approval and successful runtime evidence.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against e00450d6e5fb.

Label changes

Label changes:

  • add P2: This is a normal-priority user-visible feature with a broad but bounded provider surface.
  • add merge-risk: 🚨 security-boundary: The PR introduces local KVM, Firecracker binary, kernel/rootfs, CNI plugin, and process-control behavior behind a built-in provider.
  • add merge-risk: 🚨 availability: Incorrect lifecycle handling could leave local microVM, network, process, or state artifacts behind on operator hosts.
  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🐚 platinum hermit.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body lists tests and a non-KVM environment_blocked smoke result, but it does not show a successful Firecracker lease in a real Linux/KVM setup. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • remove rating: 🌊 off-meta tidepool: Current PR rating is rating: 🧂 unranked krab, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal-priority user-visible feature with a broad but bounded provider surface.
  • merge-risk: 🚨 security-boundary: The PR introduces local KVM, Firecracker binary, kernel/rootfs, CNI plugin, and process-control behavior behind a built-in provider.
  • merge-risk: 🚨 availability: Incorrect lifecycle handling could leave local microVM, network, process, or state artifacts behind on operator hosts.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body lists tests and a non-KVM environment_blocked smoke result, but it does not show a successful Firecracker lease in a real Linux/KVM setup. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

What I checked:

  • Current-main gap: Current main registers adjacent providers such as e2b, incus, proxmox, tensorlake, and xcp-ng, but no firecracker provider is registered on main. (internal/providers/all/all.go:20, e00450d6e5fb)
  • Provider registration added by PR: The PR adds a Provider implementation named firecracker with ssh-lease kind, Linux target, cleanup feature, and coordinator=never. (internal/providers/firecracker/provider.go:17, 47db2019dda2)
  • Lifecycle surface added by PR: The PR adds acquire, resolve, list, release, cleanup, touch, and doctor behavior for Firecracker state, process, SSH, and network lifecycle. (internal/providers/firecracker/backend.go:163, 47db2019dda2)
  • Host execution trust gate: Firecracker binary, jailer, kernel, rootfs, CNI network, and CNI path fields from repo-local config are only applied when the config source is trusted. (internal/cli/config.go:3989, 47db2019dda2)
  • Proof is not sufficient for merge gate: The PR body lists tests and says the live smoke ran on a non-Linux/non-KVM host and correctly classified environment_blocked; the body and comments do not include a successful real Firecracker warmup/run/stop proof on a Linux/KVM host. (47db2019dda2)
  • Whitespace check: The PR diff passes git diff --check with no reported whitespace errors. (47db2019dda2)
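For readers unfamiliar with the host execution trust gate noted in the evidence above, here is a minimal sketch of the pattern. The `HostPaths` struct and `applyRepoLocal` function are hypothetical and only mirror the field list named in the review; the real logic lives in internal/cli/config.go and may differ.

```go
package main

import "fmt"

// HostPaths groups the host-execution fields named in the review.
// The struct itself is a hypothetical stand-in for the real config type.
type HostPaths struct {
	Binary, Jailer, Kernel, Rootfs, CNINetwork, CNIPath string
}

// applyRepoLocal overlays repo-local values onto base only when the
// config source is trusted; untrusted values are ignored entirely.
func applyRepoLocal(base, repo HostPaths, trusted bool) HostPaths {
	if !trusted {
		return base
	}
	override := func(dst *string, src string) {
		if src != "" {
			*dst = src
		}
	}
	override(&base.Binary, repo.Binary)
	override(&base.Jailer, repo.Jailer)
	override(&base.Kernel, repo.Kernel)
	override(&base.Rootfs, repo.Rootfs)
	override(&base.CNINetwork, repo.CNINetwork)
	override(&base.CNIPath, repo.CNIPath)
	return base
}

func main() {
	base := HostPaths{Binary: "/usr/local/bin/firecracker"}
	repo := HostPaths{Binary: "./tools/not-firecracker", Kernel: "./vmlinux"}

	fmt.Printf("untrusted: %+v\n", applyRepoLocal(base, repo, false))
	fmt.Printf("trusted:   %+v\n", applyRepoLocal(base, repo, true))
}
```

The point of gating at merge time is that a cloned repository cannot redirect the CLI to an arbitrary binary, kernel, or CNI plugin path simply by shipping a config file.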

Likely related people:

  • Peter Steinberger: Current-main blame and log for central config/provider files point to the release commit that most recently touched this area in the available local history. (role: recent area contributor; confidence: medium; commits: f6b4a9765285; files: internal/cli/config.go, internal/providers/all/all.go, docs/providers/README.md)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added the "rating: 🌊 off-meta tidepool" label (PR readiness rating does not apply to this item.) Jun 14, 2026

Contributor

@clawsweeper re-review

The prior review timed out before producing findings. Please review the current head.


clawsweeper Bot commented Jun 15, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added these labels on Jun 15, 2026:

  • rating: 🧂 unranked krab: Not merge-ready due to missing proof or serious correctness/safety concerns.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
  • P2: Normal priority bug or improvement with limited blast radius.
  • merge-risk: 🚨 security-boundary: 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data.
  • merge-risk: 🚨 availability: 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages.

and removed this label:

  • rating: 🌊 off-meta tidepool: PR readiness rating does not apply to this item.

Labels

  • merge-risk: 🚨 availability: 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages.
  • merge-risk: 🚨 security-boundary: 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data.
  • P2: Normal priority bug or improvement with limited blast radius.
  • rating: 🧂 unranked krab: Not merge-ready due to missing proof or serious correctness/safety concerns.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add first-class Firecracker microVM provider support

2 participants