feat(lambda): add Lambda Cloud provider#337

Open
coygeek wants to merge 14 commits into
openclaw:mainfrom
coygeek:feat/lambda-provider
Conversation

@coygeek

@coygeek coygeek commented Jun 14, 2026

Contributor

Closes #336

Summary

Adds a built-in lambda provider for Lambda Cloud On-Demand GPU instances.

This implements a direct Linux SSH lease provider with Lambda API auth, provider defaults/config/env overrides, non-mutating doctor checks, instance launch/resolve/list/stop/cleanup behavior, local claim-backed recovery, Tailscale cloud-init support, provider docs, generated provider metadata, and a guarded live smoke script.

Notes

  • Direct-only provider; no coordinator-side Lambda credentials.
  • Live Lambda mutation remains opt-in behind CRABBOX_LIVE=1, CRABBOX_LIVE_PROVIDERS=lambda, and LAMBDA_API_KEY.
  • No live Lambda resources were created during local verification because the live gates were not enabled.
  • Lambda ownership uses local Crabbox claims as the primary durable metadata source. Complete provider-side metadata is still understood for compatibility.
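
The opt-in gating described in the notes can be sketched in Go as follows. This is an illustration only, not the PR's smoke script (which is a shell script): the function name `liveGate`, the comma-separated format of `CRABBOX_LIVE_PROVIDERS`, and every reason string other than `CRABBOX_LIVE_not_enabled` are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// liveGate reports whether live Lambda mutation is allowed.
// The three variable names come from the PR notes. The comma-separated
// provider list and the last two reason strings are hypothetical.
func liveGate(getenv func(string) string) (bool, string) {
	if getenv("CRABBOX_LIVE") != "1" {
		return false, "CRABBOX_LIVE_not_enabled"
	}
	enabled := false
	for _, p := range strings.Split(getenv("CRABBOX_LIVE_PROVIDERS"), ",") {
		if strings.TrimSpace(p) == "lambda" {
			enabled = true
		}
	}
	if !enabled {
		return false, "lambda_not_in_CRABBOX_LIVE_PROVIDERS"
	}
	if getenv("LAMBDA_API_KEY") == "" {
		return false, "LAMBDA_API_KEY_missing"
	}
	return true, ""
}

func main() {
	if ok, reason := liveGate(os.Getenv); !ok {
		// Mirrors the classification line format quoted under Verification.
		fmt.Printf("classification=environment_blocked reason=%s\n", reason)
		return
	}
	fmt.Println("live gates enabled")
}
```

Passing the environment lookup in as a function keeps the gate testable without touching the real process environment.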

Verification

  • go test -race ./...
  • go vet ./...
  • go build -trimpath -o bin/crabbox ./cmd/crabbox
  • npm test --prefix worker
  • npm run format:check --prefix worker
  • npm run lint --prefix worker
  • npm run check --prefix worker
  • node scripts/check-docs-links.mjs
  • bash scripts/check-docs.sh
  • bash -n scripts/live-lambda-smoke.sh
  • node --test scripts/live-lambda-smoke.test.js
  • scripts/live-lambda-smoke.sh -> classification=environment_blocked reason=CRABBOX_LIVE_not_enabled
  • ~/.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main -> clean

coygeek added 14 commits June 13, 2026 16:55
Add the built-in Lambda provider registration, non-secret config defaults and env overrides, config-show output, API client/types, and read-only doctor readiness checks.

Keep billable lifecycle, provider-side SSH key orchestration, docs, generated provider matrices, and live smoke scripts deferred to later Lambda plans.
Implement Lambda direct SSH lease acquisition, provider-side SSH key reconciliation, launch polling, release, cleanup, recovery claims, and local Tailscale/touch metadata handling on top of the provider foundation.

Add fake-client coverage for launch request shape, ownership filtering, key reuse/deletion, cleanup safety, ambiguous mutation recovery, and non-mutating doctor behavior.
Document the direct Lambda SSH lease provider, add it to the generated provider matrix metadata, and provide a guarded live smoke script with deterministic blocker classifications.

The smoke script remains opt-in, redacts Lambda secrets, distinguishes external account blockers from validation failures, and verifies cleanup across instances, provider SSH keys, and local testbox keys.
Teach the guarded Lambda live smoke cleanup path to recognize the backend's actual instance-not-found message so ambiguous-create recovery can prove a clean account state instead of reporting a false cleanup failure.
Keep launch-time expiry metadata on Lambda provider tags so provider-only cleanup can eventually reclaim billable orphan instances, while local claims continue to carry fresh touch state.

Decode Lambda instance type responses from both array and map-keyed API shapes so doctor can validate capacity against the current API response.
Remove unsupported launch name/tag fields from the Lambda create payload and rely on Crabbox local lease claims for ownership, list, resolve, stop, and cleanup when provider tags are not persisted.

Also allow ambiguous SSH-key create recovery to clean up by owned key name, while retaining support for complete provider-tagged Lambda instances as a compatibility cleanup path.
Use the per-lease Lambda SSH key name to match an untagged instance back to an ambiguous launch recovery claim when the launch response was lost.

This gives stop and cleanup a concrete billable-resource recovery path even without provider-side launch tags.
Invoke the acquire callback before local Lambda claim side effects so controller acknowledgement failures roll back billable resources.

Harden Lambda API error redaction for multi-line user_data and let the image-family env override clear lower-precedence exact image config.
@clawsweeper

clawsweeper Bot commented Jun 14, 2026


Contributor

Codex review: needs real behavior proof before merge. Reviewed June 13, 2026, 11:26 PM ET / 03:26 UTC.

Summary
The branch adds a built-in lambda direct Linux SSH-lease provider with config/env support, API client, lifecycle/cleanup/recovery logic, docs, provider metadata, tests, and a guarded live smoke script.

Reproducibility: yes, for the PR defects. Comparing the proposed request/response structs with Lambda's live OpenAPI schema shows mismatched JSON shapes. I did not run a live Lambda mutation because the PR itself reports that live gates were not enabled.

Review metrics: 2 noteworthy metrics.

  • Diff surface: 25 files, +4,439/-5. The PR spans provider code, core config, docs, metadata, tests, and a live smoke script, so maintainers should treat it as a broad new provider review.
  • Live Lambda resources: 0 created in provided proof. The PR body says live gates were not enabled, so the actual billable launch/cleanup path remains unproven.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🧂 unranked krab
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Fix the launch payload and discovery response structs to match Lambda's live OpenAPI schema.
  • [P1] Add focused tests that marshal the documented launch body and decode Region-object discovery responses.
  • Post redacted real Lambda smoke output that proves doctor, launch, run/list, stop, and cleanup, or shows a maintainer-accepted external blocker after live gates are enabled.

Proof guidance:

  • [P1] Needs real behavior proof before merge: Only fake/unit checks and an environment-blocked smoke run are shown; add redacted terminal output, logs, or a linked artifact from a real CRABBOX_LIVE=1 Lambda smoke after fixing the API shapes, then update the PR body or ask a maintainer to comment @clawsweeper re-review.

Risk before merge

  • [P1] The launch payload does not match Lambda's live OpenAPI schema for image and firewall settings, so real launches may be rejected or may ignore configured image/firewall choices.
  • [P1] The discovery structs do not match region object responses, so crabbox doctor --provider lambda can fail against the real API before validating capacity or image availability.
  • [P1] The PR adds a billable external cloud lifecycle, but the provided proof explicitly did not create, run, stop, or clean up a live Lambda instance.

Maintainer options:

  1. Fix schema mismatches and require live proof (recommended)
    Update the launch and discovery models to match Lambda's OpenAPI schema, then require redacted live smoke or real blocker output before merge.
  2. Accept as experimental after maintainer review
    Maintainers could intentionally accept the unproven provider surface, but they should do so knowing green unit tests do not prove the billable cloud lifecycle.
  3. Pause until a Lambda account can prove it
    If no funded Lambda account is available, keep or close the PR as pending live validation rather than landing an unverified provider.

Next step before merge

  • [P1] Needs human review after schema fixes because the contributor must supply real Lambda proof and maintainers must accept the new billable direct-provider surface.

Security
Cleared: The diff adds a Lambda API token path, but the reviewed code keeps the key in environment/Bearer auth, avoids persistent config/argv storage, and includes redaction in errors and smoke output.

Review findings

  • [P1] Use Lambda's documented launch payload shape — internal/providers/lambda/types.go:66-69
  • [P2] Decode Lambda discovery regions as objects — internal/providers/lambda/types.go:21-28
Review details

Best possible solution:

Fix the Lambda request/response schemas against the live OpenAPI contract, keep credentials env-only, then merge only after redacted live terminal output proves doctor, launch, run/list, stop, and cleanup behavior or maintainers explicitly accept an external-account blocker.

Do we have a high-confidence way to reproduce the issue?

Yes for the PR defects: comparing the proposed request/response structs with Lambda's live OpenAPI schema shows mismatched JSON shapes. I did not run a live Lambda mutation because the PR itself reports that live gates were not enabled.

Is this the best way to solve the issue?

No: the provider direction fits Crabbox's direct SSH-lease model, but this implementation needs schema fixes and real Lambda behavior proof before it is the maintainable merge path.

Full review comments:

  • [P1] Use Lambda's documented launch payload shape — internal/providers/lambda/types.go:66-69
    Lambda's live OpenAPI launch request accepts image: {id|family} and firewall_rulesets: [{id}], but this struct serializes top-level image_id, image_family, and firewall_ruleset_name. Because the provider sets ImageFamily by default and uses these fields for explicit image/firewall config, real launches can be rejected or silently ignore the user's settings.
    Confidence: 0.91
  • [P2] Decode Lambda discovery regions as objects — internal/providers/lambda/types.go:21-28
    The live schema returns image region and regions_with_capacity_available entries as Region objects, but these types decode them as strings. doctor --provider lambda calls ListImages for the default image family and ListInstanceTypes for capacity, so it can fail against the real API before validating the provider.
    Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.92

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 7763eecdf759.

Label changes

Label changes:

  • add P2: This is a normal-priority new provider feature with concrete correctness blockers but no current-user regression on main.
  • add merge-risk: 🚨 other: Merging would expose a new billable external provider whose live request schema and cleanup lifecycle have not been positively proven.
  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🧂 unranked krab.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: Only fake/unit checks and an environment-blocked smoke run are shown; add redacted terminal output, logs, or a linked artifact from a real CRABBOX_LIVE=1 Lambda smoke after fixing the API shapes, then update the PR body or ask a maintainer to comment @clawsweeper re-review.

Label justifications:

  • P2: This is a normal-priority new provider feature with concrete correctness blockers but no current-user regression on main.
  • merge-risk: 🚨 other: Merging would expose a new billable external provider whose live request schema and cleanup lifecycle have not been positively proven.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🧂 unranked krab.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: Only fake/unit checks and an environment-blocked smoke run are shown; add redacted terminal output, logs, or a linked artifact from a real CRABBOX_LIVE=1 Lambda smoke after fixing the API shapes, then update the PR body or ask a maintainer to comment @clawsweeper re-review.
Evidence reviewed

What I checked:

  • Repository policy applied: AGENTS.md was read fully; its provider-neutral boundary and secret-handling guidance are relevant because this PR adds a direct cloud provider and an API-token path. (AGENTS.md:1, 7763eecdf759)
  • Current main has no Lambda provider: Current main contains no internal/providers/lambda package and only an unrelated Python lambda variable in the DigitalOcean smoke script, so the requested provider is not already implemented. (7763eecdf759)
  • PR adds the new provider surface: The proposed branch registers Provider{Name: lambda} as a Linux ProviderKindSSHLease with SSH, sync, cleanup, Tailscale, and CoordinatorNever capabilities. (internal/providers/lambda/provider.go:20, 576128524f47)
  • Launch request shape mismatch: The PR serializes launch image and firewall settings as top-level image_id, image_family, and firewall_ruleset_name fields, while the live Lambda OpenAPI schema for launch expects an image object and firewall_rulesets array. (internal/providers/lambda/types.go:66, 576128524f47)
  • Discovery response shape mismatch: The PR models image regions as strings and capacity regions as string slices, but the live OpenAPI schema exposes region objects for images and regions_with_capacity_available. (internal/providers/lambda/types.go:21, 576128524f47)
  • Real behavior proof is not positive: The PR body lists unit/check commands and explicitly says no live Lambda resources were created because live gates were not enabled; the only smoke result shown is classification=environment_blocked reason=CRABBOX_LIVE_not_enabled. (576128524f47)

Likely related people:

  • coygeek: Current-main history shows recent provider additions touching config, docs, lifecycle, and smoke-test patterns; the Lambda branch follows those same provider surfaces. (role: recent provider contributor; confidence: high; commits: 159da078de76, 2ae7823156a8, 975d70260d7b; files: internal/cli/config.go, docs/providers, internal/providers)
  • steipete: Git blame/log tie the current provider interface, Linode/direct-provider baseline, shared direct backend, and recent config changes to this contributor. (role: provider framework and direct-provider area contributor; confidence: high; commits: 9e208c80cd1a, 7763eecdf759; files: internal/cli/provider_backend.go, internal/providers/linode/provider.go, internal/providers/shared/direct.go)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@coygeek coygeek marked this pull request as ready for review June 14, 2026 03:20
@clawsweeper clawsweeper Bot added labels Jun 14, 2026:

  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
  • P2 (Normal priority bug or improvement with limited blast radius.)
  • merge-risk: 🚨 other (🚨 Merging this PR has meaningful risk outside the owned taxonomy.)
Development

Successfully merging this pull request may close these issues.

Add Lambda Cloud On-Demand provider

1 participant