fix: fail sandbox spawn when secrets cannot be loaded #562

Merged
ColeMurray merged 2 commits into main from fix/error-on-repo-secrets-failure
Apr 26, 2026

@ColeMurray
Owner

@ColeMurray ColeMurray commented Apr 26, 2026

Summary

  • Remove try/catch wrappers around global and repo secret loading in getUserEnvVars() so failures propagate as errors instead of being silently swallowed
  • Previously, a failure to load secrets would log a warning and proceed without them — spawning a sandbox missing user-configured secrets (e.g. OPENAI_API_KEY)
  • Now, if secret loading fails, the error bubbles up to the lifecycle manager which already handles spawn/restore failures with proper status updates and client notifications
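
For readers without the diff open, the change described above can be sketched roughly as follows. This is a minimal illustration, not the code in durable-object.ts: the `SecretsStore` interface, the function names `getUserEnvVarsLenient` and `getUserEnvVarsStrict`, the numeric `repoId`, and the merge order (repo secrets overriding global ones) are all assumptions made for the example.

```typescript
// Hypothetical, simplified shapes; the real stores live in the control-plane package.
type Secrets = Record<string, string>;

interface SecretsStore {
  getDecryptedSecrets(repoId?: number): Promise<Secrets>;
}

// Before (sketch): each load is wrapped in try/catch, so a failure only logs a
// warning and the sandbox is spawned without the affected secrets.
async function getUserEnvVarsLenient(
  globalStore: SecretsStore,
  repoStore: SecretsStore,
  repoId: number,
): Promise<Secrets> {
  let globalSecrets: Secrets = {};
  let repoSecrets: Secrets = {};
  try {
    globalSecrets = await globalStore.getDecryptedSecrets();
  } catch (err) {
    console.warn("Failed to load global secrets, continuing without them", err);
  }
  try {
    repoSecrets = await repoStore.getDecryptedSecrets(repoId);
  } catch (err) {
    console.warn("Failed to load repo secrets, continuing without them", err);
  }
  // Assumed merge order: repo secrets override global ones.
  return { ...globalSecrets, ...repoSecrets };
}

// After (sketch): no try/catch, so any rejection propagates to the caller
// (the lifecycle manager), which treats it as a spawn failure.
async function getUserEnvVarsStrict(
  globalStore: SecretsStore,
  repoStore: SecretsStore,
  repoId: number,
): Promise<Secrets> {
  const globalSecrets = await globalStore.getDecryptedSecrets();
  const repoSecrets = await repoStore.getDecryptedSecrets(repoId);
  return { ...globalSecrets, ...repoSecrets };
}
```

With a repo store that rejects, the lenient variant still resolves (with only the global secrets), while the strict variant rejects with the store's error.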

Context

The original fallback behavior was added when D1 secrets were first introduced, as a safety net during rollout. Now that secrets are a core part of sandbox operation, silently dropping them can cause confusing downstream failures (agent can't authenticate to APIs, etc.). Failing fast gives clear signal.

Test plan

  • Typecheck passes
  • All 1003 unit tests pass
  • All 327 integration tests pass
  • Verify in staging that a sandbox spawns correctly when secrets load successfully
  • Verify that a D1/encryption failure surfaces as a spawn error to the user (not a silent degradation)

Summary by CodeRabbit

  • Bug Fixes

    • Secrets loading now fails explicitly when credentials cannot be retrieved, preventing operations with incomplete configuration.
  • Tests

    • Added coverage for failure during secret retrieval: ensure sandbox creation aborts, a failed status is recorded, and internal spawn state is reset after the error.

Previously, failures to load global or repo secrets were caught and
swallowed with a warning, allowing the sandbox to proceed without
user-configured secrets. This is no longer acceptable — missing
secrets should prevent the sandbox from spawning.

Remove the try/catch wrappers in getUserEnvVars() so errors propagate
to the caller (lifecycle manager), which already handles spawn failures.
@coderabbitai

coderabbitai Bot commented Apr 26, 2026

📝 Walkthrough

getUserEnvVars now fails hard when global or repo secret retrieval errors occur (previous silent fallbacks removed). A new test covers a spawn path where storage.getUserEnvVars throws, asserting abort behavior and cleanup of spawn state.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Durable Object — secret loading: packages/control-plane/src/session/durable-object.ts | Removed try/catch and empty-object fallbacks around GlobalSecretsStore.getDecryptedSecrets() and RepoSecretsStore.getDecryptedSecrets(repoId) in getUserEnvVars, so secret load errors propagate instead of being swallowed. |
| Sandbox lifecycle tests: packages/control-plane/src/sandbox/lifecycle/manager.test.ts | Added a test for spawnSandbox failure when storage.getUserEnvVars throws, asserting spawn abort, storage status updated to failed, and internal spawning flag reset. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped where secrets used to hide,
Now errors shout, no gentle slide.
Spawn aborted, flags set true,
Clearer paths for me and you.
A rabbit laughs — strict code anew. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: fail sandbox spawn when secrets cannot be loaded' accurately and concisely describes the main change: converting secret loading from fail-safe (with fallback) to fail-fast behavior, preventing sandbox spawn on secret retrieval errors. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions

Terraform Validation Results

Step Status
Format
Init
Validate

Note: Terraform plan was skipped because secrets are not configured. This is expected for external contributors. See docs/GETTING_STARTED.md for setup instructions.

Pushed by: @ColeMurray, Action: pull_request

Comment thread packages/control-plane/src/session/durable-object.ts
open-inspect Bot previously requested changes Apr 26, 2026
Contributor

@open-inspect open-inspect Bot left a comment

Summary

This PR changes getUserEnvVars() to fail fast when secret loading fails instead of silently continuing without secrets. The intent makes sense, but the current implementation also turns ensureRepoId() into a hard dependency for sandbox spawn, which introduces a regression for sessions that still start with repo_id = null.

PR Title and number: fix: fail sandbox spawn when secrets cannot be loaded (#562)
Author: @ColeMurray
Files changed: 1, additions/deletions: +6/-22

Critical Issues

  • [Functionality] packages/control-plane/src/session/durable-object.ts:1694 - await this.ensureRepoId(session) is now inside the fail-fast path. User-created sessions can still be initialized without repo_id, so sandbox spawn now depends on a live SCM lookup even when there are no repo-specific secrets to load. A transient GitHub/GitLab lookup failure will now fail sandbox creation for those sessions, which is broader than the PR goal of failing only when secret loading itself breaks. Suggested fix: either guarantee repo_id is always populated before spawn, or only make the actual D1/decryption reads fail-fast once a repo ID is already known.
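
To make the trade-off in this finding concrete, here is a rough sketch of the two behaviours under discussion. It is not the project's code: `SessionLike`, `RepoDeps`, `resolveRepoId` (standing in for the SCM lookup that ensureRepoId() performs), `loadRepoSecrets`, and both function names are hypothetical.

```typescript
type Secrets = Record<string, string>;

// Hypothetical session shape: repoId may still be null for user-created sessions.
interface SessionLike {
  repoId: number | null;
}

// Hypothetical dependencies: resolveRepoId stands in for the SCM lookup,
// loadRepoSecrets for the D1 read plus decryption.
interface RepoDeps {
  resolveRepoId(session: SessionLike): Promise<number>;
  loadRepoSecrets(repoId: number): Promise<Secrets>;
}

// Behaviour flagged by the review: the repo ID lookup sits in the fail-fast
// path, so a failed lookup rejects and the spawn fails.
async function repoSecretsFailFast(session: SessionLike, deps: RepoDeps): Promise<Secrets> {
  const repoId = session.repoId ?? (await deps.resolveRepoId(session));
  return deps.loadRepoSecrets(repoId);
}

// Suggested alternative: only the secret read is fail-fast. A failed lookup
// yields no repo secrets, but the spawn continues.
async function repoSecretsLookupTolerant(session: SessionLike, deps: RepoDeps): Promise<Secrets> {
  const repoId = session.repoId ?? (await deps.resolveRepoId(session).catch(() => null));
  if (repoId === null) return {};
  return deps.loadRepoSecrets(repoId);
}
```

The tolerant variant avoids failing on a transient SCM error, at the cost of spawning a sandbox without repo secrets when the lookup fails.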

Suggestions

  • [Testing] packages/control-plane/src/session/durable-object.ts:1689 - Add a regression test covering the new error path through the lifecycle manager (for example, getUserEnvVars() rejecting during spawn/restore) so this fail-fast behavior stays intentional and visible.

Nitpicks

  • None.

Positive Feedback

  • The change is small and easy to reason about.
  • Letting real secret decryption failures propagate to the lifecycle manager is cleaner than silently degrading the sandbox environment.
  • The existing spawn failure handling path already gives this change a natural place to surface errors to clients.

Questions

  • None.

Verdict (for GitHub PRs)

Request Changes: Critical issue must be addressed before merging.

Validation: src/sandbox/lifecycle/manager.test.ts passes (47 tests).

Address review feedback:
- Keep ensureRepoId() in the fail-fast path — if we can't resolve
  the repo ID, repo secrets would be silently dropped, defeating the
  purpose of this change. Failing here is the correct behavior.
- Add regression test verifying that getUserEnvVars() rejection
  during spawn correctly fails the sandbox and resets isSpawning.

@ColeMurray
Owner Author

Re: testing suggestion from the review — done in e96fbec. Added a "fails spawn when getUserEnvVars rejects" test that verifies when getUserEnvVars() rejects, the provider is not called, sandbox status is set to failed, and the isSpawning flag resets correctly.
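
The three invariants that test checks can be illustrated with a stripped-down stand-in for the lifecycle manager. Everything below is an assumption made for illustration (`MiniLifecycleManager`, `SpawnStorage`, `SandboxProvider`, and the "ready" status); the real manager and its mocks in manager.test.ts are more involved.

```typescript
type Secrets = Record<string, string>;

// Hypothetical, reduced interfaces for the storage and provider collaborators.
interface SpawnStorage {
  getUserEnvVars(): Promise<Secrets>;
  updateSandboxStatus(status: string): void;
}

interface SandboxProvider {
  createSandbox(env: Secrets): Promise<void>;
}

// Minimal stand-in showing the failure path: secrets are loaded before the
// provider is called, a rejection marks the sandbox as failed, and the
// spawning flag is always reset.
class MiniLifecycleManager {
  private spawning = false;
  private readonly storage: SpawnStorage;
  private readonly provider: SandboxProvider;

  constructor(storage: SpawnStorage, provider: SandboxProvider) {
    this.storage = storage;
    this.provider = provider;
  }

  isSpawning(): boolean {
    return this.spawning;
  }

  async spawnSandbox(): Promise<void> {
    this.spawning = true;
    try {
      const env = await this.storage.getUserEnvVars(); // may reject
      await this.provider.createSandbox(env);
      this.storage.updateSandboxStatus("ready"); // assumed success status
    } catch {
      this.storage.updateSandboxStatus("failed");
    } finally {
      this.spawning = false;
    }
  }
}
```

When `getUserEnvVars()` rejects, `createSandbox` is never reached, the only recorded status is "failed", and `isSpawning()` returns false afterwards.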

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
packages/control-plane/src/sandbox/lifecycle/manager.test.ts (1)

682-707: Good regression coverage — consider also asserting setLastSpawnError.

The test correctly verifies the three key invariants: provider sandbox creation is skipped, sandbox status transitions to failed, and isSpawning() is reset. Optionally, you could also assert that setLastSpawnError was invoked with the underlying error message so a future change that swallows the message in doSpawn's catch block (and re-broadcasts something generic) would still be caught here:

♻️ Optional additional assertion
       expect(provider.createSandbox).not.toHaveBeenCalled();
       expect(storage.calls).toContain("updateSandboxStatus:failed");
+      expect(storage.calls).toContain("setLastSpawnError:D1 decryption failure");
       expect(manager.isSpawning()).toBe(false);

This would also pin down that the user-facing last_spawn_error correctly surfaces the secret-loading failure — which is the behavior this PR is explicitly trying to enable.
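
As a rough illustration of why the extra assertion pins down the message, the sketch below uses a call-log style mock similar in spirit to the `storage.calls` strings in the suggestion. `createCallLogStorage` and `runSpawnStep` are hypothetical names, and the catch block is only an assumption about how doSpawn might record the error.

```typescript
// Hypothetical call-log mock: each method appends "name:argument" to calls,
// mirroring the string format used in the suggested assertions.
interface CallLogStorage {
  calls: string[];
  updateSandboxStatus(status: string): void;
  setLastSpawnError(message: string): void;
}

function createCallLogStorage(): CallLogStorage {
  const calls: string[] = [];
  return {
    calls,
    updateSandboxStatus(status: string): void {
      calls.push(`updateSandboxStatus:${status}`);
    },
    setLastSpawnError(message: string): void {
      calls.push(`setLastSpawnError:${message}`);
    },
  };
}

// Assumed shape of the failure handling: the underlying error message is
// recorded verbatim before the status flips to failed.
async function runSpawnStep(storage: CallLogStorage, step: () => Promise<void>): Promise<void> {
  try {
    await step();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    storage.setLastSpawnError(message);
    storage.updateSandboxStatus("failed");
  }
}
```

If the catch block were later changed to record a generic message, the "setLastSpawnError:D1 decryption failure" entry would disappear from `calls`, which is what the optional assertion is meant to detect.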

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/control-plane/src/sandbox/lifecycle/manager.test.ts` around lines
682 - 707, Add an assertion that setLastSpawnError was called with the
underlying error message when getUserEnvVars rejects: after awaiting
manager.spawnSandbox(), assert that storage.setLastSpawnError (from
createMockStorage) was invoked and that its argument includes "D1 decryption
failure" (the error thrown in the mocked getUserEnvVars), so the test checks
setLastSpawnError alongside provider.createSandbox, updateSandboxStatus:failed,
and manager.isSpawning().

📥 Commits

Reviewing files that changed from the base of the PR and between 4f74f41 and e96fbec.

📒 Files selected for processing (2)
  • packages/control-plane/src/sandbox/lifecycle/manager.test.ts
  • packages/control-plane/src/session/durable-object.ts

@ColeMurray ColeMurray merged commit 8bee466 into main Apr 26, 2026
18 checks passed
@ColeMurray ColeMurray deleted the fix/error-on-repo-secrets-failure branch April 26, 2026 06:39