fix: fail sandbox spawn when secrets cannot be loaded by ColeMurray · Pull Request #562 · ColeMurray/background-agents

ColeMurray · 2026-04-26T06:11:25Z

Summary

- Remove the try/catch wrappers around global and repo secret loading in `getUserEnvVars()` so failures propagate as errors instead of being silently swallowed.
- Previously, a failure to load secrets logged a warning and proceeded without them, spawning a sandbox that was missing user-configured secrets (e.g. `OPENAI_API_KEY`).
- Now, if secret loading fails, the error bubbles up to the lifecycle manager, which already handles spawn/restore failures with proper status updates and client notifications.
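The behavior change described above can be sketched as follows. This is a minimal illustration, not the code from `durable-object.ts`: the `SecretsStore` interface, both function names, and the merge order (repo secrets overriding global ones) are assumptions made for the example.

```typescript
// Hypothetical stand-in for the D1-backed secret stores; not the real API.
interface SecretsStore {
  getDecryptedSecrets(): Promise<Record<string, string>>;
}

// Before (sketch): each load is wrapped, so a failure only logs a warning
// and the sandbox would be spawned without those secrets.
async function getUserEnvVarsLenient(
  globalStore: SecretsStore,
  repoStore: SecretsStore,
): Promise<Record<string, string>> {
  let globalSecrets: Record<string, string> = {};
  let repoSecrets: Record<string, string> = {};
  try {
    globalSecrets = await globalStore.getDecryptedSecrets();
  } catch (err) {
    console.warn("failed to load global secrets, continuing without them", err);
  }
  try {
    repoSecrets = await repoStore.getDecryptedSecrets();
  } catch (err) {
    console.warn("failed to load repo secrets, continuing without them", err);
  }
  // Assumed merge order: repo-level values win over global ones.
  return { ...globalSecrets, ...repoSecrets };
}

// After (sketch): no wrappers, so any rejection reaches the caller,
// which in the PR is the lifecycle manager.
async function getUserEnvVarsStrict(
  globalStore: SecretsStore,
  repoStore: SecretsStore,
): Promise<Record<string, string>> {
  const globalSecrets = await globalStore.getDecryptedSecrets();
  const repoSecrets = await repoStore.getDecryptedSecrets();
  return { ...globalSecrets, ...repoSecrets };
}
```

With a store that rejects, the lenient variant resolves with whatever could be loaded, while the strict variant rejects with the original error.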

Context

The original fallback behavior was added when D1 secrets were first introduced, as a safety net during rollout. Now that secrets are a core part of sandbox operation, silently dropping them can cause confusing downstream failures (agent can't authenticate to APIs, etc.). Failing fast gives clear signal.

Test plan

- Typecheck passes
- All 1003 unit tests pass
- All 327 integration tests pass
- Verify in staging that a sandbox spawns correctly when secrets load successfully
- Verify that a D1/encryption failure surfaces as a spawn error to the user (not a silent degradation)

Summary by CodeRabbit

Bug Fixes
- Secrets loading now fails explicitly when credentials cannot be retrieved, preventing operations with incomplete configuration.
Tests
- Added coverage for failure during secret retrieval: ensure sandbox creation aborts, a failed status is recorded, and internal spawn state is reset after the error.

Previously, failures to load global or repo secrets were caught and swallowed with a warning, allowing the sandbox to proceed without user-configured secrets. This is no longer acceptable — missing secrets should prevent the sandbox from spawning. Remove the try/catch wrappers in getUserEnvVars() so errors propagate to the caller (lifecycle manager), which already handles spawn failures.

coderabbitai · 2026-04-26T06:11:34Z

📝 Walkthrough

Walkthrough

getUserEnvVars now fails hard when global or repo secret retrieval errors occur (previous silent fallbacks removed). A new test covers a spawn path where storage.getUserEnvVars throws, asserting abort behavior and cleanup of spawn state.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Durable Object (secret loading): `packages/control-plane/src/session/durable-object.ts` | Removed try/catch and empty-object fallbacks around `GlobalSecretsStore.getDecryptedSecrets()` and `RepoSecretsStore.getDecryptedSecrets(repoId)` in `getUserEnvVars`, so secret load errors propagate instead of being swallowed. |
| Sandbox lifecycle tests: `packages/control-plane/src/sandbox/lifecycle/manager.test.ts` | Added a test for `spawnSandbox` failure when `storage.getUserEnvVars` throws, asserting spawn abort, storage status updated to `failed`, and internal spawning flag reset. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped where secrets used to hide,
Now errors shout, no gentle slide.
Spawn aborted, flags set true,
Clearer paths for me and you.
A rabbit laughs — strict code anew. 🥕

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: fail sandbox spawn when secrets cannot be loaded' accurately and concisely describes the main change: converting secret loading from fail-safe (with fallback) to fail-fast behavior, preventing sandbox spawn on secret retrieval errors. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions · 2026-04-26T06:11:46Z

Terraform Validation Results

| Step | Status |
| --- | --- |
| Format | ✅ |
| Init | ✅ |
| Validate | ✅ |

Note: Terraform plan was skipped because secrets are not configured. This is expected for external contributors. See docs/GETTING_STARTED.md for setup instructions.

Pushed by: @ColeMurray, Action: pull_request

open-inspect

Summary

This PR changes getUserEnvVars() to fail fast when secret loading fails instead of silently continuing without secrets. The intent makes sense, but the current implementation also turns ensureRepoId() into a hard dependency for sandbox spawn, which introduces a regression for sessions that still start with repo_id = null.

PR Title and number: fix: fail sandbox spawn when secrets cannot be loaded (#562)
Author: @ColeMurray
Files changed: 1, additions/deletions: +6/-22

Critical Issues

[Functionality] packages/control-plane/src/session/durable-object.ts:1694 - await this.ensureRepoId(session) is now inside the fail-fast path. User-created sessions can still be initialized without repo_id, so sandbox spawn now depends on a live SCM lookup even when there are no repo-specific secrets to load. A transient GitHub/GitLab lookup failure will now fail sandbox creation for those sessions, which is broader than the PR goal of failing only when secret loading itself breaks. Suggested fix: either guarantee repo_id is always populated before spawn, or only make the actual D1/decryption reads fail-fast once a repo ID is already known.
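The two policies under discussion here can be contrasted in a short sketch. Everything in it is hypothetical: the `RepoSecretDeps` interface and both function names are invented for illustration, and the real `ensureRepoId(session)` takes a session argument that is omitted here.

```typescript
type Secrets = Record<string, string>;

// Hypothetical dependencies: an SCM-backed repo ID lookup and a D1-backed read.
interface RepoSecretDeps {
  ensureRepoId(): Promise<number>;
  loadRepoSecrets(repoId: number): Promise<Secrets>;
}

// Policy A (what the PR does): the lookup and the read are both fail-fast,
// so a transient SCM error also aborts the spawn.
async function repoSecretsStrict(deps: RepoSecretDeps): Promise<Secrets> {
  const repoId = await deps.ensureRepoId();
  return deps.loadRepoSecrets(repoId);
}

// Policy B (one reading of the reviewer's suggestion): tolerate a failed
// lookup, but stay fail-fast on the read once a repo ID is known.
async function repoSecretsLookupTolerant(deps: RepoSecretDeps): Promise<Secrets> {
  const repoId = await deps.ensureRepoId().catch(() => null);
  if (repoId === null) {
    // No repo ID resolved: repo secrets are skipped without an error.
    return {};
  }
  return deps.loadRepoSecrets(repoId);
}
```

Under policy B a failed lookup yields an empty object even when repo secrets exist, which is the kind of silent drop the PR sets out to remove; policy A trades that for a dependency on the lookup succeeding.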

Suggestions

[Testing] packages/control-plane/src/session/durable-object.ts:1689 - Add a regression test covering the new error path through the lifecycle manager (for example, getUserEnvVars() rejecting during spawn/restore) so this fail-fast behavior stays intentional and visible.

Nitpicks

None.

Positive Feedback

- The change is small and easy to reason about.
- Letting real secret decryption failures propagate to the lifecycle manager is cleaner than silently degrading the sandbox environment.
- The existing spawn failure handling path already gives this change a natural place to surface errors to clients.

Questions

None.

Verdict (for GitHub PRs)

Request Changes: Critical issue must be addressed before merging.

Validation: src/sandbox/lifecycle/manager.test.ts passes (47 tests).

Address review feedback:
- Keep ensureRepoId() in the fail-fast path: if we can't resolve the repo ID, repo secrets would be silently dropped, defeating the purpose of this change. Failing here is the correct behavior.
- Add regression test verifying that getUserEnvVars() rejection during spawn correctly fails the sandbox and resets isSpawning.

github-actions · 2026-04-26T06:34:59Z

Terraform Validation Results

| Step | Status |
| --- | --- |
| Format | ✅ |
| Init | ✅ |
| Validate | ✅ |

Note: Terraform plan was skipped because secrets are not configured. This is expected for external contributors. See docs/GETTING_STARTED.md for setup instructions.

Pushed by: @ColeMurray, Action: pull_request

ColeMurray · 2026-04-26T06:36:27Z

Re: testing suggestion from the review — done in e96fbec. Added a "fails spawn when getUserEnvVars rejects" test that verifies when getUserEnvVars() rejects, the provider is not called, sandbox status is set to failed, and the isSpawning flag resets correctly.
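The invariants this test checks can be illustrated with a self-contained sketch. It does not reproduce the repository's lifecycle manager or its `createMockStorage` helper: `RecordingStorage`, `MiniLifecycleManager`, and the `"running"` status are hypothetical stand-ins, and only the `updateSandboxStatus:failed` call string, `setLastSpawnError`, and `isSpawning()` are taken from the review thread.

```typescript
type EnvVars = Record<string, string>;

// Hypothetical mock storage that records calls as "method:argument" strings.
class RecordingStorage {
  calls: string[] = [];
  private loadEnv: () => Promise<EnvVars>;

  constructor(loadEnv: () => Promise<EnvVars>) {
    this.loadEnv = loadEnv;
  }
  getUserEnvVars(): Promise<EnvVars> {
    return this.loadEnv();
  }
  updateSandboxStatus(status: string): void {
    this.calls.push(`updateSandboxStatus:${status}`);
  }
  setLastSpawnError(message: string): void {
    this.calls.push(`setLastSpawnError:${message}`);
  }
}

interface SandboxProvider {
  createSandbox(env: EnvVars): Promise<void>;
}

// Hypothetical, heavily simplified manager: a rejected secret load must skip
// the provider, record a failed status, and clear the spawning flag.
class MiniLifecycleManager {
  private spawning = false;
  private storage: RecordingStorage;
  private provider: SandboxProvider;

  constructor(storage: RecordingStorage, provider: SandboxProvider) {
    this.storage = storage;
    this.provider = provider;
  }
  isSpawning(): boolean {
    return this.spawning;
  }
  async spawnSandbox(): Promise<void> {
    this.spawning = true;
    try {
      const env = await this.storage.getUserEnvVars();
      await this.provider.createSandbox(env);
      this.storage.updateSandboxStatus("running"); // assumed success status
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      this.storage.setLastSpawnError(message);
      this.storage.updateSandboxStatus("failed");
    } finally {
      this.spawning = false;
    }
  }
}
```

A test against this sketch would construct `RecordingStorage` with a loader that throws, call `spawnSandbox()`, and then check that the provider was never invoked, that `calls` contains `updateSandboxStatus:failed`, and that `isSpawning()` returns `false`.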

coderabbitai

🧹 Nitpick comments (1)

packages/control-plane/src/sandbox/lifecycle/manager.test.ts (1)
682-707: Good regression coverage — consider also asserting setLastSpawnError.

The test correctly verifies the three key invariants: provider sandbox creation is skipped, sandbox status transitions to failed, and isSpawning() is reset. Optionally, you could also assert that setLastSpawnError was invoked with the underlying error message so a future change that swallows the message in doSpawn's catch block (and re-broadcasts something generic) would still be caught here:
♻️ Optional additional assertion
       expect(provider.createSandbox).not.toHaveBeenCalled();
       expect(storage.calls).toContain("updateSandboxStatus:failed");
+      expect(storage.calls).toContain("setLastSpawnError:D1 decryption failure");
       expect(manager.isSpawning()).toBe(false);
This would also pin down that the user-facing last_spawn_error correctly surfaces the secret-loading failure — which is the behavior this PR is explicitly trying to enable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/control-plane/src/sandbox/lifecycle/manager.test.ts` around lines
682 - 707, Add an assertion that setLastSpawnError was called with the
underlying error message when getUserEnvVars rejects: after awaiting
manager.spawnSandbox(), assert that storage.setLastSpawnError (from
createMockStorage) was invoked and that its argument includes "D1 decryption
failure" (the error thrown in the mocked getUserEnvVars), so the test checks
setLastSpawnError alongside provider.createSandbox, updateSandboxStatus:failed,
and manager.isSpawning().

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 05426107-6914-40df-a347-07cc991b2810

📥 Commits

Reviewing files that changed from the base of the PR and between 4f74f41 and e96fbec.

📒 Files selected for processing (2)

packages/control-plane/src/sandbox/lifecycle/manager.test.ts
packages/control-plane/src/session/durable-object.ts

open-inspect Bot reviewed Apr 26, 2026

Comment thread packages/control-plane/src/session/durable-object.ts

open-inspect Bot previously requested changes Apr 26, 2026

coderabbitai Bot reviewed Apr 26, 2026

ColeMurray merged commit 8bee466 into main Apr 26, 2026
18 checks passed

ColeMurray deleted the fix/error-on-repo-secrets-failure branch April 26, 2026 06:39

ColeMurray merged 2 commits into main from fix/error-on-repo-secrets-failure
