
security: tool_command sandboxing — 4 defense-in-depth layers#56

Merged
clintecker merged 5 commits into main from security/tool-command-sandboxing
Apr 4, 2026
Conversation


clintecker (Collaborator) commented Apr 4, 2026

Summary

P0 critical security fix. Implements four defense-in-depth layers for tool_command execution safety, preventing LLM output injection into shell commands and protecting against malicious .dip files.

Layer 1: Safe-key allowlist (primary security boundary)

Variable expansion in tool_command now uses a fail-closed allowlist: only outcome, preferred_label, human_response, interview_answers can be interpolated from ctx.*. All LLM-origin keys (last_response, tool_stdout, response.*, etc.) are blocked with an actionable error message. human_response is classified as safe (user keyboard input, not LLM output).
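A minimal Go sketch of the fail-closed check (the function and variable names here are illustrative; the real logic lives in pipeline/expand.go behind the toolCommandMode flag):

```go
package main

import "fmt"

// safeCtxKeys is the documented allowlist: only these ctx.* keys may be
// interpolated into tool_command.
var safeCtxKeys = map[string]bool{
	"outcome":           true,
	"preferred_label":   true,
	"human_response":    true,
	"interview_answers": true,
}

// checkCtxKey fails closed: any key not explicitly allowlisted is rejected,
// so LLM-origin values like last_response can never reach a shell command.
func checkCtxKey(key string) error {
	if safeCtxKeys[key] {
		return nil
	}
	return fmt.Errorf("ctx.%s is not allowed in tool_command: only outcome, "+
		"preferred_label, human_response, interview_answers may be interpolated "+
		"(write LLM output to a file and pass the path instead)", key)
}

func main() {
	fmt.Println(checkCtxKey("human_response")) // <nil>
	fmt.Println(checkCtxKey("last_response"))  // blocked with guidance
}
```

The fail-closed shape matters: keys added later (including new LLM-derived keys) are blocked by default until someone deliberately classifies them as safe.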

Layer 2: Output size limits

Tool stdout/stderr are capped at 64KB per stream (configurable via the output_limit node attribute). The hard ceiling is 10MB, which only the --max-output-limit CLI flag can raise. Output is captured through a thread-safe limitedBuffer that appends a truncation marker when the cap is hit.
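The capping behavior can be sketched as a bounded io.Writer. This is an illustrative stand-in for the real limitedBuffer in agent/exec/local.go, not its actual code; note that it reports the full requested byte count so io.Copy (which os/exec uses to drain pipes) does not surface a short-write error:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// cappedBuffer is a hypothetical sketch of a thread-safe, size-limited buffer.
type cappedBuffer struct {
	mu        sync.Mutex
	buf       bytes.Buffer
	limit     int
	truncated bool
}

func (b *cappedBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	remaining := b.limit - b.buf.Len()
	if remaining > 0 {
		if len(p) > remaining {
			b.buf.Write(p[:remaining])
			b.truncated = true
		} else {
			b.buf.Write(p)
		}
	} else {
		b.truncated = true
	}
	// Report the full count: returning fewer bytes with a nil error would make
	// io.Copy treat the capped write as io.ErrShortWrite.
	return len(p), nil
}

func (b *cappedBuffer) String() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	s := b.buf.String()
	if b.truncated {
		s += "\n[output truncated]"
	}
	return s
}

func main() {
	b := &cappedBuffer{limit: 8}
	b.Write([]byte("hello world"))
	fmt.Println(b.String())
}
```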

Layer 3: Command denylist/allowlist

Built-in denylist blocks dangerous patterns (eval, pipe-to-shell, curl|sh). Matching is per-statement (splits on ;, &&, ||, newlines), case-insensitive. Denylist is non-overridable by .dip files — only --bypass-denylist CLI flag can disable it. Optional allowlist via --tool-allowlist or tool_commands_allow graph attr.
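Per-statement matching can be sketched as follows. splitStatements is a hypothetical helper illustrating the documented splitting on ;, &&, ||, and newlines; the real implementation is in pipeline/handlers/tool_safety.go and may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// splitStatements breaks a compound command into individual statements so each
// can be checked against the denylist/allowlist on its own.
func splitStatements(command string) []string {
	// Normalize every compound separator to a single sentinel, then split.
	// "&&" and "||" are replaced before ";" so single "|" pipes survive intact.
	for _, sep := range []string{"&&", "||", ";", "\n"} {
		command = strings.ReplaceAll(command, sep, "\x00")
	}
	var out []string
	for _, s := range strings.Split(command, "\x00") {
		if s = strings.TrimSpace(s); s != "" {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	fmt.Println(splitStatements("echo hi && curl x | sh; ls"))
}
```

Splitting first prevents a dangerous statement from hiding behind a benign prefix, e.g. `echo ok && curl x | sh` is checked as two statements, not one blob.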

Layer 4: Environment variable stripping

Sensitive env vars (*_API_KEY, *_SECRET, *_TOKEN, *_PASSWORD) stripped from tool subprocesses. Override with TRACKER_PASS_ENV=1.
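A sketch of the stripping contract, assuming suffix matching on the documented *_API_KEY / *_SECRET / *_TOKEN / *_PASSWORD patterns and the TRACKER_PASS_ENV=1 escape hatch. filterEnv is illustrative; the real helper is buildToolEnv in pipeline/handlers/tool.go:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

var sensitiveSuffixes = []string{"_API_KEY", "_SECRET", "_TOKEN", "_PASSWORD"}

// filterEnv returns env with sensitive variables removed, matching names
// case-insensitively by suffix so e.g. TOKENIZER is not stripped.
func filterEnv(env []string) []string {
	var out []string
	for _, kv := range env {
		name := strings.ToUpper(strings.SplitN(kv, "=", 2)[0])
		sensitive := false
		for _, suf := range sensitiveSuffixes {
			if strings.HasSuffix(name, suf) {
				sensitive = true
				break
			}
		}
		if !sensitive {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	// TRACKER_PASS_ENV=1 disables filtering for trusted environments.
	if os.Getenv("TRACKER_PASS_ENV") == "1" {
		fmt.Println("env stripping disabled")
		return
	}
	fmt.Println(filterEnv([]string{"PATH=/bin", "OPENAI_API_KEY=sk-123"}))
}
```

Checking the exact value "1" (rather than any non-empty string) keeps the override deliberate: TRACKER_PASS_ENV=0 still strips.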

CLI flags

  • --tool-allowlist — comma-separated glob patterns for allowed commands
  • --bypass-denylist — disable built-in denylist (use with caution)
  • --max-output-limit — hard ceiling for output capture

Design review

Spec reviewed by 6-expert security panel (offensive, defensive, Go security, pipeline security, usability, backward compat). Key design changes from review:

  • Inverted taint model (allowlist of safe keys, not denylist of bad keys)
  • human_response reclassified as safe (user input, not LLM)
  • Denylist non-overridable by .dip graph attrs
  • Per-statement matching instead of full-command blob
  • Env var stripping added as Layer 4

Test plan

  • TestExpandVariables_ToolCommandMode_BlocksLLMOutput — last_response blocked
  • TestExpandVariables_ToolCommandMode_AllowsHumanResponse — user input safe
  • TestExpandVariables_ToolCommandMode_BlocksResponsePrefix — response.* blocked
  • TestExpandVariables_ToolCommandMode_AllowsGraphAndParams — author-controlled keys safe
  • TestExecCommandWithLimit_Truncates — output capped with marker
  • TestCheckCommandDenylist — 12 cases including compound commands
  • TestCheckCommandAllowlist — pattern restriction
  • TestBuildToolEnv_StripsAPIKeys — sensitive vars removed
  • TestToolHandler_BlocksTaintedVariable — integration: tainted var → error
  • TestToolHandler_AllowsSafeVariable — integration: safe var works
  • TestToolHandler_DenylistBlocks — integration: curl|sh → error
  • All 14 packages pass, race detector clean

Closes #16

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Tool subprocess output limited (default 64KB/stream; configurable; hard max 10MB)
    • Safer variable expansion for tool commands—only approved ctx.* keys allowed
    • Built-in denylist for dangerous command patterns plus optional allowlist and bypass flag
    • Sensitive env vars (API keys/tokens/passwords) filtered by default; override available
  • Tests

    • Added coverage for output limits, env filtering, variable-safety, and command-safety behaviors
  • Documentation

    • Updated guidance and recommended safe workflows for tool invocation

clintecker and others added 4 commits April 3, 2026 19:24
…cesses (#16)

Adds buildToolEnv() that filters out env vars matching patterns *_API_KEY,
*_SECRET, *_TOKEN, *_PASSWORD before executing tool node shell commands.
Override with TRACKER_PASS_ENV=1 for trusted environments.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ching (#16)

Adds tool_safety.go with CheckToolCommand, denylist/allowlist pattern matching,
and compound statement splitting. More-specific patterns (curl/wget) ordered
before generic pipe-to-shell patterns to ensure correct match attribution.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
#16)

- ExpandVariables called with toolCommandMode=true — FAIL CLOSED on unsafe ctx.* keys
- CheckToolCommand validates denylist/allowlist on the final command string
- parseByteSize helper + per-node output_limit attr parsed and capped at maxOutputLimit
- ExecCommandWithLimit + buildToolEnv() used for LocalEnvironment; fallback to ExecCommand for other envs
- NewToolHandlerWithConfig constructor for full security configuration
- Integration tests: BlocksTaintedVariable, AllowsSafeVariable, DenylistBlocks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ols (#16)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai bot commented Apr 4, 2026

Walkthrough

Implements tool-command safety: restricts LLM-driven ${ctx.*} expansions, validates commands with denylist/allowlist, limits tool stdout/stderr with configurable caps, and strips sensitive environment variables from subprocesses; integrates these checks into tool handler execution flow.

Changes

Cohort / File(s) — Summary

  • Docs — CLAUDE.md: Expanded safety guidance: allowed ctx keys, recommended file-write workflow, default output limit (64KB), hard max (10MB), denylist/--bypass-denylist, optional allowlist flags, and env-var stripping semantics.
  • Variable Expansion — pipeline/expand.go, pipeline/expand_test.go: Added toolCommandMode flag to ExpandVariables to block unsafe ctx.* keys (allowlist: outcome, preferred_label, human_response, interview_answers); tests for safe/unsafe expansions and mixed namespaces.
  • Local Exec + Tests — agent/exec/local.go, agent/exec/env_test.go: Added limitedBuffer and ExecCommandWithLimit to cap stdout/stderr per stream with a truncation marker; supports optional env override; tests for truncation, no-truncation, and custom env.
  • Tool Handler Integration — pipeline/handlers/tool.go: New ToolHandlerConfig, NewToolHandlerWithConfig, default/max output limits, parsing of per-node output_limit, use of ExpandVariables(..., toolCommandMode=true), CheckToolCommand validation, buildToolEnv filtering, and invocation of ExecCommandWithLimit for local envs.
  • Command Safety — pipeline/handlers/tool_safety.go, pipeline/handlers/tool_safety_test.go: Added denylist (glob-like, case-insensitive), statement splitting (;, &&, ||, newlines), and optional allowlist matching.
  • Handler Tests — pipeline/handlers/tool_test.go: Tests for buildToolEnv stripping sensitive vars and the TRACKER_PASS_ENV override; handler tests asserting blocked tainted variables, allowed safe variables, and denylist blocking.

Sequence Diagram

sequenceDiagram
    participant Client as Tool Node
    participant Handler as ToolHandler.Execute
    participant Expand as ExpandVariables
    participant Validate as CheckToolCommand
    participant FilterEnv as buildToolEnv
    participant Exec as ExecCommandWithLimit

    Client->>Handler: submit tool_command + ctx
    Handler->>Expand: ExpandVariables(..., toolCommandMode=true)
    alt expansion error (unsafe ctx)
        Expand-->>Handler: error (unsafe variable)
        Handler-->>Client: fail (unsafe variable)
    else expanded
        Expand-->>Handler: expanded command
        Handler->>Validate: CheckToolCommand(command, nodeID, allowlist, bypassDenylist)
        alt validation error
            Validate-->>Handler: error (denied/unlisted)
            Handler-->>Client: fail (denied/unlisted)
        else validated
            Validate-->>Handler: ok
            Handler->>FilterEnv: buildToolEnv(osEnv)
            FilterEnv-->>Handler: filtered env
            Handler->>Exec: ExecCommandWithLimit(ctx, cmd, args, timeout, outputLimit, env)
            Exec->>Exec: run process, cap stdout/stderr, append truncation marker if needed
            Exec-->>Handler: CommandResult (capped output)
            Handler-->>Client: success/failure + limited output
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I nibble bytes, I guard the gate,
I hush the secrets, limit sprawl and state,
Denylists patter, allowlists skip the weeds,
Truncated streams keep tidy little feeds,
A safer hop through shell-bound meadows bright.

🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 warning

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 33.33%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (4)

  • Description Check — ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled).
  • Title Check — ✅ Passed: the title clearly and concisely describes the main change, adding security sandboxing with four defense-in-depth layers for tool_command execution.
  • Linked Issues Check — ✅ Passed: the PR addresses all coding requirements from issue #16, implementing the safe-key allowlist for variable expansion, output size limits, command denylist/allowlist validation, and environment variable stripping.
  • Out of Scope Changes Check — ✅ Passed: all changes relate directly to the four security layers specified in issue #16; no unrelated modifications detected.




chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cff04ea9e5


Comment on lines +155 to +159
    if len(p) > remaining {
        p = p[:remaining]
        lb.truncated = true
    }
    return lb.buf.Write(p)

P1: Return original byte count from limitedBuffer writes

When len(p) > remaining, limitedBuffer.Write truncates p and returns lb.buf.Write(p), which reports fewer bytes than were requested. os/exec streams use io.Copy, and a short write with nil error is treated as io.ErrShortWrite, so tool commands can fail once output crosses the cap boundary instead of being cleanly truncated. This affects any command that emits enough output for a read chunk to exceed the remaining buffer space.


Comment on lines +192 to +195
    if parsed > h.maxOutputLimit {
        parsed = h.maxOutputLimit
    }
    outputLimit = parsed

P1: Enforce positive output_limit before applying node override

The node-level output_limit override is only clamped on the upper bound; non-positive values are accepted and assigned directly. Because ExecCommandWithLimit treats <= 0 as unbounded, a workflow can set output_limit: 0/negative (or an overflowed negative from large MB inputs) to disable truncation entirely, defeating the intended output-cap safety control.


Comment on lines +21 to +23
"* | sh",
"* | bash",
"* | zsh",

P2: Broaden pipe-to-shell deny patterns to include shell args

The denylist entries for pipe-to-shell are exact full-statement globs (* | sh, * | bash, etc.), so common variants like cat payload | sh -eux or ... | sh & do not match and pass validation. Since CheckToolCommand anchors matches to the entire statement, these forms bypass the denylist even when it is enabled.

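The bypass follows from the anchored, full-statement glob matching. This standalone sketch (globMatch here is a simplified stand-in for the real matcher in pipeline/handlers/tool_safety.go) shows the miss and how a trailing wildcard broadens the pattern:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globMatch anchors a case-insensitive glob to the entire statement,
// translating * into .* via regexp.
func globMatch(pattern, s string) bool {
	re := "^" + strings.ReplaceAll(regexp.QuoteMeta(strings.ToLower(pattern)), `\*`, ".*") + "$"
	ok, _ := regexp.MatchString(re, strings.ToLower(s))
	return ok
}

func main() {
	fmt.Println(globMatch("* | sh", "cat payload | sh"))       // true
	fmt.Println(globMatch("* | sh", "cat payload | sh -eux"))  // false: bypass
	fmt.Println(globMatch("* | sh*", "cat payload | sh -eux")) // true
}
```

Note the tradeoff: a broadened pattern like `* | sh*` also matches pipes into any command whose name begins with "sh", so widened globs may need further refinement.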


coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (4)
CLAUDE.md (1)

202-207: Clarify namespace distinction in safe-key documentation.

The current text lists graph.goal alongside ctx keys, which could imply it's part of the ctx allowlist. It's actually allowed because graph.* is a separate namespace that doesn't have LLM-tainted values.

Consider rewording for clarity:

- Variable expansion in tool_command uses a safe-key allowlist: only `outcome`, `preferred_label`, `human_response`, `interview_answers`, and `graph.goal` can be interpolated. All LLM-origin keys (`last_response`, `tool_stdout`, `response.*`, etc.) are blocked.
+ Variable expansion in tool_command uses a safe-key allowlist for the `ctx` namespace: only `outcome`, `preferred_label`, `human_response`, `interview_answers` can be interpolated. The `graph.*` and `params.*` namespaces are unrestricted. All LLM-origin ctx keys (`last_response`, `tool_stdout`, `response.*`, etc.) are blocked.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` around lines 202 - 207, Update the documentation around the
tool_command safe-key allowlist to clearly separate ctx keys from the graph
namespace: remove or rephrase the implication that graph.goal is part of the
same ctx allowlist and instead state that ctx keys allowed for interpolation are
outcome, preferred_label, human_response, and interview_answers, while graph.*
(e.g., graph.goal) is a distinct namespace whose values are considered safe
because they are not LLM-origin; keep the note that LLM-origin keys
(last_response, tool_stdout, response.*, etc.) are blocked and ensure any
examples and the sentence listing safe keys explicitly indicate the two
namespaces (ctx.* vs graph.*) to avoid confusion.
agent/exec/local.go (1)

191-208: Consider extracting duplicated error handling into a helper.

The error handling logic (lines 197-206) is duplicated between the bounded and unbounded code paths (also at lines 217-225). While this is acceptable for a security-critical path where clarity matters, a small helper could reduce duplication.

♻️ Optional: Extract error handling helper
func handleExecError(err error, ctx context.Context, timeout time.Duration, result CommandResult) (CommandResult, error) {
    if err != nil {
        if ctx.Err() != nil {
            return result, fmt.Errorf("command timed out after %v", timeout)
        }
        if exitErr, ok := err.(*exec.ExitError); ok {
            result.ExitCode = exitErr.ExitCode()
            return result, nil
        }
        return result, err
    }
    return result, nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent/exec/local.go` around lines 191 - 208, Extract the duplicated error
handling in local command execution into a small helper (e.g., handleExecError)
that accepts the error returned from cmd.Run(), the current context (ctx), the
timeout, and the current CommandResult, and returns (CommandResult, error); move
the logic that checks ctx.Err(), inspects *exec.ExitError to set
result.ExitCode, and returns the appropriate result/error into that helper, then
replace the duplicated blocks in both the unbounded and bounded code paths (the
branches using cmd.Run(), stdout/stderr buffers and the CommandResult struct)
with calls to handleExecError to avoid repetition.
pipeline/handlers/tool_safety.go (1)

49-61: Minor: globMatch compiles regex on every call.

For a security-critical path, correctness over performance is the right tradeoff. However, if this becomes a hot path, consider caching compiled patterns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/tool_safety.go` around lines 49 - 61, The globMatch
function currently recompiles the regex on every call; change it to cache
compiled regexes keyed by the lowercased pattern to avoid repeated
regexp.Compile calls. Implement a package-level cache (e.g., a sync.Map or a map
protected by a sync.RWMutex) and add a helper like getCompiledGlob(pattern
string) *regexp.Regexp that lowercases and transforms the pattern (same
QuoteMeta + ReplaceAll logic) and returns the cached *regexp.Regexp or compiles,
stores, and returns it; then modify globMatch to use getCompiledGlob and handle
nil/compile errors by returning false. Ensure the cache key and transformation
match the existing behavior (case-insensitive via strings.ToLower).
pipeline/handlers/tool.go (1)

54-55: Use suffix matching for the secret env filter.

The contract here is *_API_KEY, *_SECRET, *_TOKEN, *_PASSWORD, but strings.Contains also strips names like AUTH_TOKEN_URL or MY_SECRET_MODE. Narrowing this to suffix matching avoids surprising subprocess breakage while still covering the intended variables.

Proposed refactor
-		for _, pattern := range sensitiveEnvPatterns {
-			if strings.Contains(upper, pattern) {
+		for _, pattern := range sensitiveEnvPatterns {
+			if strings.HasSuffix(upper, pattern) {
 				sensitive = true
 				break
 			}
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/tool.go` around lines 54 - 55, The current secret env
filter loops over sensitiveEnvPatterns and uses strings.Contains(upper,
pattern), which wrongly matches variables like AUTH_TOKEN_URL; change the check
to use suffix matching (strings.HasSuffix) so only names ending with the
patterns (e.g. *_API_KEY, *_SECRET, *_TOKEN, *_PASSWORD) are filtered; update
the loop in the code that references sensitiveEnvPatterns and the variable upper
to call strings.HasSuffix(upper, pattern) instead of strings.Contains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pipeline/handlers/tool.go`:
- Around line 90-105: The code sets outputLimit from cfg.OutputLimit and
separately computes maxLimit from cfg.MaxOutputLimit but never enforces maxLimit
on outputLimit; update the initialization logic in the ToolHandler construction
(around outputLimit/maxLimit) so you first set outputLimit = cfg.OutputLimit (or
DefaultOutputLimit if <=0), set maxLimit = cfg.MaxOutputLimit (or MaxOutputLimit
if <=0), then clamp outputLimit = min(outputLimit, maxLimit) before returning
the new ToolHandler instance (references: cfg.OutputLimit, cfg.MaxOutputLimit,
DefaultOutputLimit, MaxOutputLimit, outputLimit, maxLimit, ToolHandler).
- Around line 187-196: After parsing node.Attrs["output_limit"] via
parseByteSize in the handler, validate that the parsed value is strictly
positive and return an error if parsed <= 0 (so non-positive values like "0" or
negatives are rejected); then apply the existing cap against h.maxOutputLimit
and set outputLimit. Reference parseByteSize, node.Attrs["output_limit"],
h.maxOutputLimit, outputLimit and ExecCommandWithLimit/limitedWriter to locate
where to add the parsed <= 0 check and error return.
- Around line 46-47: The current check if os.Getenv("TRACKER_PASS_ENV") != ""
allows any non-empty value (e.g., "0" or "false") to bypass env stripping;
change the condition to require the explicit value "1" (e.g.,
os.Getenv("TRACKER_PASS_ENV") == "1") before returning os.Environ() so only
TRACKER_PASS_ENV=1 disables filtering; update the branch that returns
os.Environ() and any related comments to reflect this stricter contract.
- Around line 198-205: The current else branch calls h.env.ExecCommand and thus
skips output capping and filtered env; replace that fallback with a fail-closed
check: define a local interface type that declares ExecCommandWithLimit
(matching exec.LocalEnvironment.ExecCommandWithLimit signature) and attempt a
type assertion on h.env; if it implements the limited call, invoke
ExecCommandWithLimit(ctx, "sh", []string{"-c", command}, timeout, outputLimit,
buildToolEnv()), otherwise return an error from the handler indicating the
environment does not support limited execution (do not call ExecCommand). Update
references: h.env, exec.LocalEnvironment, ExecCommandWithLimit, ExecCommand,
buildToolEnv, outputLimit, timeout.

---

Nitpick comments:
In `@agent/exec/local.go`:
- Around line 191-208: Extract the duplicated error handling in local command
execution into a small helper (e.g., handleExecError) that accepts the error
returned from cmd.Run(), the current context (ctx), the timeout, and the current
CommandResult, and returns (CommandResult, error); move the logic that checks
ctx.Err(), inspects *exec.ExitError to set result.ExitCode, and returns the
appropriate result/error into that helper, then replace the duplicated blocks in
both the unbounded and bounded code paths (the branches using cmd.Run(),
stdout/stderr buffers and the CommandResult struct) with calls to
handleExecError to avoid repetition.

In `@CLAUDE.md`:
- Around line 202-207: Update the documentation around the tool_command safe-key
allowlist to clearly separate ctx keys from the graph namespace: remove or
rephrase the implication that graph.goal is part of the same ctx allowlist and
instead state that ctx keys allowed for interpolation are outcome,
preferred_label, human_response, and interview_answers, while graph.* (e.g.,
graph.goal) is a distinct namespace whose values are considered safe because
they are not LLM-origin; keep the note that LLM-origin keys (last_response,
tool_stdout, response.*, etc.) are blocked and ensure any examples and the
sentence listing safe keys explicitly indicate the two namespaces (ctx.* vs
graph.*) to avoid confusion.

In `@pipeline/handlers/tool_safety.go`:
- Around line 49-61: The globMatch function currently recompiles the regex on
every call; change it to cache compiled regexes keyed by the lowercased pattern
to avoid repeated regexp.Compile calls. Implement a package-level cache (e.g., a
sync.Map or a map protected by a sync.RWMutex) and add a helper like
getCompiledGlob(pattern string) *regexp.Regexp that lowercases and transforms
the pattern (same QuoteMeta + ReplaceAll logic) and returns the cached
*regexp.Regexp or compiles, stores, and returns it; then modify globMatch to use
getCompiledGlob and handle nil/compile errors by returning false. Ensure the
cache key and transformation match the existing behavior (case-insensitive via
strings.ToLower).

In `@pipeline/handlers/tool.go`:
- Around line 54-55: The current secret env filter loops over
sensitiveEnvPatterns and uses strings.Contains(upper, pattern), which wrongly
matches variables like AUTH_TOKEN_URL; change the check to use suffix matching
(strings.HasSuffix) so only names ending with the patterns (e.g. *_API_KEY,
*_SECRET, *_TOKEN, *_PASSWORD) are filtered; update the loop in the code that
references sensitiveEnvPatterns and the variable upper to call
strings.HasSuffix(upper, pattern) instead of strings.Contains.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 62246804-0d06-4064-ae68-175a0d56b2e5

📥 Commits

Reviewing files that changed from the base of the PR and between 56c1454 and cff04ea.

📒 Files selected for processing (9)
  • CLAUDE.md
  • agent/exec/env_test.go
  • agent/exec/local.go
  • pipeline/expand.go
  • pipeline/expand_test.go
  • pipeline/handlers/tool.go
  • pipeline/handlers/tool_safety.go
  • pipeline/handlers/tool_safety_test.go
  • pipeline/handlers/tool_test.go

Comment on lines +187 to +196
    if limitStr, ok := node.Attrs["output_limit"]; ok && limitStr != "" {
        parsed, err := parseByteSize(limitStr)
        if err != nil {
            return pipeline.Outcome{}, fmt.Errorf("node %q has invalid output_limit %q: %w", node.ID, limitStr, err)
        }
        if parsed > h.maxOutputLimit {
            parsed = h.maxOutputLimit
        }
        outputLimit = parsed
    }

⚠️ Potential issue | 🟠 Major

Reject non-positive output_limit values.

parseByteSize("0") and negative inputs currently flow into ExecCommandWithLimit unchanged. That makes the cap depend on limitedWriter edge-case behavior instead of enforcing a positive bound here.

Proposed fix
 	if limitStr, ok := node.Attrs["output_limit"]; ok && limitStr != "" {
 		parsed, err := parseByteSize(limitStr)
 		if err != nil {
 			return pipeline.Outcome{}, fmt.Errorf("node %q has invalid output_limit %q: %w", node.ID, limitStr, err)
 		}
+		if parsed <= 0 {
+			return pipeline.Outcome{}, fmt.Errorf("node %q has invalid output_limit %q: must be > 0", node.ID, limitStr)
+		}
 		if parsed > h.maxOutputLimit {
 			parsed = h.maxOutputLimit
 		}
 		outputLimit = parsed
 	}

Comment on lines +198 to +205
    // Layer 4: Use ExecCommandWithLimit with buildToolEnv() when running on LocalEnvironment.
    // For other ExecutionEnvironment implementations (e.g. mock, remote), fall back to ExecCommand.
    var result exec.CommandResult
    if le, ok := h.env.(*exec.LocalEnvironment); ok {
        result, err = le.ExecCommandWithLimit(ctx, "sh", []string{"-c", command}, timeout, outputLimit, buildToolEnv())
    } else {
        result, err = h.env.ExecCommand(ctx, "sh", []string{"-c", command}, timeout)
    }

⚠️ Potential issue | 🟠 Major

Don’t silently drop the new safeguards on non-local environments.

Per agent/exec/env.go:17-25, ExecutionEnvironment does not expose a limited-exec API, so this else branch skips both output capping and filtered-env execution. agent/exec/local.go:93-137 is currently the only implementation that enforces those protections, which means SSH/container/mock environments fall back to the old behavior here.

Minimal fail-closed alternative
 	if le, ok := h.env.(*exec.LocalEnvironment); ok {
 		result, err = le.ExecCommandWithLimit(ctx, "sh", []string{"-c", command}, timeout, outputLimit, buildToolEnv())
 	} else {
-		result, err = h.env.ExecCommand(ctx, "sh", []string{"-c", command}, timeout)
+		return pipeline.Outcome{}, fmt.Errorf(
+			"node %q execution environment %T does not support limited tool_command execution",
+			node.ID, h.env,
+		)
 	}


Copilot AI left a comment


Pull request overview

Implements defense-in-depth hardening for tool_command execution in the pipeline tool handler to reduce shell injection risk and limit damage/exfiltration from malicious .dip graphs or LLM-derived content.

Changes:

  • Adds a fail-closed allowlist for ${ctx.*} interpolation when expanding tool_command.
  • Introduces per-stream stdout/stderr output caps (default 64KB; configurable with a hard ceiling) via a new ExecCommandWithLimit.
  • Adds command denylist + optional allowlist validation and strips sensitive env vars for local tool subprocesses.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file

  • pipeline/handlers/tool.go — Wires variable-expansion allowlist, command safety checks, output limiting, and env stripping into tool execution.
  • pipeline/handlers/tool_test.go — Adds tests for env stripping and tool handler security behavior (tainted ctx keys, denylist).
  • pipeline/handlers/tool_safety.go — Implements denylist/allowlist matching for tool commands (per-statement).
  • pipeline/handlers/tool_safety_test.go — Unit tests for statement splitting plus denylist/allowlist behavior.
  • pipeline/expand.go — Adds toolCommandMode to restrict ${ctx.*} expansion to a safe allowlist.
  • pipeline/expand_test.go — Adds tests confirming tool-command mode blocks unsafe ctx keys and allows safe ones.
  • CLAUDE.md — Documents the new tool safety controls and recommended usage patterns.
  • agent/exec/local.go — Adds ExecCommandWithLimit and a thread-safe limited buffer to cap output capture.
  • agent/exec/env_test.go — Adds tests validating truncation behavior and custom env support for ExecCommandWithLimit.


Comment on lines +116 to +124
    if strings.HasSuffix(upper, "MB") {
        n, err := strconv.Atoi(strings.TrimSuffix(upper, "MB"))
        return n * 1024 * 1024, err
    }
    if strings.HasSuffix(upper, "KB") {
        n, err := strconv.Atoi(strings.TrimSuffix(upper, "KB"))
        return n * 1024, err
    }
    return strconv.Atoi(s)

Copilot AI Apr 4, 2026


parseByteSize + output_limit handling allows negative values (e.g. "-1", "-1MB"), which later makes outputLimit <= 0 and disables output limiting in ExecCommandWithLimit. This undermines the output-cap security layer; treat non-positive parsed limits as invalid and return an error (or clamp to a minimum of 1). Also consider using int64 and validating before multiplying to avoid overflow on 32-bit platforms.

Suggested change

Before:

    if strings.HasSuffix(upper, "MB") {
        n, err := strconv.Atoi(strings.TrimSuffix(upper, "MB"))
        return n * 1024 * 1024, err
    }
    if strings.HasSuffix(upper, "KB") {
        n, err := strconv.Atoi(strings.TrimSuffix(upper, "KB"))
        return n * 1024, err
    }
    return strconv.Atoi(s)

After:

    maxInt := int64(^uint(0) >> 1)
    parseWithMultiplier := func(raw string, multiplier int64) (int, error) {
        n, err := strconv.ParseInt(strings.TrimSpace(raw), 10, 64)
        if err != nil {
            return 0, err
        }
        if n <= 0 {
            return 0, fmt.Errorf("byte size must be greater than zero")
        }
        if n > maxInt/multiplier {
            return 0, fmt.Errorf("byte size %q overflows", s)
        }
        total := n * multiplier
        if total > maxInt {
            return 0, fmt.Errorf("byte size %q overflows", s)
        }
        return int(total), nil
    }
    if strings.HasSuffix(upper, "MB") {
        return parseWithMultiplier(strings.TrimSuffix(upper, "MB"), 1024*1024)
    }
    if strings.HasSuffix(upper, "KB") {
        return parseWithMultiplier(strings.TrimSuffix(upper, "KB"), 1024)
    }
    return parseWithMultiplier(s, 1)

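The suggested replacement can be exercised as a standalone sketch. The wrapper program below is hypothetical (the real function lives inside the handler package); only the validation logic mirrors the suggestion:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseByteSize is a hypothetical standalone version of the suggested fix:
// it rejects non-positive sizes and guards the multiplication against
// overflow before converting back to int.
func parseByteSize(s string) (int, error) {
	maxInt := int64(^uint(0) >> 1)
	parse := func(raw string, multiplier int64) (int, error) {
		n, err := strconv.ParseInt(strings.TrimSpace(raw), 10, 64)
		if err != nil {
			return 0, err
		}
		if n <= 0 {
			return 0, fmt.Errorf("byte size must be greater than zero")
		}
		if n > maxInt/multiplier {
			return 0, fmt.Errorf("byte size %q overflows", s)
		}
		return int(n * multiplier), nil
	}
	upper := strings.ToUpper(strings.TrimSpace(s))
	if strings.HasSuffix(upper, "MB") {
		return parse(strings.TrimSuffix(upper, "MB"), 1024*1024)
	}
	if strings.HasSuffix(upper, "KB") {
		return parse(strings.TrimSuffix(upper, "KB"), 1024)
	}
	return parse(upper, 1)
}

func main() {
	n, err := parseByteSize("64KB")
	fmt.Println(n, err) // 65536 <nil>
	_, err = parseByteSize("-1MB")
	fmt.Println(err != nil) // true
}
```

Checking `n > maxInt/multiplier` before multiplying keeps the product within range on the current platform, so the overflow guard also holds on 32-bit builds where `int` is 32 bits.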
Comment on lines +88 to +105
// NewToolHandlerWithConfig creates a ToolHandler with full security configuration.
func NewToolHandlerWithConfig(env exec.ExecutionEnvironment, cfg ToolHandlerConfig) *ToolHandler {
outputLimit := cfg.OutputLimit
if outputLimit <= 0 {
outputLimit = DefaultOutputLimit
}
maxLimit := cfg.MaxOutputLimit
if maxLimit <= 0 {
maxLimit = MaxOutputLimit
}
return &ToolHandler{
env: env,
defaultTimeout: defaultToolTimeout,
outputLimit: outputLimit,
maxOutputLimit: maxLimit,
allowlist: cfg.Allowlist,
bypassDenylist: cfg.BypassDenylist,
}

Copilot AI Apr 4, 2026


NewToolHandlerWithConfig does not enforce OutputLimit <= MaxOutputLimit. If cfg.OutputLimit is set higher than cfg.MaxOutputLimit, the default per-node limit (when output_limit attr is omitted) can exceed the intended hard ceiling. Consider clamping outputLimit to maxLimit (or returning an error) when building the handler config.
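A minimal sketch of the suggested clamp. The `resolveLimits` helper and the constant values are hypothetical, mirroring the 64KB default and 10MB ceiling described in the PR summary:

```go
package main

import "fmt"

const (
	defaultOutputLimit = 64 * 1024        // hypothetical default (64KB)
	hardMaxOutputLimit = 10 * 1024 * 1024 // hypothetical ceiling (10MB)
)

// resolveLimits falls back to defaults for non-positive values, then
// clamps the per-node default so it can never exceed the hard ceiling.
func resolveLimits(outputLimit, maxLimit int) (int, int) {
	if maxLimit <= 0 {
		maxLimit = hardMaxOutputLimit
	}
	if outputLimit <= 0 {
		outputLimit = defaultOutputLimit
	}
	if outputLimit > maxLimit {
		outputLimit = maxLimit
	}
	return outputLimit, maxLimit
}

func main() {
	o, m := resolveLimits(20*1024*1024, 0)
	fmt.Println(o == m) // true: clamped to the 10MB ceiling
}
```

Clamping (rather than erroring) keeps misconfigured graphs runnable while preserving the ceiling as the actual bound.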

Comment on lines +158 to +174
@@ -72,6 +168,11 @@ func (h *ToolHandler) Execute(ctx context.Context, node *pipeline.Node, pctx *pi
command = fmt.Sprintf("cd %q && %s", cleaned, command)
}

// Layer 2: Denylist/allowlist check on the FINAL command string (after working_dir prepend).
if err := CheckToolCommand(command, node.ID, h.allowlist, h.bypassDenylist); err != nil {
return pipeline.Outcome{}, err
}

Copilot AI Apr 4, 2026


CheckToolCommand is run after prepending the working_dir wrapper ("cd ... &&"). Because allowlist matching is per-statement, enabling an allowlist will now require patterns for the injected "cd" statement as well, which is surprising and can break previously-allowed commands. Consider validating allowlist/denylist against the user-authored command before adding the working_dir prefix (or explicitly exempt/allow the injected "cd" statement).

Comment on lines +11 to +28
// defaultDenyPatterns are blocked in all tool_command executions.
// Cannot be overridden by .dip graph attrs. Only --bypass-denylist CLI flag disables them.
// Patterns use * as wildcard. Matching is case-insensitive, per-statement.
var defaultDenyPatterns = []string{
"eval *",
"exec *",
"source *",
". /*",
"curl * | *",
"wget * | *",
"* | sh",
"* | bash",
"* | zsh",
"* | /bin/sh",
"* | /bin/bash",
"* | sh -",
"* | bash -",
}

Copilot AI Apr 4, 2026


The denylist patterns are brittle to whitespace variations and can be bypassed (e.g. "curl http://x|sh" won’t match "curl * | *" or "* | sh" due to the missing spaces; "eval\tfoo" won’t match "eval *" because the tab isn’t a literal space; and the current ". /*" pattern blocks only absolute paths, not the common ". ./script.sh" form). Since this is a security layer, consider switching to token/regex-based matching that tolerates arbitrary whitespace (and explicitly covers ". *" / ". ./"), rather than simple space-sensitive globs.
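A hedged sketch of the regex alternative; these patterns are illustrative only, not the project's actual denylist:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical regex-based deny patterns that tolerate arbitrary
// whitespace, including no whitespace at all around "|".
var denyPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(?i)^\s*(eval|exec|source)\s`),     // shell builtins followed by any whitespace
	regexp.MustCompile(`(?i)^\s*\.\s+\S`),                  // ". script", covers ". ./x" and ". /x"
	regexp.MustCompile(`(?i)\b(curl|wget)\b.*\|`),          // download piped anywhere downstream
	regexp.MustCompile(`(?i)\|\s*(/bin/)?(sh|bash|zsh)\b`), // pipe-to-shell, spaces optional
}

// denied reports whether a single statement matches any deny pattern.
func denied(statement string) bool {
	for _, re := range denyPatterns {
		if re.MatchString(statement) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(denied("curl http://x|sh")) // true even without spaces
	fmt.Println(denied("eval\tfoo"))        // true: \s matches tabs
	fmt.Println(denied(". ./script.sh"))    // true
	fmt.Println(denied("ls -la"))           // false
}
```

This would still run per-statement after splitting on `;`, `&&`, `||`, and newlines, as the PR already does; only the matcher changes.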

CLAUDE.md Outdated

### Tool node safety — LLM output as shell input
- NEVER `eval` content extracted from LLM-written files (arbitrary command execution)
- Variable expansion in tool_command uses a safe-key allowlist: only `outcome`, `preferred_label`, `human_response`, `interview_answers`, and `graph.goal` can be interpolated. All LLM-origin keys (`last_response`, `tool_stdout`, `response.*`, etc.) are blocked.

Copilot AI Apr 4, 2026


This doc line claims tool_command interpolation allows only specific ctx keys plus "graph.goal", but the implementation only restricts ctx.* and allows any ${graph.*} key in toolCommandMode. Either update the doc to reflect that all graph attrs are allowed, or tighten the implementation to match the documented contract.

Suggested change
- Variable expansion in tool_command uses a safe-key allowlist: only `outcome`, `preferred_label`, `human_response`, `interview_answers`, and `graph.goal` can be interpolated. All LLM-origin keys (`last_response`, `tool_stdout`, `response.*`, etc.) are blocked.
- Variable expansion in `tool_command` uses a safe-key allowlist for context values: only `outcome`, `preferred_label`, `human_response`, and `interview_answers` may be interpolated from `ctx.*`. Graph attribute interpolation is allowed via `${graph.*}` (including, but not limited to, `${graph.goal}`). LLM-origin context keys (`last_response`, `tool_stdout`, `response.*`, etc.) are blocked.
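The documented contract can be sketched as a small predicate. Names here are hypothetical; the real implementation lives in pipeline/expand.go's toolCommandMode:

```go
package main

import (
	"fmt"
	"strings"
)

// safeCtxKeys is the fail-closed allowlist: only these ctx.* keys may
// be interpolated into tool_command. Everything else (LLM-origin keys
// like last_response or tool_stdout) is blocked.
var safeCtxKeys = map[string]bool{
	"outcome":           true,
	"preferred_label":   true,
	"human_response":    true,
	"interview_answers": true,
}

// allowVar reports whether a ${...} reference may expand in
// tool-command mode: any graph.* attribute, or an allowlisted ctx.* key.
func allowVar(ref string) bool {
	if strings.HasPrefix(ref, "graph.") {
		return true
	}
	if strings.HasPrefix(ref, "ctx.") {
		return safeCtxKeys[strings.TrimPrefix(ref, "ctx.")]
	}
	return false
}

func main() {
	fmt.Println(allowVar("ctx.outcome"))       // true
	fmt.Println(allowVar("ctx.last_response")) // false: LLM-origin key
	fmt.Println(allowVar("graph.goal"))        // true
}
```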

Comment on lines +357 to +360
func TestToolHandler_BlocksTaintedVariable(t *testing.T) {
env := exec.NewLocalEnvironment(t.TempDir())
h := NewToolHandler(env)
node := &pipeline.Node{

Copilot AI Apr 4, 2026


This test uses exec.NewLocalEnvironment directly, which requires a real "sh" in PATH. Elsewhere in this file toolTestEnv() is used to keep tests passing in sandboxed environments without sh; consider switching to toolTestEnv() (or skipping when sh is unavailable) for consistency and CI portability.

Comment on lines +376 to +379
func TestToolHandler_AllowsSafeVariable(t *testing.T) {
env := exec.NewLocalEnvironment(t.TempDir())
h := NewToolHandler(env)
node := &pipeline.Node{

Copilot AI Apr 4, 2026


This test uses exec.NewLocalEnvironment directly, which requires a real "sh" in PATH. For portability (matching the rest of this file), consider using toolTestEnv() or skipping when sh is unavailable.

Comment on lines +395 to +398
func TestToolHandler_DenylistBlocks(t *testing.T) {
env := exec.NewLocalEnvironment(t.TempDir())
h := NewToolHandler(env)
node := &pipeline.Node{

Copilot AI Apr 4, 2026


This test uses exec.NewLocalEnvironment directly, which requires a real "sh" in PATH. For portability (matching the rest of this file), consider using toolTestEnv() or skipping when sh is unavailable.

…imit, denylist, env, tests

- limitedBuffer.Write returns len(p) on truncation (not short write)
- TRACKER_PASS_ENV checks for "1" not just non-empty
- OutputLimit clamped to MaxOutputLimit in constructor
- Non-positive output_limit node attr rejected with error
- Denylist check moved before working_dir prepend (allowlist doesn't
  need cd patterns)
- Broadened denylist: `. ./*`, `* | sh *`, `* | bash *`, etc.
- CLAUDE.md: documents all graph.*/params.* allowed, not just graph.goal
- Tests use toolTestEnv() for CI portability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pipeline/handlers/tool.go (1)

169-177: ⚠️ Potential issue | 🟠 Major

Constrain working_dir to the execution root.

filepath.Clean plus a raw .. substring check is not enough here. Absolute paths, working_dir="-", and symlinked directories all survive this guard, and Line 177 then prepends a shell cd that changes where an otherwise-validated command runs. Resolve the directory against the intended execution root, reject escapes after symlink resolution, and use cd -- for the final hop.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/tool.go` around lines 169 - 177, The working_dir validation
in the tool handler is insufficient: instead of only filepath.Clean + substring
checks, resolve the provided working_dir (node.Attrs["working_dir"]) against the
execution root, reject absolute paths or paths that escape the execution root
after evaluating symlinks (use filepath.Join(execRoot, wd) then
filepath.EvalSymlinks and ensure the result has execRoot as its prefix), treat
weird names like "-" as invalid, and if accepted build the shell change
directory with a safe form using "cd -- <cleanedRelPath> && <command>" (update
the code that produces command and the error returns of pipeline.Outcome{} to
use these stronger checks and explicit rejection messages).
♻️ Duplicate comments (1)
pipeline/handlers/tool.go (1)

205-212: ⚠️ Potential issue | 🟠 Major

Fail closed when limited execution is unavailable.

Line 211 still falls back to ExecCommand, which drops both the capped buffers and buildToolEnv() filtering for every non-local environment. That makes the new safety guarantees environment-dependent again. Prefer asserting on a small ExecCommandWithLimit interface and returning an error when the environment cannot provide it.

Possible fix
+type limitedExecEnvironment interface {
+	ExecCommandWithLimit(context.Context, string, []string, time.Duration, int, ...[]string) (exec.CommandResult, error)
+}
+
 var result exec.CommandResult
-if le, ok := h.env.(*exec.LocalEnvironment); ok {
+if le, ok := h.env.(limitedExecEnvironment); ok {
 	result, err = le.ExecCommandWithLimit(ctx, "sh", []string{"-c", command}, timeout, outputLimit, buildToolEnv())
 } else {
-	result, err = h.env.ExecCommand(ctx, "sh", []string{"-c", command}, timeout)
+	return pipeline.Outcome{}, fmt.Errorf(
+		"node %q execution environment %T does not support limited tool_command execution",
+		node.ID, h.env,
+	)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/handlers/tool.go` around lines 205 - 212, The current branch falls
back to h.env.ExecCommand, losing outputLimit and buildToolEnv protections;
instead define and assert a small interface (e.g., type limitedExecutor
interface { ExecCommandWithLimit(context.Context, string, []string,
time.Duration, int, map[string]string) (exec.CommandResult, error) }) and check
h.env against it (similar to the LocalEnvironment check); if the assertion fails
return an error explaining that limited execution is required, otherwise call
ExecCommandWithLimit with timeout, outputLimit and buildToolEnv(); update
references to exec.CommandResult, buildToolEnv, LocalEnvironment and h.env in
the implementation accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e268aeb0-db4e-43c6-af34-73ed3f30352b

📥 Commits

Reviewing files that changed from the base of the PR and between cff04ea and 7ba8bee.

📒 Files selected for processing (5)
  • CLAUDE.md
  • agent/exec/local.go
  • pipeline/handlers/tool.go
  • pipeline/handlers/tool_safety.go
  • pipeline/handlers/tool_test.go
✅ Files skipped from review due to trivial changes (1)
  • pipeline/handlers/tool_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pipeline/handlers/tool_safety.go
  • agent/exec/local.go
  • CLAUDE.md

@clintecker clintecker merged commit 359a4f3 into main Apr 4, 2026
2 checks passed


Development

Successfully merging this pull request may close these issues.

security(pipeline): tool_command has no sandboxing, allowlist, or resource limits
