DRAFT: feat(terminal): reject Python/JSON literals passed as bash commands by juanmichelini · Pull Request #3582 · OpenHands/software-agent-sdk

juanmichelini · 2026-06-09T03:22:36Z

TL;DR

When the model packs structured data into the terminal tool's command field — e.g. [{'default': {...}}, ['apps'], ['code...']] — bash returns command not found and the model rarely self-corrects. This PR catches the malformation pre-execution and replies with an actionable hint instead.

Why

Trajectory analysis of a SWE-Bench Verified Nemotron 550B run (the one resolved instance in swebench/litellm_proxy-nemotron-3-ultra-550b-a55b/27176499467) showed:

148 actions to solve a <15 min difficulty Django bug (a competent agent does it in ~30–50).
18 of those 148 turns (12%) were terminal calls where command was a Python/JSON literal:
```
{"command": "[{'default': {'ENGINE': 'django.db.backends.sqlite3'}}, ['django.contrib.auth'], [\"field.column]\\n...\"]"}
```
Bash responded bash: [{default:: command not found. The model treated the error as a bug in its reproduction script (not its tool call) and rewrote the script 12 different times.
Each malformed turn cost $0.025–$0.04 directly, plus a cascading context-bloat tax (the failed call + error stay in conversation history forever, paid at every subsequent turn on non-caching models). Estimated total waste on this single trajectory: **$0.70**, or roughly 18% of the $3.97 spend.

The exact malformed indices in the run were 41, 50, 58, 59, 65–68, 76, 77, 87, 92, 95, 125, 135, 136, 141, 142.

What

1. Heuristic: `looks_like_python_literal_argument(command) -> str | None`

In openhands-tools/openhands/tools/terminal/definition.py, next to TerminalAction. Returns one of "list literal", "nested list literal", "dict literal", or None.

Detection rules — deliberately conservative, exempting all real bash uses:

Head	Verdict	Reason
`[{`	❌ list literal	Python/JSON list-of-dict
`["` `['`	❌ list literal	Python list of strings
`[[` followed by non-space	❌ nested list literal	Python `[[...]]`
`[[` (with space)	✅ allowed	bash extended-test `[[ -f x ]]`
`[` (with space)	✅ allowed	bash test `[ -f x ]`
`{"` `{'`	❌ dict literal	Python/JSON dict
`{` (with space)	✅ allowed	bash group `{ ls; }`
`echo '[1, 2]'`, `curl -d '{"a":1}'`	✅ allowed	command name is plain text

2. Pre-execution guard in `TerminalExecutor.call`

When the heuristic fires:

Return a TerminalObservation(is_error=True) whose text:
1. Names the literal kind and shows the offending head (first 60 chars).
2. Reminds the model terminal runs one shell command.
3. Teaches two concrete recovery paths: file_editor create → python /tmp/x.py, or an inline heredoc.
Emit one structured logger.warning(...) so eval pipelines can grep for the pattern (e.g. grep "Rejected terminal call" *.log | wc -l).
Skip the check when is_input=True (raw bytes forwarded to a running process, e.g. C-c keystrokes, where literals are legal payload).

3. Tests

tests/tools/terminal/test_literal_arg_guard.py — 24 tests:

Heuristic unit tests: parametric over real malformed samples from the Nemotron trajectory + synthetic variants; parametric negative cases for bash [, [[, {, echo, curl, edge-cases (empty / single char).
Executor integration: literal command returns the structured error and never reaches the shell; bash test expressions like [ -f /tmp/foo ] do reach the shell; is_input=True bypasses the guard; the WARN log fires with the literal kind.

All 24 pass. Existing 35 terminal parsing/observation tests pass unchanged.

Risk / blast radius

Purely behavior-changing for malformed input. A valid bash command starting with [ -f, [[ -z, or { cmd; } always has whitespace as the second char, so the heuristic exempts them by construction.
No schema change, no new dependency, no new event type. The error flows through the existing TerminalObservation(is_error=True) path that the framework already handles for tool-execution errors.
is_input=True carve-out preserves the keystroke-forwarding contract.
Worst case false positive: a user's legitimate command happens to start with [{ — vanishingly rare in shells, and the model can disambiguate by adding a leading space or quoting. The hint text explicitly tells them what to do.

How to verify locally

uv run pytest tests/tools/terminal/test_literal_arg_guard.py -q
# 24 passed

To see the pattern this addresses in a real trajectory, the per-turn malformed-call indices for the Nemotron 550B run linked above are [41, 50, 58, 59, 65, 66, 67, 68, 76, 77, 87, 92, 95, 125, 135, 136, 141, 142]. Each one returned bash: ...: command not found from a structured-literal command.

This PR was created by an AI agent (OpenHands) on behalf of the user investigating high per-instance cost in eval run swebench/litellm_proxy-nemotron-3-ultra-550b-a55b/27176499467.

@juanmichelini can click here to continue refining the PR

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:c1799d2-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-c1799d2-python \
  ghcr.io/openhands/agent-server:c1799d2-python

All tags pushed for this build

ghcr.io/openhands/agent-server:c1799d2-golang-amd64
ghcr.io/openhands/agent-server:c1799d2dfff2e6861ab09b5394e2e7ca6d28b3ae-golang-amd64
ghcr.io/openhands/agent-server:openhands-sdk-5-bash-literal-arg-guard-golang-amd64
ghcr.io/openhands/agent-server:c1799d2-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:c1799d2-golang-arm64
ghcr.io/openhands/agent-server:c1799d2dfff2e6861ab09b5394e2e7ca6d28b3ae-golang-arm64
ghcr.io/openhands/agent-server:openhands-sdk-5-bash-literal-arg-guard-golang-arm64
ghcr.io/openhands/agent-server:c1799d2-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:c1799d2-java-amd64
ghcr.io/openhands/agent-server:c1799d2dfff2e6861ab09b5394e2e7ca6d28b3ae-java-amd64
ghcr.io/openhands/agent-server:openhands-sdk-5-bash-literal-arg-guard-java-amd64
ghcr.io/openhands/agent-server:c1799d2-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:c1799d2-java-arm64
ghcr.io/openhands/agent-server:c1799d2dfff2e6861ab09b5394e2e7ca6d28b3ae-java-arm64
ghcr.io/openhands/agent-server:openhands-sdk-5-bash-literal-arg-guard-java-arm64
ghcr.io/openhands/agent-server:c1799d2-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:c1799d2-python-amd64
ghcr.io/openhands/agent-server:c1799d2dfff2e6861ab09b5394e2e7ca6d28b3ae-python-amd64
ghcr.io/openhands/agent-server:openhands-sdk-5-bash-literal-arg-guard-python-amd64
ghcr.io/openhands/agent-server:c1799d2-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:c1799d2-python-arm64
ghcr.io/openhands/agent-server:c1799d2dfff2e6861ab09b5394e2e7ca6d28b3ae-python-arm64
ghcr.io/openhands/agent-server:openhands-sdk-5-bash-literal-arg-guard-python-arm64
ghcr.io/openhands/agent-server:c1799d2-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:c1799d2-golang
ghcr.io/openhands/agent-server:c1799d2dfff2e6861ab09b5394e2e7ca6d28b3ae-golang
ghcr.io/openhands/agent-server:openhands-sdk-5-bash-literal-arg-guard-golang
ghcr.io/openhands/agent-server:c1799d2-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:c1799d2-java
ghcr.io/openhands/agent-server:c1799d2dfff2e6861ab09b5394e2e7ca6d28b3ae-java
ghcr.io/openhands/agent-server:openhands-sdk-5-bash-literal-arg-guard-java
ghcr.io/openhands/agent-server:c1799d2-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:c1799d2-python
ghcr.io/openhands/agent-server:c1799d2dfff2e6861ab09b5394e2e7ca6d28b3ae-python
ghcr.io/openhands/agent-server:openhands-sdk-5-bash-literal-arg-guard-python
ghcr.io/openhands/agent-server:c1799d2-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., c1799d2-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., c1799d2-python-amd64) are also available if needed

When the model emits a tool call where `command` is a Python/JSON literal — e.g. `[{'default': {...}}, ['apps'], ['code...']]` — bash returns a useless "command not found" and the model rarely self-corrects. Trajectory analysis of a SWE-Bench Nemotron run showed ~12% of terminal calls (18 of 148) burning context this way, with cascading cost growth because failed turns stay in history. This PR adds a pre-execution guard in `TerminalExecutor` that: * Detects the literal head pattern (`[{`, `[[`, `["`, `['`, `{"`, `{'`) while carefully exempting real bash uses of `[`, `[[`, and `{` (which are always followed by whitespace). * Returns a `TerminalObservation(is_error=True)` whose text names the literal kind, shows the offending head, and points the model at the two recovery patterns: `file_editor create` + `python /tmp/x.py`, or an inline heredoc. * Emits one structured WARN log per rejection so eval pipelines can grep for the pattern. * Skips the check when `is_input=True` (raw bytes forwarded to a running process, e.g. `C-c` keystrokes). The detection lives in `definition.py` next to `TerminalAction` as `looks_like_python_literal_argument(command) -> str | None`, exported for reuse and individually unit-tested. Tests: `tests/tools/terminal/test_literal_arg_guard.py` — 24 new tests covering the heuristic matrix (real malformed samples, synthetic variants, bash false-positives like `[ -f x ]` and `{ ls; }`), the executor returning the structured error without hitting the shell, `is_input=True` bypass, and WARN log emission. Existing terminal parsing/observation suite (35 tests) passes unchanged. Lint+pyright clean. Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-06-09T03:23:08Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-09T03:23:11Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-09T03:29:36Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-tools/openhands/tools/terminal
definition.py	126	59	53%	68, 72, 76–78, 82, 119, 122, 125–126, 128, 131–133, 135–137, 139–141, 143, 171, 179, 206, 208–210, 213, 215, 217–219, 221, 225–226, 229–231, 233–234, 237–240, 244–246, 251, 255–260, 262–263, 265, 280, 314
impl.py	245	104	57%	112, 119–120, 148–152, 155–157, 159–163, 170, 189–190, 195–196, 225–228, 230, 248–257, 259, 264, 270–271, 294, 297, 300, 304, 322, 325, 333–334, 343, 376–377, 379, 387, 391–394, 396–397, 404, 406, 410, 434–435, 437–439, 444–445, 447–448, 450, 459, 461–462, 464, 484–487, 492–493, 515, 525–528, 532–533, 541, 552–553, 560, 573, 582–589, 597–598
TOTAL	30147	15372	49%

all-hands-bot

✅ QA Report: PASS

The terminal tool now rejects top-level Python/JSON literal command values with an actionable tool-argument error while preserving normal shell usage.

Does this PR achieve its stated goal?

Yes. I reproduced the old behavior on origin/main: the same malformed literal was sent to bash and returned [{default:: command not found with exit code 127. On PR commit 56588a8f, the same real TerminalExecutor call returned is_error=True, exit_code=None, and a structured [Tool-argument error] hint instead; bash test expressions, bash groups, heredocs, JSON-shaped command arguments, and is_input=True payloads still executed/bypassed as expected.

Phase	Result
Environment Setup	✅ `uv run` created/synced `.venv` and built the local packages needed to import and execute the terminal tool.
CI Status	⚠️ Not fully green: `Validate PR description` is failing and `qa-changes` is in progress; most other visible checks are passing or skipped.
Functional Verification	✅ Real terminal tool execution showed the malformed-command fix works and key allowed shell/input paths still work.

Functional Verification

Test 1: Malformed Python/JSON literal is rejected before bash

Step 1 — Reproduce / establish baseline without the fix:
Checked out origin/main (50d73244) and ran the actual terminal tool path:

OPENHANDS_SUPPRESS_BANNER=1 uv run python - <<'PY'
from pathlib import Path
from openhands.tools.terminal.definition import TerminalAction
from openhands.tools.terminal.impl import TerminalExecutor

executor = TerminalExecutor(working_dir=Path.cwd())
obs = executor(TerminalAction(command="[{'default': {'ENGINE': 'sqlite3'}}]"))
print('is_error=', obs.is_error)
print('exit_code=', obs.exit_code)
print('text=')
print(obs.text.strip())
PY

Observed:

is_error= False
exit_code= 127
text=
[{default:: command not found

This confirms the bug exists on main: the malformed structured literal reaches bash and produces a generic command-not-found error.

Step 2 — Apply the PR's changes:
Checked out PR commit 56588a8f672328d658bf7114b1b510169a7b8d6e.

Step 3 — Re-run with the fix in place:
Ran the same terminal tool call on the PR commit. Observed:

WARNING ... Rejected terminal call: command argument looks like a Python/JSON list literal ...
is_error= True
exit_code= None
text=
[Tool-argument error] The `command` argument looks like a Python/JSON list literal, not a shell command. It starts with: "[{'default': {'ENGINE': 'sqlite3'}}]"

The `terminal` tool runs exactly ONE shell command at a time. To pass structured data or multi-line code:
  - Write a script first with `file_editor` (command="create", path="/tmp/run.py", ...), then invoke it: `python /tmp/run.py`.
  - Or use a heredoc inline, e.g.:
        python - <<'EOF'
        DATABASES = {'default': {...}}
        # your code here
        EOF

This shows the PR achieves the main goal: the literal is caught pre-execution, returns an actionable hint, and emits the promised warning.

Test 2: Legitimate shell commands are not blocked

On both origin/main and the PR commit, I ran the actual terminal tool with commands that should remain valid:

[ -f pyproject.toml ] && echo FOUND
python - <<'EOF'
print({'default': {'ENGINE': 'sqlite3'}}['default']['ENGINE'])
EOF

On the PR commit, the outputs were:

=== bash_test ===
is_error= False
exit_code= 0
text=
FOUND
=== json_argument ===
is_error= False
exit_code= 0
text=
sqlite3

I also sampled additional claimed cases on the PR commit:

=== dict_literal ===
is_error= True
exit_code= None
text=
[Tool-argument error] The `command` argument looks like a Python/JSON dict literal, not a shell command. It starts with: '{"key": "value"}'

=== nested_list_literal ===
is_error= True
exit_code= None
text=
[Tool-argument error] The `command` argument looks like a Python/JSON nested list literal, not a shell command. It starts with: "[['col1', 'col2']]"

=== bash_double_bracket ===
is_error= False
exit_code= 0
text=
DOUBLE_OK
=== bash_group ===
is_error= False
exit_code= 0
text=
GROUP_OK

This confirms the guard catches representative list/dict/nested-list literals while allowing common bash [ ... ], [[ ... ]], { ...; }, heredoc, and JSON-as-argument patterns.

Test 3: `is_input=True` bypass still allows literal payloads

I started a real subprocess terminal cat process, sent a Python-literal-looking payload as interactive input, then interrupted it:

=== start_cat ===
is_error= False
exit_code= -1
text=

=== is_input_literal_payload ===
is_error= False
exit_code= -1
text=
[{'sent_as_keystrokes': True}]
=== stop_cat ===
is_error= False
exit_code= 130
text=
^C

This shows literal-looking text sent as keystrokes was forwarded to the running process instead of being rejected by the new guard.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

juanmichelini · 2026-06-09T21:49:58Z

This was tested here https://openhands-eval-monitor.vercel.app/?days=15&run=swebench%2Flitellm_proxy-nemotron-3-ultra-550b-a55b%2F27224350936%2F

It does improve acceptance rate to 100%

all-hands-bot · 2026-06-09T21:50:58Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

Code Review: `feat(terminal): reject Python/JSON literals passed as bash commands`

Taste Rating: 🟡 Acceptable — Works, solves a real problem, but has rough edges worth polishing.

[CRITICAL ISSUES]

None found. No breaking changes, no security risks.

[IMPROVEMENT OPPORTUNITIES]

[openhands-tools/openhands/tools/terminal/definition.py, Lines 39-55] Verbose template string: The _LITERAL_ARG_HINT_TEMPLATE uses {{ escaping for f-strings, making it harder to read. Consider using textwrap.dedent or moving the template to a separate constant without f-string interpolation (interpolate only at call time).
[openhands-tools/openhands/tools/terminal/impl.py, Lines 546-569] Guard block indentation: The guard logic is nested under if not action.is_input: which is itself inside __call__. Consider extracting to a small helper function for clarity to reduce nesting levels.
[tests/tools/terminal/test_literal_arg_guard.py] Consider adding explicit edge case: A command starting with [[ followed by whitespace (valid bash) should be tested explicitly rather than relying on the absence of that pattern.

[TESTING GAPS]

The test suite is solid. The sentinel-based executor_without_shell fixture is an excellent pattern — it proves the guard intercepts before shell execution. No gaps found.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟢 LOW

This is a defensive guard that only fires when the model produces malformed tool calls. It never blocks legitimate commands (confirmed by comprehensive negative test cases). The is_input=True bypass is correctly implemented. No user-facing API changes. Tests provide high confidence.

VERDICT:
✅ Worth merging — The core logic is sound and solves a real problem observed in production eval runs. Polish the template readability and consider the helper extraction for long-term maintainability.

KEY INSIGHT:
The sentinel-based testing approach (stubbing shell paths to raise) is the right pattern for this — it proves the guard runs before shell execution without needing a real subprocess.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

all-hands-bot

⚠️ QA Report: PASS WITH ISSUES

I functionally verified the terminal executor now rejects Python/JSON literal commands with actionable hints while preserving ordinary shell commands and interactive input.

Does this PR achieve its stated goal?

Yes. On origin/main, sending "[{'default': {'ENGINE': 'sqlite3'}}]" through the real TerminalExecutor reached bash and returned [{default:: command not found with exit code 127. On PR commit c1799d2dfff2e6861ab09b5394e2e7ca6d28b3ae, the same executor call returned a structured [Tool-argument error] mentioning list literal, included recovery guidance, set exit_code: null, and emitted the expected Rejected terminal call warning. I also verified legitimate [ ... ], [[ ... ]], JSON-as-argument commands, and is_input=True interactive input continue to work.

Phase	Result
Environment Setup	✅ `make build` completed successfully; no tests/linters were run locally.
CI Status	⚠️ `REST API (OpenAPI)` was failing and several checks were still in progress when checked; many core checks were green.
Functional Verification	✅ Real `TerminalExecutor` usage showed the fix works and did not block tested valid shell/input cases.

Functional Verification

Test 1: Malformed Python list/dict literal is rejected before bash

Step 1 — Reproduce / establish baseline without the fix:
Checked out origin/main (16ad9e13) and ran the real terminal executor with a malformed command.

Relevant output:

{"case": "literal_list_of_dict", "command": "[{'default': {'ENGINE': 'sqlite3'}}]", "is_error": false, "exit_code": 127, "text": "\u001b[?2004l[{default:: command not found\n\u001b[?2004h"}

This confirms the old behavior: the malformed structured literal reached bash and produced the confusing command not found error.

Step 2 — Apply the PR's changes:
Checked out openhands/sdk-5-bash-literal-arg-guard at c1799d2dfff2e6861ab09b5394e2e7ca6d28b3ae.

Step 3 — Re-run with the fix in place:
Ran the same executor scenario on the PR branch.

Relevant output:

WARNING ... Rejected terminal call: command argument looks like a Python/JSON list literal ... Returning structured hint to the model instead of executing.

{"case": "literal_list_of_dict", "command": "[{'default': {'ENGINE': 'sqlite3'}}]", "is_error": true, "exit_code": null, "text": "[Tool-argument error] The `command` argument looks like a Python/JSON list literal, not a shell command. It starts with: "[{'default': {'ENGINE': 'sqlite3'}}]" ... Write a script first with `file_editor` ... Or use a heredoc inline ..."}

This shows the PR achieves the core goal: the literal is rejected pre-execution with actionable guidance instead of being passed to bash.

Test 2: Legitimate shell syntax and JSON-shaped arguments still run

Step 1 — Baseline without the fix:
On origin/main, the same executor accepted a bash test expression and a JSON-shaped argument:

{"case": "bash_test_expression", "command": "[ -f /tmp/oh_qa_missing_file ]; echo status:$?", "is_error": false, "exit_code": 0, "text": "\u001b[?2004lstatus:1\n\u001b[?2004h"}
{"case": "echo_json_argument", "command": "printf '%s\\n' '{"a": 1}'", "is_error": false, "exit_code": 0, "text": "\u001b[?2004l{"a": 1}\n\u001b[?2004h"}

Step 2 — Apply the PR's changes:
Returned to the PR branch.

Step 3 — Re-run with the fix in place:
The same commands still executed normally:

{"case": "bash_test_expression", "command": "[ -f /tmp/oh_qa_missing_file ]; echo status:$?", "is_error": false, "exit_code": 0, "text": "\u001b[?2004lstatus:1\n\u001b[?2004h"}
{"case": "echo_json_argument", "command": "printf '%s\\n' '{"a": 1}'", "is_error": false, "exit_code": 0, "text": "\u001b[?2004l{"a": 1}\n\u001b[?2004h"}
{"case": "bash_double_bracket", "is_input": false, "is_error": false, "exit_code": 0, "text": "\u001b[?2004lstatus:0\n\u001b[?2004h"}

This shows the guard did not block the tested valid shell forms or commands with JSON-looking arguments.

Test 3: Dict/nested-list literals and interactive input behavior

Step 1 — Baseline without the fix:
Before the PR, top-level structured literals were sent to the shell, as shown by Test 1's command not found result. is_input=True had no literal guard to trip.

Step 2 — Apply the PR's changes:
Ran additional real executor scenarios on the PR branch.

Step 3 — Re-run with the fix in place:
Dict and nested-list literals were rejected with their specific kinds:

{"case": "json_dict_literal", "is_input": false, "is_error": true, "exit_code": null, "text": "[Tool-argument error] The `command` argument looks like a Python/JSON dict literal, not a shell command..."}
{"case": "nested_list_literal", "is_input": false, "is_error": true, "exit_code": null, "text": "[Tool-argument error] The `command` argument looks like a Python/JSON nested list literal, not a shell command..."}

For is_input=True, I started an interactive read command and then sent a literal-looking payload as input:

{"case": "start_read", "is_error": false, "exit_code": -1, "text": "\u001b[?2004l"}
{"case": "send_literal_input", "is_error": false, "exit_code": 0, "text": "captured:[{'sent_as_keystrokes': True}]\n\u001b[?2004h"}

This confirms the guard does not reject literal-looking bytes when they are interactive input to a running process.

Issues Found

🟠 CI status: REST API (OpenAPI) was failing and several checks were still in progress when checked. I did not find a functional terminal-behavior issue, but the PR is not fully green yet.

This review was created by an AI agent (OpenHands) on behalf of the user.

Final verdict: PASS WITH ISSUES

juanmichelini added the enhancement New feature or request label Jun 9, 2026 — with OpenHands AI

juanmichelini marked this pull request as ready for review June 9, 2026 21:28

all-hands-bot reviewed Jun 9, 2026

View reviewed changes

Merge branch 'main' into openhands/sdk-5-bash-literal-arg-guard

c1799d2

juanmichelini requested a review from all-hands-bot June 9, 2026 21:50

all-hands-bot reviewed Jun 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DRAFT: feat(terminal): reject Python/JSON literals passed as bash commands#3582

DRAFT: feat(terminal): reject Python/JSON literals passed as bash commands#3582
juanmichelini wants to merge 2 commits into
mainfrom
openhands/sdk-5-bash-literal-arg-guard

juanmichelini commented Jun 9, 2026 •

edited by github-actions Bot

Loading

Uh oh!

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

all-hands-bot left a comment

Uh oh!

juanmichelini commented Jun 9, 2026

Uh oh!

all-hands-bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

all-hands-bot left a comment

Uh oh!

all-hands-bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

juanmichelini commented Jun 9, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TL;DR

Why

What

1. Heuristic: looks_like_python_literal_argument(command) -> str | None

2. Pre-execution guard in TerminalExecutor.__call__

3. Tests

Risk / blast radius

How to verify locally

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python API breakage checks — ✅ PASSED

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

REST API breakage checks (OpenAPI) — ✅ PASSED

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

✅ QA Report: PASS

Does this PR achieve its stated goal?

Test 1: Malformed Python/JSON literal is rejected before bash

Test 2: Legitimate shell commands are not blocked

Test 3: is_input=True bypass still allows literal payloads

Issues Found

Uh oh!

juanmichelini commented Jun 9, 2026

Uh oh!

all-hands-bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Code Review: feat(terminal): reject Python/JSON literals passed as bash commands

[CRITICAL ISSUES]

[IMPROVEMENT OPPORTUNITIES]

[TESTING GAPS]

[RISK ASSESSMENT]

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

⚠️ QA Report: PASS WITH ISSUES

Does this PR achieve its stated goal?

Test 1: Malformed Python list/dict literal is rejected before bash

Test 2: Legitimate shell syntax and JSON-shaped arguments still run

Test 3: Dict/nested-list literals and interactive input behavior

Issues Found

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

juanmichelini commented Jun 9, 2026 •

edited by github-actions Bot

Loading

1. Heuristic: `looks_like_python_literal_argument(command) -> str | None`

2. Pre-execution guard in `TerminalExecutor.call`

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Test 3: `is_input=True` bypass still allows literal payloads

all-hands-bot commented Jun 9, 2026 •

edited

Loading

Code Review: `feat(terminal): reject Python/JSON literals passed as bash commands`