fix(bash): treat linter/test-runner exit code 1 as non-error (#1436) #1438

Open
laurentftech wants to merge 3 commits into Gitlawb:main from laurentftech:fix/linter-exit-code-semantics

Conversation

@laurentftech

Closes #1436

Problem

Linters and test runners use exit code 1 to mean "issues found" — not a command crash. commandSemantics.ts falls back to DEFAULT_SEMANTIC for unknown commands, which treats any non-zero exit as an error.

Result: the model is told isError: true, the system prompt tells it to retry on tool errors, and it retries the same ruff/eslint/pytest command 2-3 times before giving up — even though the lint output it needs is right there in the result.

Reported on v0.15.0 (Windows) with uvx ruff check --fix:

Bash(uvx ruff check --fix app/foo.py 2>&1)
  └  Error: Exit code 1
E501 Line too long (105 > 100)

Fix

src/tools/BashTool/commandSemantics.ts:

  1. Add tools to COMMAND_SEMANTICS — exit 1 = informational, 2+ = real error:
    • Linters: ruff, eslint, flake8, biome
    • Type checkers: mypy, pyright, tsc
    • Test runners: pytest, jest, vitest
  2. pylint — uses an OR-ed bitfield (1=fatal, 2=error msg, 4=warn, 8=refactor, 16=convention, 32=usage). Only fatal (1) or usage error (32) is a real failure; message bits are findings the model should read.
  3. Runner-aware base command extraction. heuristicallyExtractBaseCommand now looks past package/module runners and resolves path/version-pinned invocations:
    • uvx ruff check → ruff
    • npx --yes eslint@8 src → eslint
    • python -m pytest → pytest
    • ./node_modules/.bin/tsc → tsc
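
The exit-code rules in items 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code in commandSemantics.ts: the `SemanticResult` and `CommandSemantic` shapes, the message strings, and `pylintSemantic` are assumptions made for the example, and `exitOneInformational` is only assumed to take a tool name and return a classifier.

```typescript
// Hypothetical shapes; the real types in commandSemantics.ts may differ.
type SemanticResult = { isError: boolean; message: string };
type CommandSemantic = (exitCode: number) => SemanticResult;

// Exit 0 is clean, exit 1 means "issues found", 2+ is a real failure.
function exitOneInformational(tool: string): CommandSemantic {
  return (exitCode) => {
    if (exitCode === 0) return { isError: false, message: `${tool}: clean` };
    if (exitCode === 1) return { isError: false, message: `${tool}: issues found` };
    return { isError: true, message: `${tool} failed with exit code ${exitCode}` };
  };
}

// pylint ORs its status bits together (values as listed in the description).
const PYLINT_FATAL = 1;
const PYLINT_USAGE = 32;

// Only the fatal and usage bits mark a real failure; message bits are findings.
const pylintSemantic: CommandSemantic = (exitCode) => {
  const isError = (exitCode & (PYLINT_FATAL | PYLINT_USAGE)) !== 0;
  return {
    isError,
    message: isError
      ? `pylint failed with exit code ${exitCode}`
      : `pylint: exit code ${exitCode} (findings only)`,
  };
};

// Assumed table layout: base command name -> classifier.
const SKETCH_SEMANTICS = new Map<string, CommandSemantic>([
  ["ruff", exitOneInformational("ruff")],
  ["eslint", exitOneInformational("eslint")],
  ["pytest", exitOneInformational("pytest")],
  ["pylint", pylintSemantic],
]);
```

Under this sketch, pylint exit 20 (convention and warning bits) is read as findings, while 17 (convention plus fatal) is still an error.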

Deliberately conservative

  • pnpm/yarn/poetry are not unwrapped: their subcommand is usually a package.json/script name, not a tool, so exit-code semantics are ambiguous.
  • black --check (exit 1 = would reformat) is not added — it falls through to default semantics. Can follow up if wanted.
  • Anything not in the list keeps today's exact behavior.
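
To make the unwrapping rules concrete, here is a simplified sketch of runner-aware extraction. It is not the PR's heuristicallyExtractBaseCommand: `extractBaseCommand`, `normalizeToolName`, and the `PACKAGE_RUNNERS` set are hypothetical names, the sketch handles a single simple command (no compound chains), and it treats every dash-prefixed word after a runner as a flag without a value.

```typescript
// Runners that are unwrapped in this sketch; pnpm/yarn/poetry are left out on purpose.
const PACKAGE_RUNNERS = new Set(["uvx", "npx", "bunx"]);

// "./node_modules/.bin/tsc" -> "tsc", "eslint@8" -> "eslint"
function normalizeToolName(word: string): string {
  const base = word.split(/[\\/]/).pop() ?? word;
  // Search from index 1 so a leading "@" is not mistaken for a version pin.
  const at = base.indexOf("@", 1);
  return at === -1 ? base : base.slice(0, at);
}

function extractBaseCommand(command: string): string {
  const [head, ...rest] = command.trim().split(/\s+/).filter(Boolean);
  if (head === undefined) return "";
  const first = normalizeToolName(head);

  if (PACKAGE_RUNNERS.has(first)) {
    // The first non-flag word after the runner is taken to be the tool.
    const target = rest.find((w) => !w.startsWith("-"));
    return target === undefined ? first : normalizeToolName(target);
  }

  // "python -m pytest" -> "pytest"; only when -m directly follows the interpreter.
  const moduleName = rest[1];
  if (/^python[\d.]*$/.test(first) && rest[0] === "-m" && moduleName !== undefined) {
    return normalizeToolName(moduleName);
  }

  return first;
}
```

Because `pnpm` is not in the runner set, `pnpm lint` resolves to `pnpm` and keeps the default semantics, which matches the conservative choice described above.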

Test plan

  • bun test src/tools/BashTool/commandSemantics.test.ts — 58/58 pass
  • bun test src/tools/BashTool/ — 126/126 pass (no regressions)
  • tsc --noEmit — no new errors in changed file

New tests cover each tool family, the pylint bitfield (including OR-ed codes like 17 and 20), runner unwrapping, version pins, path-based invocations, compound chains, and the negative case (python script.py exit 1 stays a real error).

🤖 Generated with Claude Code

laurentftech and others added 2 commits May 29, 2026 22:25
…#1436)

Linters (ruff, eslint, flake8, biome), type checkers (mypy, pyright, tsc)
and test runners (pytest, jest, vitest) use exit code 1 to mean "issues
found" — not a crash. The default semantic treated any non-zero exit as an
error, so the model was told isError=true and retried the same command
instead of reading the lint output.

- Add these tools to COMMAND_SEMANTICS: exit 1 informational, 2+ real error
- Add pylint with its OR-ed bitfield (fatal/usage = error, findings = not)
- Extend base-command extraction to see past package/module runners
  (uvx/npx/bunx/pipx, python -m) and resolve path/version-pinned invocations
  (./node_modules/.bin/eslint, eslint@8 → eslint)

Fixes Gitlawb#1436

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

E2e testing against real binaries revealed tsc is inverted vs other
linters: exit 1 = CLI/usage error, exit 2 = diagnostics found (verified
TS 5.9). The initial exitOneInformational mapping was backwards — it would
have flagged real type errors (exit 2) as failures and retried.

- tsc now: 0=clean, 2=diagnostics (informational), 1 and 3+=real error
- Add commandSemantics.e2e.test.ts: spawns real tsc and ruff (via uvx),
  feeds actual exit codes to interpretCommandResult. Skips gracefully when
  a binary is absent. Confirms ruff (1=violations) and uvx-prefix unwrap
  work against the real tool.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@laurentftech
Author

Update: e2e testing caught a real bug

Added commandSemantics.e2e.test.ts that spawns real binaries and feeds their actual exit codes to interpretCommandResult (skips gracefully when a binary is absent).

This revealed tsc is inverted vs every other linter (verified against TypeScript 5.9):

| Exit | tsc meaning | Classification |
|------|-------------|----------------|
| 0 | clean | not error |
| 1 | CLI/usage error | real error |
| 2 | type/syntax diagnostics found | not error (read them) |
| 3+ | config/internal error | real error |

The original exitOneInformational('tsc') mapping was backwards — it would have flagged real type errors (exit 2) as failures and retried, the exact bug this PR fixes. Now corrected with a dedicated semantic.
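
A dedicated tsc semantic along the lines of the table could look like this. It is a sketch under assumptions: `tscSemantic`, the `SemanticResult` shape, and the message strings are hypothetical, and the mapping is copied from the table above rather than from any TypeScript specification.

```typescript
// Hypothetical result shape; the real type in commandSemantics.ts may differ.
type SemanticResult = { isError: boolean; message: string };

// Mapping taken from the table: 0 = clean, 2 = diagnostics, anything else = error.
function tscSemantic(exitCode: number): SemanticResult {
  switch (exitCode) {
    case 0:
      return { isError: false, message: "tsc: clean" };
    case 2:
      // Diagnostics are findings the model should read, not a failed command.
      return { isError: false, message: "tsc: diagnostics found" };
    default:
      // Covers exit 1 (CLI/usage error) and 3+ (config/internal error).
      return { isError: true, message: `tsc failed with exit code ${exitCode}` };
  }
}
```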

e2e coverage (real tsc, and real ruff via uvx):

  • tsc clean→0, type-error→2 (non-error), bad-flag→1 (error)
  • ruff clean→0, violations→1 (non-error, uvx prefix unwraps to ruff), bad-flag→2 (error)

bun test src/tools/BashTool/ → 135 pass.

…ssage

- exitOneInformational: keep "exit code N" message on the 2+ error branch
  (was dropping the context DEFAULT_SEMANTIC provides)
- Package runners: skip value-flags and their argument so
  `npx -p typescript tsc` resolves to tsc, not the -p value
- Constrain `-m` unwrap to words[1] so a `-m` appearing as a script arg
  (`python app.py -m foo`) no longer mis-resolves
- Clarify header: per-command semantics are authoritative; note tsc inverts

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
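
The two extraction fixes in this last commit (skipping value-flags, and constraining `-m` to words[1]) can be illustrated in isolation. This is a hedged sketch, not the PR's code: `runnerTarget`, `pythonModule`, and the contents of `VALUE_FLAGS` are assumptions, and the flag set is illustrative rather than complete.

```typescript
// Illustrative, incomplete set of runner flags that consume the following word.
const VALUE_FLAGS = new Set(["-p", "--package", "--from", "--with"]);

// args: the words after the runner (for example, after "npx" or "uvx").
function runnerTarget(args: string[]): string | undefined {
  for (let i = 0; i < args.length; i++) {
    const word = args[i];
    if (word === undefined) break;
    if (VALUE_FLAGS.has(word)) {
      i++; // skip the flag's value as well
      continue;
    }
    if (word.startsWith("-")) continue; // flag without a separate value
    return word;
  }
  return undefined;
}

// words: the full command split on whitespace, interpreter first.
function pythonModule(words: string[]): string | undefined {
  // -m counts only at words[1]; a later -m belongs to the script's own arguments.
  return words[1] === "-m" ? words[2] : undefined;
}
```

With these rules, `npx -p typescript tsc` yields `tsc` instead of `typescript`, and `python app.py -m foo` is no longer unwrapped to `foo`.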
Development

Successfully merging this pull request may close these issues.

fix: treat linter exit code 1 as non-error in commandSemantics (ruff, eslint, uvx, npx)
