fix(bash): treat linter/test-runner exit code 1 as non-error (#1436)#1438
Open
laurentftech wants to merge 3 commits into
Open
fix(bash): treat linter/test-runner exit code 1 as non-error (#1436)#1438laurentftech wants to merge 3 commits into
laurentftech wants to merge 3 commits into
Conversation
…#1436) Linters (ruff, eslint, flake8, biome), type checkers (mypy, pyright, tsc) and test runners (pytest, jest, vitest) use exit code 1 to mean "issues found" — not a crash. The default semantic treated any non-zero exit as an error, so the model was told isError=true and retried the same command instead of reading the lint output. - Add these tools to COMMAND_SEMANTICS: exit 1 informational, 2+ real error - Add pylint with its OR-ed bitfield (fatal/usage = error, findings = not) - Extend base-command extraction to see past package/module runners (uvx/npx/bunx/pipx, python -m) and resolve path/version-pinned invocations (./node_modules/.bin/eslint, eslint@8 → eslint) Fixes Gitlawb#1436 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
E2e testing against real binaries revealed tsc is inverted vs other linters: exit 1 = CLI/usage error, exit 2 = diagnostics found (verified TS 5.9). The initial exitOneInformational mapping was backwards — it would have flagged real type errors (exit 2) as failures and retried. - tsc now: 0=clean, 2=diagnostics (informational), 1 and 3+=real error - Add commandSemantics.e2e.test.ts: spawns real tsc and ruff (via uvx), feeds actual exit codes to interpretCommandResult. Skips gracefully when a binary is absent. Confirms ruff (1=violations) and uvx-prefix unwrap work against the real tool. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Author
Update: e2e testing caught a real bugAdded This revealed
The original e2e coverage (real
|
…ssage - exitOneInformational: keep "exit code N" message on the 2+ error branch (was dropping the context DEFAULT_SEMANTIC provides) - Package runners: skip value-flags and their argument so `npx -p typescript tsc` resolves to tsc, not the -p value - Constrain `-m` unwrap to words[1] so a `-m` appearing as a script arg (`python app.py -m foo`) no longer mis-resolves - Clarify header: per-command semantics are authoritative; note tsc inverts Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #1436
Problem
Linters and test runners use exit code 1 to mean "issues found" — not a command crash.
commandSemantics.tsfalls back toDEFAULT_SEMANTICfor unknown commands, which treats any non-zero exit as an error.Result: the model is told
isError: true, the system prompt tells it to retry on tool errors, and it retries the sameruff/eslint/pytestcommand 2-3 times before giving up — even though the lint output it needs is right there in the result.Reported on v0.15.0 (Windows) with
uvx ruff check --fix:Fix
src/tools/BashTool/commandSemantics.ts:COMMAND_SEMANTICS— exit 1 = informational, 2+ = real error:ruff,eslint,flake8,biomemypy,pyright,tscpytest,jest,vitestpylint— uses an OR-ed bitfield (1=fatal, 2=error msg, 4=warn, 8=refactor, 16=convention, 32=usage). Only fatal (1) or usage error (32) is a real failure; message bits are findings the model should read.heuristicallyExtractBaseCommandnow looks past package/module runners and resolves path/version-pinned invocations:uvx ruff check→ruffnpx --yes eslint@8 src→eslintpython -m pytest→pytest./node_modules/.bin/tsc→tscDeliberately conservative
pnpm/yarn/poetryare not unwrapped: their subcommand is usually apackage.json/script name, not a tool, so exit-code semantics are ambiguous.black --check(exit 1 = would reformat) is not added — it falls through to default semantics. Can follow up if wanted.Test plan
bun test src/tools/BashTool/commandSemantics.test.ts— 58/58 passbun test src/tools/BashTool/— 126/126 pass (no regressions)tsc --noEmit— no new errors in changed fileNew tests cover each tool family, the pylint bitfield (including OR-ed codes like 17 and 20), runner unwrapping, version pins, path-based invocations, compound chains, and the negative case (
python script.pyexit 1 stays a real error).🤖 Generated with Claude Code