Skip to content

Fix full-codebase review findings: hermetic tests, py.typed, CLI error handling, installer hardening, CI#129

Merged
dhruvbatra merged 4 commits into
mainfrom
fix/codebase-review-findings
Jun 10, 2026
Merged

Fix full-codebase review findings: hermetic tests, py.typed, CLI error handling, installer hardening, CI#129
dhruvbatra merged 4 commits into
mainfrom
fix/codebase-review-findings

Conversation

@dhruvbatra

@dhruvbatra dhruvbatra commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Fixes every issue from a full-codebase review (Claude Code review with four area subagents, cross-reviewed by Codex — including the two issues Codex found that the original review missed, and one issue Codex found in this PR's own escaping path).

Highest impact

  • Unit tests executed real `npx` installs against `$HOME` — four __install_flow tests ran unmocked npx add-mcp -y -g ... / npx skills add ... -g, rewriting ~/.cursor/mcp.json, ~/.codex/config.toml, ~/.gemini/settings.json on any machine (or CI runner) with npx. Now stubbed like their sibling tests; full suite runs in ~4s with zero $HOME writes.
  • py.typed was missing despite the Typing :: Typed classifier — type checkers ignored every annotation in the installed package. Added and asserted in the wheel-content test.
  • Committed install.sh was stale (0.7.7 header at 0.7.8) — and yutori.com/install.sh serves this file, so it was live. Regenerated; the pre-commit freshness hook now also triggers on pyproject.toml, and new CI enforces it.
  • No CI existed; publishing was ungated — added a CI workflow (ruff, pytest on 3.9–3.13, wheel-content test, twine check, installer freshness + bash -n); the publish workflow now runs tests and twine check before upload, and the release-asset action is pinned to a commit SHA.

Behavior changes

  • yutori browse run / research run / scouts create now exit 1 when the API reports status: failed (previously exit 0 — scripts treated rejected tasks as success). Documented in api.md.
  • CLI data commands print one-line errors (exit 1) for AuthenticationError / APIError / network failures instead of multi-screen tracebacks; rejected keys get "Run 'yutori auth login'" guidance. AUTH_FAILURE_MARKERS updated in lockstep.
  • 401/403 SDK errors include the status code and server response body; 3xx responses raise APIError instead of silently returning {}.
  • API keys are stripped on resolution — a trailing-newline key (e.g. export YUTORI_API_KEY=$(cat key)) no longer fails every request deep in httpx.

Correctness fixes

  • Rich markup injection: scout queries like watch [/b] releases crashed scouts list with MarkupError; [beta]-style tokens were silently deleted; the installer summary could crash after a successful install. All user/API/subprocess text rendered through Rich is now escaped (and stringified first — non-string API values also crashed).
  • yutori.n1 compat shims: yutori.n1.payload attribute access and from yutori.n1.page_ready import SupportsAsyncPageReady work again (star-import shims dropped submodule attributes and non-__all__ names).
  • Payload trimming converges: a message holding several screenshots could stay over max_bytes no matter how many passes ran; trimming now drains multi-image messages (and extra images sharing the last message).
  • Replay robustness: valid-JSON non-object tool arguments no longer crash HTML generation; screenshot URLs are HTML-escaped; a text-only user interjection no longer blanks the pending tool screenshot.
  • apply_chat_extra_body no longer mutates the caller's extra_body dict; install-flow Ctrl+C during verification polling renders the summary and honors the documented exit-code contract; Task ID: N/A is treated as missing instead of polling browse get N/A; Windows venvs resolve Scripts\python.exe; EADDRINUSE detected by errno; failed save_config after key generation reports the orphaned key.

Packaging / installer

  • setuptools>=77 build floor (PEP 639 string license fails validation on 67–76).
  • MANIFEST.in grafts tests/ so the sdist suite can collect (it shipped test_*.py without conftest.py/helpers).
  • Installer frame-render dir uses mktemp -d (predictable /tmp path was symlink-clobberable on shared Linux); uninstall.sh accepts yes, not just y.
  • __version__ fallback is 0.0.0+unknown instead of a plausible-but-wrong 0.1.0.

Cleanup

Internal tracker/hostname references removed from comments; stale comments about a removed stdout guard (and their no-op isatty patches) removed; examples use the canonical NAVIGATOR_N1_MODEL; two test files gained __future__ imports their PEP 604 annotations needed on 3.9; the packaging test builds from a temp copy instead of deleting the repo's build/.

Verification

  • 421 passed, 3 skipped (shellcheck not installed locally) — includes 17 new regression tests pinning these fixes
  • ruff check . clean; bash -n on all installer scripts
  • Wheel inspected: yutori/py.typed + all 8 JS assets present; sdist inspected: full test suite included; twine check PASSED on both
  • install.sh regeneration verified idempotent; pinned action SHA verified upstream

Not included (needs external setup): OIDC trusted publishing requires configuring the publisher on PyPI first; the workflow keeps the token until then.

🤖 Generated with Claude Code


Note

Medium Risk
CLI exit-code changes can break scripts that assumed success on rejected creates; auth and HTTP error handling changes affect all API clients, while CI/publish gates reduce the risk of shipping broken releases.

Overview
This PR bundles review-driven fixes across CI/publish, CLI/SDK behavior, packaging, and shell installers, bumping to 0.7.8 with a regenerated install.sh.

Automation: New CI runs ruff, pytest on Python 3.9–3.13, slow wheel/sdist checks, twine check, and install.sh freshness + bash -n. Publish now runs tests and metadata validation before upload; the GitHub release action is pinned to a commit SHA. Pre-commit also retriggers install.sh regen when pyproject.toml version changes.

CLI (breaking for scripts): browse run, research run, and scouts create exit 1 on API status: failed or missing task ID. Commands use shared cli_client / safe_str so API, auth, and network errors print short messages (no tracebacks) and user/API strings don’t break Rich markup.

SDK/auth: Keys are stripped at resolve time; 401/403 include status and body; 3xx and 2xx non-JSON raise APIError instead of {}. OAuth bind errors use errno; failed save_config after key creation reports an orphaned key. Auth API base URL respects YUTORI_API_BASE_URL.

Navigator / compat: Payload trimming drains multi-image messages; replay HTML escapes screenshot URLs and keeps tool screenshots when a text-only user message follows. yutori.n1 shims use install_shim / lazy submodules so submodule attributes match pre-rename imports; examples use NAVIGATOR_N1_MODEL.

Packaging/installer: Ships py.typed, MANIFEST.in grafts full tests/, setuptools>=77. Installer frame cache uses mktemp -d; uninstall.sh accepts yes. Install-flow tests stub MCP npx installs; verification handles Ctrl+C, Task ID: N/A, and Windows venv Scripts\python.exe.

Reviewed by Cursor Bugbot for commit 0b1e0ba. Bugbot is set up for automated code reviews on this repo. Configure here.

dhruvbatra and others added 4 commits June 10, 2026 12:15
Tests / safety:
- Stub MCP/skills installs in the four install-flow tests that executed
  real `npx add-mcp` / `npx skills add` against $HOME
- Make the wheel-content test build from a temp copy instead of deleting
  the repo's build/ dir; stringify API fields before Rich-escaping in
  `scouts list` (non-string values no longer crash the CLI)

SDK:
- handle_response: treat 3xx as an error (redirects are not followed);
  include status + server body in 401/403 AuthenticationError messages
- apply_chat_extra_body: stop mutating the caller's extra_body dict
- Strip resolved API keys (trailing-newline keys no longer fail deep in
  httpx with LocalProtocolError)
- Auth flow: detect EADDRINUSE by errno; report the orphaned key when
  save_config fails after key generation; YUTORI_API_BASE_URL override
- __version__ fallback 0.1.0 -> 0.0.0+unknown

yutori.n1 compat:
- Shims now alias full module contents (non-__all__ names importable
  again) and the package exposes submodules as attributes

Navigator:
- Payload trimming converges on multi-image messages
- Replay: survive valid-JSON non-object tool arguments, escape
  screenshot URLs in generated HTML, keep the pending tool screenshot
  across text-only user interjections
- _assets: chained joinpath for py3.9 zip-import compatibility

CLI:
- Friendly errors (exit 1, no traceback) for AuthenticationError /
  APIError / httpx errors across data commands, with AUTH_FAILURE_MARKERS
  kept in sync
- browse/research run and scouts create exit 1 when the API reports
  status=failed
- Escape user/API/subprocess text rendered through Rich markup
- Drop the dead auth-URL reprint after login failure; treat "N/A" task id
  as missing in verification; Windows venv Scripts/ path; Ctrl+C during
  verification polling now renders the summary and honors the exit-code
  contract

Packaging / installer / CI:
- Ship yutori/py.typed (Typing :: Typed was previously a no-op for
  consumers); setuptools>=77 floor to match the PEP 639 license syntax;
  MANIFEST.in grafts tests/ so the sdist suite can collect
- install.sh regenerated (0.7.8 header); frame render dir uses mktemp -d
  (predictable /tmp path TOCTOU); uninstall.sh accepts "yes"; pre-commit
  freshness hook now also triggers on pyproject.toml
- New CI workflow (ruff + pytest on 3.9-3.13 + wheel content test +
  twine check + install.sh freshness); publish workflow gates on tests
  and twine check; release action pinned to a commit SHA
- Add __future__ imports to two test files whose PEP 604 annotations
  would crash collection on Python 3.9

Cleanup: internal tracker/hostname references removed from comments,
stale redirected-stdout-guard comments and no-op isatty patches removed,
examples use NAVIGATOR_N1_MODEL, api.md documents the new exit codes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
SC2016 is info-level and flags the intentional literal backticks in
uninstall.sh's single-quoted grep pattern and user-facing message. The
test always skipped locally (no shellcheck) and only started running on
the new CI's ubuntu runners, which ship shellcheck preinstalled.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Error handling (silent-failure review):
- handle_response: a 2xx non-JSON body (captive portal / SSO gateway)
  now raises APIError instead of escaping as a bare JSONDecodeError
  that bypassed cli_api_errors; 3xx errors name the redirect Location;
  401/403 body detail truncated to 500 chars
- print_task_submission_result: a non-failed create response without a
  task_id now fails (red banner, exit 1) instead of printing a green
  "submitted" banner for an unpollable task; the installer's "N/A"
  guard becomes defense-in-depth
- cli_api_errors: auth recovery hint tailored to the rejected key's
  source (env var vs saved config vs none)
- run_login_flow: non-EADDRINUSE bind errors name the callback server
  host:port; orphaned-key message points at the settings page
- yutori.n1: PEP 562 __dir__ so lazy submodules stay discoverable

Comment accuracy (comment review):
- chat namespace docstrings no longer suggest passing max_retries
  through create() (not accepted there); retry policy documented as
  not configurable via the SDK surface
- credentials: trailing-newline example replaced ($(cat file) strips
  trailing newlines and cannot reproduce the failure)
- n1 __init__ comment no longer overstates pre-rename eager imports or
  hardcodes the shim count; payload trimming comments/docstring state
  the per-phase single-strip constraint and best-effort protection
- release action pin annotated with machine-readable trailing # v2.6.2

Test gaps (test-coverage review):
- test_port_in_use now constructs OSError with errno so the
  EADDRINUSE branch is actually exercised; companion fallthrough test
- scouts create failed-status exit code, missing-task-id failure,
  installer prompt/summary markup escaping, Windows Scripts venv
  branch (via an os proxy), YUTORI_API_BASE_URL override (module
  reload), new "invalid api key" auth marker, full n1 submodule set +
  dir(), positive escaped-URL assertion in the replay XSS test,
  mktemp -d and yes-regex pins for the shell scripts, and sdist
  content assertions (conftest/helpers/py.typed) in the packaging test

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Quality pass over the review-fix PR (reuse / simplification /
efficiency / altitude), no behavior changes:

- CLI: escape-by-default in the shared print helpers (safe_str) —
  print_optional_field/print_aligned_fields/print_creation_result now
  escape values unconditionally, closing the six remaining unescaped
  API-value sites and making the next call site safe by default; the
  escape_value flag is gone, call sites pass raw values
- CLI: cli_client() bundles cli_api_errors + get_authenticated_client
  so a new command cannot take the client without the error handling;
  test patches retarget to the single package-level symbol
- _auth_recovery_hint uses the public get_auth_status().source instead
  of re-implementing source detection via a private credentials helper
- n1 shims: install_shim() derives the target from the shim's own
  __name__ (13 files shrink to two lines each; a copy-pasted shim can
  no longer alias the wrong module); _SUBMODULES is derived from the
  files on disk via pkgutil, dropping the hand-list and the magic-13
  test assertion
- tests: a module-level autouse fixture in test_install_flow.py stubs
  the MCP/skills installs for every test — the dangerous npx path is
  now unreachable by default instead of guarded by per-test
  boilerplate (4 copies removed); direct-exercise tests are unaffected
  (import-time bindings)
- test_packaging builds once via plain `python -m build` (sdist +
  wheel-from-sdist) instead of two separate invocations — halves the
  package-job build work and covers the more realistic path
- test_http: non-JSON/redirect tests folded into TestHandleResponse;
  dead _all_scripts helper removed from test_bash_scripts
- comments: N/A guard documented as a version-skew defense, APIError
  docstring covers unusable responses, alias_module_contents documents
  copy (non-aliasing) monkeypatch semantics

Skipped (noted, not bugs): exit-code contract to replace
AUTH_FAILURE_MARKERS grepping (behavior change — follow-up),
payload incremental size accounting (pre-existing structure),
negligible micro-costs on error/debug paths.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@dhruvbatra dhruvbatra merged commit 4b4d24b into main Jun 10, 2026
9 checks passed
@dhruvbatra dhruvbatra deleted the fix/codebase-review-findings branch June 10, 2026 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant