Fix full-codebase review findings: hermetic tests, py.typed, CLI error handling, installer hardening, CI by dhruvbatra · Pull Request #129 · yutori-ai/yutori-sdk-python

dhruvbatra · 2026-06-10T19:16:01Z

Fixes every issue from a full-codebase review (Claude Code review with four area subagents, cross-reviewed by Codex — including the two issues Codex found that the original review missed, and one issue Codex found in this PR's own escaping path).

Highest impact

Unit tests executed real `npx` installs against `$HOME` — four __install_flow tests ran unmocked npx add-mcp -y -g ... / npx skills add ... -g, rewriting ~/.cursor/mcp.json, ~/.codex/config.toml, ~/.gemini/settings.json on any machine (or CI runner) with npx. Now stubbed like their sibling tests; full suite runs in ~4s with zero $HOME writes.
py.typed was missing despite the Typing :: Typed classifier — type checkers ignored every annotation in the installed package. Added and asserted in the wheel-content test.
Committed install.sh was stale (0.7.7 header at 0.7.8) — and yutori.com/install.sh serves this file, so it was live. Regenerated; the pre-commit freshness hook now also triggers on pyproject.toml, and new CI enforces it.
No CI existed; publishing was ungated — added a CI workflow (ruff, pytest on 3.9–3.13, wheel-content test, twine check, installer freshness + bash -n); the publish workflow now runs tests and twine check before upload, and the release-asset action is pinned to a commit SHA.

Behavior changes

yutori browse run / research run / scouts create now exit 1 when the API reports status: failed (previously exit 0 — scripts treated rejected tasks as success). Documented in api.md.
CLI data commands print one-line errors (exit 1) for AuthenticationError / APIError / network failures instead of multi-screen tracebacks; rejected keys get "Run 'yutori auth login'" guidance. AUTH_FAILURE_MARKERS updated in lockstep.
401/403 SDK errors include the status code and server response body; 3xx responses raise APIError instead of silently returning {}.
API keys are stripped on resolution — a trailing-newline key (e.g. export YUTORI_API_KEY=$(cat key)) no longer fails every request deep in httpx.

Correctness fixes

Rich markup injection: scout queries like watch [/b] releases crashed scouts list with MarkupError; [beta]-style tokens were silently deleted; the installer summary could crash after a successful install. All user/API/subprocess text rendered through Rich is now escaped (and stringified first — non-string API values also crashed).
yutori.n1 compat shims: yutori.n1.payload attribute access and from yutori.n1.page_ready import SupportsAsyncPageReady work again (star-import shims dropped submodule attributes and non-__all__ names).
Payload trimming converges: a message holding several screenshots could stay over max_bytes no matter how many passes ran; trimming now drains multi-image messages (and extra images sharing the last message).
Replay robustness: valid-JSON non-object tool arguments no longer crash HTML generation; screenshot URLs are HTML-escaped; a text-only user interjection no longer blanks the pending tool screenshot.
apply_chat_extra_body no longer mutates the caller's extra_body dict; install-flow Ctrl+C during verification polling renders the summary and honors the documented exit-code contract; Task ID: N/A is treated as missing instead of polling browse get N/A; Windows venvs resolve Scripts\python.exe; EADDRINUSE detected by errno; failed save_config after key generation reports the orphaned key.

Packaging / installer

setuptools>=77 build floor (PEP 639 string license fails validation on 67–76).
MANIFEST.in grafts tests/ so the sdist suite can collect (it shipped test_*.py without conftest.py/helpers).
Installer frame-render dir uses mktemp -d (predictable /tmp path was symlink-clobberable on shared Linux); uninstall.sh accepts yes, not just y.
__version__ fallback is 0.0.0+unknown instead of a plausible-but-wrong 0.1.0.

Cleanup

Internal tracker/hostname references removed from comments; stale comments about a removed stdout guard (and their no-op isatty patches) removed; examples use the canonical NAVIGATOR_N1_MODEL; two test files gained __future__ imports their PEP 604 annotations needed on 3.9; the packaging test builds from a temp copy instead of deleting the repo's build/.

Verification

421 passed, 3 skipped (shellcheck not installed locally) — includes 17 new regression tests pinning these fixes
ruff check . clean; bash -n on all installer scripts
Wheel inspected: yutori/py.typed + all 8 JS assets present; sdist inspected: full test suite included; twine check PASSED on both
install.sh regeneration verified idempotent; pinned action SHA verified upstream

Not included (needs external setup): OIDC trusted publishing requires configuring the publisher on PyPI first; the workflow keeps the token until then.

🤖 Generated with Claude Code

Note

Medium Risk
CLI exit-code changes can break scripts that assumed success on rejected creates; auth and HTTP error handling changes affect all API clients, while CI/publish gates reduce the risk of shipping broken releases.

Overview
This PR bundles review-driven fixes across CI/publish, CLI/SDK behavior, packaging, and shell installers, bumping to 0.7.8 with a regenerated install.sh.

Automation: New CI runs ruff, pytest on Python 3.9–3.13, slow wheel/sdist checks, twine check, and install.sh freshness + bash -n. Publish now runs tests and metadata validation before upload; the GitHub release action is pinned to a commit SHA. Pre-commit also retriggers install.sh regen when pyproject.toml version changes.

CLI (breaking for scripts): browse run, research run, and scouts create exit 1 on API status: failed or missing task ID. Commands use shared cli_client / safe_str so API, auth, and network errors print short messages (no tracebacks) and user/API strings don’t break Rich markup.

SDK/auth: Keys are stripped at resolve time; 401/403 include status and body; 3xx and 2xx non-JSON raise APIError instead of {}. OAuth bind errors use errno; failed save_config after key creation reports an orphaned key. Auth API base URL respects YUTORI_API_BASE_URL.

Navigator / compat: Payload trimming drains multi-image messages; replay HTML escapes screenshot URLs and keeps tool screenshots when a text-only user message follows. yutori.n1 shims use install_shim / lazy submodules so submodule attributes match pre-rename imports; examples use NAVIGATOR_N1_MODEL.

Packaging/installer: Ships py.typed, MANIFEST.in grafts full tests/, setuptools>=77. Installer frame cache uses mktemp -d; uninstall.sh accepts yes. Install-flow tests stub MCP npx installs; verification handles Ctrl+C, Task ID: N/A, and Windows venv Scripts\python.exe.

^{Reviewed by Cursor Bugbot for commit 0b1e0ba. Bugbot is set up for automated code reviews on this repo. Configure here.}

Tests / safety: - Stub MCP/skills installs in the four install-flow tests that executed real `npx add-mcp` / `npx skills add` against $HOME - Make the wheel-content test build from a temp copy instead of deleting the repo's build/ dir; stringify API fields before Rich-escaping in `scouts list` (non-string values no longer crash the CLI) SDK: - handle_response: treat 3xx as an error (redirects are not followed); include status + server body in 401/403 AuthenticationError messages - apply_chat_extra_body: stop mutating the caller's extra_body dict - Strip resolved API keys (trailing-newline keys no longer fail deep in httpx with LocalProtocolError) - Auth flow: detect EADDRINUSE by errno; report the orphaned key when save_config fails after key generation; YUTORI_API_BASE_URL override - __version__ fallback 0.1.0 -> 0.0.0+unknown yutori.n1 compat: - Shims now alias full module contents (non-__all__ names importable again) and the package exposes submodules as attributes Navigator: - Payload trimming converges on multi-image messages - Replay: survive valid-JSON non-object tool arguments, escape screenshot URLs in generated HTML, keep the pending tool screenshot across text-only user interjections - _assets: chained joinpath for py3.9 zip-import compatibility CLI: - Friendly errors (exit 1, no traceback) for AuthenticationError / APIError / httpx errors across data commands, with AUTH_FAILURE_MARKERS kept in sync - browse/research run and scouts create exit 1 when the API reports status=failed - Escape user/API/subprocess text rendered through Rich markup - Drop the dead auth-URL reprint after login failure; treat "N/A" task id as missing in verification; Windows venv Scripts/ path; Ctrl+C during verification polling now renders the summary and honors the exit-code contract Packaging / installer / CI: - Ship yutori/py.typed (Typing :: Typed was previously a no-op for consumers); setuptools>=77 floor to match the PEP 639 license syntax; MANIFEST.in grafts tests/ so the sdist suite can collect - install.sh regenerated (0.7.8 header); frame render dir uses mktemp -d (predictable /tmp path TOCTOU); uninstall.sh accepts "yes"; pre-commit freshness hook now also triggers on pyproject.toml - New CI workflow (ruff + pytest on 3.9-3.13 + wheel content test + twine check + install.sh freshness); publish workflow gates on tests and twine check; release action pinned to a commit SHA - Add __future__ imports to two test files whose PEP 604 annotations would crash collection on Python 3.9 Cleanup: internal tracker/hostname references removed from comments, stale redirected-stdout-guard comments and no-op isatty patches removed, examples use NAVIGATOR_N1_MODEL, api.md documents the new exit codes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

SC2016 is info-level and flags the intentional literal backticks in uninstall.sh's single-quoted grep pattern and user-facing message. The test always skipped locally (no shellcheck) and only started running on the new CI's ubuntu runners, which ship shellcheck preinstalled. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Error handling (silent-failure review): - handle_response: a 2xx non-JSON body (captive portal / SSO gateway) now raises APIError instead of escaping as a bare JSONDecodeError that bypassed cli_api_errors; 3xx errors name the redirect Location; 401/403 body detail truncated to 500 chars - print_task_submission_result: a non-failed create response without a task_id now fails (red banner, exit 1) instead of printing a green "submitted" banner for an unpollable task; the installer's "N/A" guard becomes defense-in-depth - cli_api_errors: auth recovery hint tailored to the rejected key's source (env var vs saved config vs none) - run_login_flow: non-EADDRINUSE bind errors name the callback server host:port; orphaned-key message points at the settings page - yutori.n1: PEP 562 __dir__ so lazy submodules stay discoverable Comment accuracy (comment review): - chat namespace docstrings no longer suggest passing max_retries through create() (not accepted there); retry policy documented as not configurable via the SDK surface - credentials: trailing-newline example replaced ($(cat file) strips trailing newlines and cannot reproduce the failure) - n1 __init__ comment no longer overstates pre-rename eager imports or hardcodes the shim count; payload trimming comments/docstring state the per-phase single-strip constraint and best-effort protection - release action pin annotated with machine-readable trailing # v2.6.2 Test gaps (test-coverage review): - test_port_in_use now constructs OSError with errno so the EADDRINUSE branch is actually exercised; companion fallthrough test - scouts create failed-status exit code, missing-task-id failure, installer prompt/summary markup escaping, Windows Scripts venv branch (via an os proxy), YUTORI_API_BASE_URL override (module reload), new "invalid api key" auth marker, full n1 submodule set + dir(), positive escaped-URL assertion in the replay XSS test, mktemp -d and yes-regex pins for the shell scripts, and sdist content assertions (conftest/helpers/py.typed) in the packaging test Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Quality pass over the review-fix PR (reuse / simplification / efficiency / altitude), no behavior changes: - CLI: escape-by-default in the shared print helpers (safe_str) — print_optional_field/print_aligned_fields/print_creation_result now escape values unconditionally, closing the six remaining unescaped API-value sites and making the next call site safe by default; the escape_value flag is gone, call sites pass raw values - CLI: cli_client() bundles cli_api_errors + get_authenticated_client so a new command cannot take the client without the error handling; test patches retarget to the single package-level symbol - _auth_recovery_hint uses the public get_auth_status().source instead of re-implementing source detection via a private credentials helper - n1 shims: install_shim() derives the target from the shim's own __name__ (13 files shrink to two lines each; a copy-pasted shim can no longer alias the wrong module); _SUBMODULES is derived from the files on disk via pkgutil, dropping the hand-list and the magic-13 test assertion - tests: a module-level autouse fixture in test_install_flow.py stubs the MCP/skills installs for every test — the dangerous npx path is now unreachable by default instead of guarded by per-test boilerplate (4 copies removed); direct-exercise tests are unaffected (import-time bindings) - test_packaging builds once via plain `python -m build` (sdist + wheel-from-sdist) instead of two separate invocations — halves the package-job build work and covers the more realistic path - test_http: non-JSON/redirect tests folded into TestHandleResponse; dead _all_scripts helper removed from test_bash_scripts - comments: N/A guard documented as a version-skew defense, APIError docstring covers unusable responses, alias_module_contents documents copy (non-aliasing) monkeypatch semantics Skipped (noted, not bugs): exit-code contract to replace AUTH_FAILURE_MARKERS grepping (behavior change — follow-up), payload incremental size accounting (pre-existing structure), negligible micro-costs on error/debug paths. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

dhruvbatra and others added 4 commits June 10, 2026 12:15

dhruvbatra merged commit 4b4d24b into main Jun 10, 2026
9 checks passed

dhruvbatra deleted the fix/codebase-review-findings branch June 10, 2026 21:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix full-codebase review findings: hermetic tests, py.typed, CLI error handling, installer hardening, CI#129

Fix full-codebase review findings: hermetic tests, py.typed, CLI error handling, installer hardening, CI#129
dhruvbatra merged 4 commits into
mainfrom
fix/codebase-review-findings

dhruvbatra commented Jun 10, 2026 •

edited by cursor Bot

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dhruvbatra commented Jun 10, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Highest impact

Behavior changes

Correctness fixes

Packaging / installer

Cleanup

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dhruvbatra commented Jun 10, 2026 •

edited by cursor Bot

Loading