[codex] Upgrade Agent Studio workbench by ColinLi98 · Pull Request #3 · ColinLi98/narrativeos_codex_handoff

ColinLi98 · 2026-05-10T18:02:05Z

What changed

Upgrades Agent Studio into the interactive local workbench: startup page, reader-first Studio layout, director panel, choice cards, branch map, quality/export status, and .nosbook export support.
Adds ReaderShell async generation hardening for queued /v1/reader/continue responses with job polling, resume-on-stale, session reload, and visible wait state.
Adds Agent Studio rendered smoke coverage, desktop/mobile screenshot artifacts, sticky director checks, mobile bounded choice scroll checks, and visual review checklist output.
Adds scripts/run_agent_studio_local.sh, which starts the local backend and automatically opens /app?product=author&workspace=studio&debug=1 after /health passes. Set AGENT_STUDIO_OPEN_BROWSER=0 to disable browser opening.
Refreshes the standard cross-pack benchmark baseline: CI showed that the initial-import baseline was stale, while the current benchmark passes at 1.000.
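
The queued-response handling described for ReaderShell above (job polling, resume-on-stale) can be sketched roughly as follows. This is an illustration in Python, not the code in src/narrativeos/web/reader_shell_v2.js. The job status strings, the fetch_job and resume_job callables, and the stale and timeout thresholds are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional


def poll_job(
    fetch_job: Callable[[], Dict],      # hypothetical: returns the current job record
    resume_job: Callable[[], None],     # hypothetical: asks the backend to resume a stalled job
    *,
    stale_after: float = 30.0,          # assumed threshold, not taken from the PR
    interval: float = 1.0,
    timeout: float = 120.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a queued job until it completes, resuming it once if it looks stale."""
    started = clock()
    last_progress = started
    last_status: Optional[str] = None
    resumed = False
    while True:
        job = fetch_job()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(job.get("error", "generation failed"))
        now = clock()
        if status != last_status:
            # Any status change counts as progress and resets the stale timer.
            last_status = status
            last_progress = now
        elif not resumed and now - last_progress >= stale_after:
            # No visible progress for a while: request a resume, but only once.
            resume_job()
            resumed = True
            last_progress = now
        if now - started >= timeout:
            raise TimeoutError("job did not finish in time")
        sleep(interval)
```

The clock and sleep parameters are injectable so the loop can be exercised in tests without real waiting; a UI would show its wait state for as long as this call is pending.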

Merge gate evidence

Lane: Lane B
Phase: Phase 2
Task: Task AS-4/AS-5 follow-up hardening
Goal met: yes
Out-of-scope changes introduced: no
Tests run: targeted Agent Studio/ReaderShell/frontend smoke contract tests, targeted AuthorWork branch/export tests, cross-pack merge gate tests, and local rendered Agent Studio smoke
Benchmark / eval run: standard cross-pack benchmark plus merge gate after baseline refresh
Strongest pack delta: current strongest remains synthetic_min_pack and urban_mystery_lotus_lane after baseline refresh
Weakest pack delta: current weakest remains jade_court_romance, tide_archive_memory_debt, and xianxia_forgotten_vow after baseline refresh
Cross-pack pass-rate delta: +0.000 against refreshed baseline; prior stale baseline showed +0.067 before refresh
Issue category delta (Q03/Q04/Q05/Q09 if relevant): no generation/planner changes; benchmark phase gate reports no blocking issue-category regression
Rollback point: revert the PR branch commits, especially the Agent Studio UI/smoke changes and refreshed tests/benchmark_baseline.json
Next suggested task: run long-route benchmark evidence after Agent Studio PR lands
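
For readers checking the pass-rate figures above, the delta is simply the current pass rate minus the baseline pass rate. The sketch below assumes the 1.000 quoted under "What changed" is the current cross-pack pass rate; the 0.933 stale-baseline value is only what the reported +0.067 would imply under that assumption and is not read from tests/benchmark_baseline.json. The function name is hypothetical.

```python
def pass_rate_delta(current: float, baseline: float) -> str:
    """Format the pass-rate change as a signed value with three decimals."""
    return f"{current - baseline:+.3f}"


# Refreshed baseline equals the current rate, so the delta is zero.
print(pass_rate_delta(1.000, 1.000))
# Illustrative stale baseline implied by the reported +0.067 (assumption).
print(pass_rate_delta(1.000, 0.933))
```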

Product impact

Does this move commercialization forward?: yes, it makes Author-side local creation usable and CI-verifiable.
Does this improve kernel/product/ops instead of just current-pack polish?: yes, changes are product shell, smoke reliability, PR review process, and benchmark evidence hygiene.
Does this make weakest packs easier to diagnose or improve?: yes, cross-pack benchmark and visual smoke artifacts now surface current weakest/strongest packs and UI regressions clearly.

Validation

bash -n scripts/run_agent_studio_local.sh scripts/run_agent_studio_smoke.sh scripts/run_frontend_shell_smoke.sh scripts/run_reader_shell_smoke.sh
node -c src/narrativeos/web/agent_studio.js && node -c src/narrativeos/web/reader_shell_v2.js && node -c scripts/verify_agent_studio_smoke.js && node -c scripts/verify_frontend_shell_smoke.js
.venv/bin/python -m py_compile scripts/write_agent_studio_smoke_step_summary.py scripts/write_frontend_shell_smoke_step_summary.py
.venv/bin/python -m pytest tests/test_agent_studio_interactive_workbench.py -q
.venv/bin/python -m pytest tests/test_frontend_shell_smoke_ci.py::test_frontend_shell_smoke_scripts_exist_and_are_parseable tests/test_frontend_shell_smoke_ci.py::test_agent_studio_smoke_workflow_wires_headless_runner_and_artifacts tests/test_frontend_shell_docs.py -q
.venv/bin/python -m pytest tests/test_reader_shell_v2.py -q
.venv/bin/python -m pytest tests/test_reader_shell_flow.py -q
.venv/bin/python -m pytest tests/test_author_works.py::test_author_work_flow_supports_generate_edit_diagnostics_and_submit tests/test_author_works.py::test_author_work_can_create_parallel_universe_branch_without_overwriting_mainline tests/test_author_works.py::test_author_work_branch_discards_mainline_future_chapters_after_selected_fork_point -q
/tmp/narrativeos-py312-venv/bin/python -m pytest tests/test_cross_pack_merge_gate.py tests/test_cross_pack_benchmark.py::test_cross_pack_benchmark_outputs_kernel_metrics tests/test_phase0_guardrails.py -q
/tmp/narrativeos-py312-venv/bin/python -m src.narrativeos.benchmark.runner --baseline-file tests/benchmark_baseline.json --database-url sqlite:///narrativeos_beta.db --markdown-out /tmp/pr3-benchmark-summary-updated.md > /tmp/pr3-benchmark-updated.json
/tmp/narrativeos-py312-venv/bin/python -m src.narrativeos.benchmark.merge_gate --benchmark-file /tmp/pr3-benchmark-updated.json --summary-out /tmp/pr3-merge-gate-summary-updated.md
CI_HEADLESS=1 APP_PORT=8018 CHROME_PORT=9238 CHROME_USER_DIR=/tmp/narrativeos-chrome-agent-studio scripts/run_agent_studio_smoke.sh
APP_PORT=8766 AGENT_STUDIO_OPEN_BROWSER=0 bash scripts/run_agent_studio_local.sh (run against a temporary /health stub to confirm that the script emits the Studio URL)
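
The launch behaviour checked by the last command (wait for /health, then emit or open the Studio URL) can be sketched as below. The real script is bash; this Python version is only an illustration. The 127.0.0.1 host, the retry count and delay, and all function names are assumptions. Only the /health path, the Studio URL path, and the AGENT_STUDIO_OPEN_BROWSER=0 switch come from the PR description.

```python
import os
import time
import urllib.request
from typing import Callable, Mapping

STUDIO_PATH = "/app?product=author&workspace=studio&debug=1"


def http_ok(url: str) -> bool:
    """Return True if the URL answers with HTTP 200, False on any network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False


def wait_for_health(
    base_url: str,
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 30,                     # assumed retry budget
    delay: float = 1.0,                     # assumed delay between probes
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe <base_url>/health until it passes or the attempts run out."""
    for _ in range(attempts):
        if probe(base_url + "/health"):
            return True
        sleep(delay)
    return False


def studio_url(port: int) -> str:
    """Build the Studio URL for a local backend (host is an assumption)."""
    return f"http://127.0.0.1:{port}{STUDIO_PATH}"


def should_open_browser(env: Mapping[str, str] = os.environ) -> bool:
    """Browser opening is on unless AGENT_STUDIO_OPEN_BROWSER is set to 0."""
    return env.get("AGENT_STUDIO_OPEN_BROWSER", "1") != "0"
```

Passing a stub probe, as the validation run did with a temporary /health stub, lets the flow be checked without starting the backend.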

Notes

A full run of tests/test_author_works.py was attempted but did not finish locally; it was stopped after repeated high-CPU runs. The Agent Studio-relevant AuthorWork subset listed above passed.

ColinLi98 · 2026-05-12T00:54:35Z

Agent Studio visual review accepted from artifacts/agent_studio_smoke_visual_review.md in the latest PR run (25705903844).

Viewport	Check	Status	Evidence	Reviewer note
desktop	Three-column workbench review	manual_review	artifacts/agent_studio_smoke_desktop.png	accepted
mobile	Stacked workbench review	manual_review	artifacts/agent_studio_smoke_mobile.png	accepted

ColinLi98 added 16 commits May 10, 2026 19:01

Upgrade Agent Studio workbench

44ba3d7

Fix Agent Studio smoke artifact upload

e9ab604

Restore frontend smoke paid chapter helper

0b2af09

Harden smoke artifact reporting

3fbf173

Fix cross-pack quality test fixtures

b09b3aa

Fix author collaboration API auth tests

7cb1e50

Trim heavy author simulate test runtime

22150e1

Fix CI auth and smoke contracts

7cb7853

Fix remaining CI auth and contract tests

db27159

Fix assisted gate publish fixture

709269d

Harden Agent Studio generation smoke retries

ffd553d

Fix remaining ops auth endpoint tests

33a821e

Restore frontend shell client fixture

8667d27

Refresh cross-pack benchmark baseline

3349097

Avoid duplicate cross-pack PR branch runs

ddeefde

Scope cross-pack workflow tests to quality contracts

18e17e5

ColinLi98 marked this pull request as ready for review May 12, 2026 00:54

ColinLi98 and others added 2 commits May 13, 2026 15:37

Fix local Agent Studio launch flow

9b8d274

Harden GitHub Agent Studio local usability

8cc6022

ColinLi98 wants to merge 18 commits into main from codex/agent-studio-workbench-local-launch
