feat: surface prompt-processing state and unblock TUI startup #117
Previously users saw a frozen UI during two separate silent gaps:

1. Between hitting Enter and the start of the LLM stream: during local prep (onBeforeTurn, measureUsage, compaction) there was no spinner at all.
2. On first launch, the startup token probe blocked the input loop for 200ms–3s while it measured system-prompt + tool-schema token overhead.

This change:

- Adds onStreamStart and onFirstStreamPart hooks to the harness LoopHooks so any agent runtime can signal the exact prompt-processing window.
- In the TUI, shows a 'Processing...' loader during turn preparation and transitions to 'Working...' once the LLM request is in flight (cleared on the first visible stream part).
- In headless, emits a 'turn-start' lifecycle annotation and a matching onStreamStart callback; extends the TrajectoryEvent union accordingly.
- Runs the TUI startup measureUsage probe fire-and-forget so the editor accepts input immediately; the context-usage footer starts on the estimated value and quietly upgrades to the real one when ready.
- Documents the new turn-start event in the headless AGENTS.md / README.
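The hook ordering described above can be sketched as follows. This is a minimal, self-contained sketch under stated assumptions: `StreamPart`, `LoopContext`, `PromptHooks`, and `runIteration` are hypothetical simplifications, not the harness's real `LoopHooks` types or `runAgentLoop` code.

```typescript
// Hypothetical, simplified types; the real LoopHooks lives in @ai-sdk-tool/harness.
type StreamPart = { type: string; text?: string };

interface LoopContext {
  iteration: number;
}

interface PromptHooks {
  onStreamStart?: (ctx: LoopContext) => void | Promise<void>;
  onFirstStreamPart?: (part: StreamPart, ctx: LoopContext) => void | Promise<void>;
}

async function runIteration(
  startStream: () => AsyncIterable<StreamPart>,
  hooks: PromptHooks,
  ctx: LoopContext
): Promise<StreamPart[]> {
  // 1. Dispatch the request first; if startStream itself throws, no hook fires.
  const stream = startStream();
  // 2. The request is in flight: a TUI would swap "Processing..." for "Working..." here.
  await hooks.onStreamStart?.(ctx);

  const parts: StreamPart[] = [];
  let sawFirstPart = false;
  for await (const part of stream) {
    if (!sawFirstPart) {
      sawFirstPart = true;
      // 3. Fires once per iteration, on the very first part, visible or not.
      await hooks.onFirstStreamPart?.(part, ctx);
    }
    parts.push(part);
  }
  return parts;
}
```

Dispatching before calling `onStreamStart` means a stream-creation failure never produces a false "in flight" signal, which is the ordering the later commits in this PR settle on.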
📝 Walkthrough

Introduces a coordinated patch across several packages that surfaces and fixes prompt-processing / turn-lifecycle behavior that previously looked frozen.

Changes
Sequence diagrams

```mermaid
sequenceDiagram
participant Client
participant Harness as Harness<br/>(runAgentLoop)
participant Stream as Agent<br/>Stream
participant Observer as Observer<br/>(onStreamStart,<br/>onFirstStreamPart)
participant ProcessLoop as Stream<br/>Processing Loop
Client->>Harness: runAgentLoop(...)
Harness->>Stream: agent.stream()
Stream-->>Harness: stream created
Harness->>Observer: onStreamStart(context)
Observer-->>Harness: [hook completes]
Harness->>ProcessLoop: start fullStream iteration
ProcessLoop->>Stream: await first chunk
Stream-->>ProcessLoop: part[0]
ProcessLoop->>Observer: onFirstStreamPart(part[0], context)
Observer-->>ProcessLoop: [hook completes]
ProcessLoop->>ProcessLoop: continue iteration
ProcessLoop->>Stream: await next chunk
Stream-->>ProcessLoop: part[1..n]
ProcessLoop->>ProcessLoop: process all parts
ProcessLoop-->>Harness: iteration complete
Harness-->>Client: loop result
```
```mermaid
sequenceDiagram
participant Client
participant Headless as Headless<br/>(runHeadless)
participant Collector as Trajectory<br/>Collector
participant Stream as Agent<br/>Stream
participant JSONL as JSONL<br/>Emitter
participant File as trajectory.json<br/>(ATIF v1.4)
Client->>Headless: runHeadless(config)
Headless->>Stream: agent.stream()
Stream-->>Headless: stream created
Headless->>JSONL: emit TurnStartEvent
JSONL->>Collector: collect turn-start
Collector->>Collector: [DROP - transient]
JSONL-->>Client: stdout JSONL
Headless->>Headless: processStream()
Headless->>JSONL: emit step events
JSONL->>Collector: collect steps
Collector->>Collector: [PERSIST]
Headless->>Collector: finalize()
Collector->>Collector: aggregate metrics<br/>(total_cost_usd)
Collector->>File: write trajectory.json<br/>(schema_version: v1.4)
File-->>Client: persisted ATIF v1.4
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

✅ Passed checks (4 passed)
Code Review
This pull request introduces "prompt processing" indicators across the harness, TUI, and headless packages to improve user feedback during LLM request latency. It adds onStreamStart and onFirstStreamPart hooks to the agent loop and a new turn-start event for headless mode. In the TUI, the startup token probe is now non-blocking. Feedback includes addressing a race condition in the TUI's background startup probe and ensuring that the onStreamStart hook and related UI states are triggered after the agent stream is dispatched for consistency and accurate latency signaling.
```typescript
};

const runBackgroundStartupProbe = (): void => {
  measureUsageIfAvailable([]).then(ignore, ignore);
```
Running measureUsageIfAvailable([]) as a fire-and-forget background task at startup introduces a race condition. If the user quickly starts a turn and that turn's usage probe (which is awaited) finishes before this background probe, the background probe will eventually resolve and overwrite the messageHistory with stale usage data for an empty message list. This would result in an incorrect token count being displayed in the UI footer.
Consider adding a tracking mechanism (e.g., a sequential ID or a timestamp) to measureUsageIfAvailable to ensure that only the result of the most recently requested probe is applied to the state.
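A minimal sketch of the "sequential ID" guard suggested above, assuming a simplified probe that takes a message list and returns a token count. `createUsageProbe`, `Usage`, and the `measure` / `apply` callbacks are hypothetical; the real `measureUsageIfAvailable` and `messageHistory.updateActualUsage` are not shown in this thread.

```typescript
// Hypothetical sketch of a "latest probe wins" guard; not the real measureUsageIfAvailable.
type Usage = { inputTokens: number };

function createUsageProbe(
  measure: (messages: readonly string[]) => Promise<Usage>,
  apply: (usage: Usage) => void
): (messages: readonly string[]) => Promise<void> {
  let generation = 0;
  return async (messages) => {
    // Each call claims a new generation; any older in-flight probe becomes stale.
    const myGeneration = ++generation;
    const usage = await measure(messages);
    if (myGeneration !== generation) {
      return; // a newer probe started after this one: drop the stale result
    }
    apply(usage);
  };
}
```

With this guard, a slow startup probe on `[]` that resolves after the first real turn's probe is discarded instead of overwriting the newer token count.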
```typescript
emitAndCollect({
  type: "turn-start",
  phase,
  timestamp: new Date().toISOString(),
});
await config.onStreamStart?.(phase);

const streamPromise = Promise.resolve(
  config.agent.stream(streamOptions)
);
```
The onStreamStart hook and turn-start event are currently emitted before the agent stream is dispatched. According to the LoopHooks documentation and the implementation in runAgentLoop, this hook should fire immediately after agent.stream() is invoked. This ensures that the "Working..." or "turn-start" state accurately reflects the period where the request is in flight, and avoids triggering the hook if agent.stream() itself throws an error during initialization.
```diff
-emitAndCollect({
-  type: "turn-start",
-  phase,
-  timestamp: new Date().toISOString(),
-});
-await config.onStreamStart?.(phase);
 const streamPromise = Promise.resolve(
   config.agent.stream(streamOptions)
 );
+emitAndCollect({
+  type: "turn-start",
+  phase,
+  timestamp: new Date().toISOString(),
+});
+await config.onStreamStart?.(phase);
```
```typescript
showLoader("Working...");
await config.onStreamStart?.(phase);
```
The onStreamStart hook and the "Working..." loader state are triggered before the agent stream is dispatched. For consistency with the harness loop and to accurately signal the start of the LLM request latency gap (TTFB), these should be moved after the config.agent.stream call. This also prevents the "Working..." state from being entered if the stream creation itself fails.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94712f580d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```typescript
};

const runBackgroundStartupProbe = (): void => {
  measureUsageIfAvailable([]).then(ignore, ignore);
```
Prevent stale startup probe from clobbering usage state
Launching measureUsageIfAvailable([]) in the background creates a race with the first real turn’s measureUsageIfAvailable(messages) call, and both paths write to messageHistory.updateActualUsage(...) without any revision guard. If the startup probe resolves last, it can overwrite newer token usage with the empty-message baseline, which underestimates context pressure and can mis-size maxOutputTokens/compaction decisions until another measurement occurs.
```typescript
  type: "turn-start",
  phase,
  timestamp: new Date().toISOString(),
});
await config.onStreamStart?.(phase);
```
Emit turn-start after dispatching the stream request
turn-start is emitted (and onStreamStart awaited) before config.agent.stream(streamOptions) is invoked, so failures during stream creation still produce a false "request started" lifecycle event. This breaks the event contract documented as "after dispatch" and can skew downstream latency/phase metrics for consumers that treat turn-start as the network dispatch boundary.
2 issues found across 8 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/headless/src/runner.ts">
<violation number="1" location="packages/headless/src/runner.ts:673">
P2: `turn-start` is emitted before `agent.stream()` dispatch, so lifecycle events can claim a stream started when it hasn't.</violation>
</file>
<file name="packages/headless/README.md">
<violation number="1" location="packages/headless/README.md:185">
P3: Document the actual sequencing: `turn-start` is emitted before `agent.stream()` is called, not after it is dispatched.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Post-implementation audit surfaced four issues and two gaps; all are addressed here to keep the PR self-contained.

- BUG A (headless): turn-start was re-emitted on every retry, so a single logical turn could produce 2-4 turn-start events (overflow retry + up to 3 no-output retries). Added a hasEmittedTurnStart flag in runSingleTurn scope so the event fires at most once per logical turn; the flag resets naturally on the next turn.
- BUG B (TUI): during a blocking compaction the foreground loader stayed on 'Processing...' while the real wait was the compaction LLM call. The onBlockingChange callback now swaps the foreground label to 'Compacting...' on block entry and restores the previous label on block exit, so users see the real reason for the wait.
- BUG C (TUI): text-start stream parts were not counted as visible, leaving the loader spinning after the empty AssistantStreamView had already mounted. text-start is now visible; reasoning-start follows the existing showReasoning flag.
- BUG D (harness): documented that LoopHooks.onStreamStart fires only under runAgentLoop; the TUI has its own independent hook on AgentTUIConfig with a different signature.
- GAP 1 (tests): added retry-path assertions that turn-start fires exactly once, plus a new test that confirms normal-path ordering (metadata -> user step -> turn-start -> agent step) and an intermediate-step phase test for tool-continuation turns.
- GAP 1b (tests): added a test that turn-start events never appear in the persisted trajectory.json even though they stream on JSONL.
- GAP 2 (docs): updated packages/cea/benchmark/AGENTS.md event table and flow diagram to include turn-start and note that it is not persisted.
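The BUG A fix can be illustrated with a small sketch, assuming a retry loop where each attempt dispatches a stream and then consumes it. This `runSingleTurn` signature, `dispatch`, `consume`, `TurnStartNotice`, and `maxAttempts` are hypothetical; only the `hasEmittedTurnStart` flag name comes from the commit message. The emission sits after dispatch to match the ordering the PR later adopts.

```typescript
// Hypothetical sketch: retries inside one logical turn must not re-emit turn-start.
type TurnStartNotice = { type: "turn-start"; attempt: number };

async function runSingleTurn<S>(
  dispatch: (attempt: number) => Promise<S>,
  consume: (stream: S) => Promise<boolean>, // true when the attempt produced output
  emit: (event: TurnStartNotice) => void,
  maxAttempts = 4 // illustrative retry budget, not the runner's real limit
): Promise<boolean> {
  // Scoped to this call, so it resets naturally on the next logical turn.
  let hasEmittedTurnStart = false;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const stream = await dispatch(attempt);
    // Retries (overflow, no-output) see the flag already set and stay silent.
    if (!hasEmittedTurnStart) {
      hasEmittedTurnStart = true;
      emit({ type: "turn-start", attempt });
    }
    if (await consume(stream)) {
      return true; // output produced; the logical turn is done
    }
  }
  return false;
}
```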
💡 Codex Review
Reviewed commit: 87450f1082
```typescript
await config.onSetup?.();
await measureUsageIfAvailable([]);
updateHeader();

runBackgroundStartupProbe();
```
Refresh header after onSetup completes
updateHeader() now runs only before onSetup, and this block immediately starts the background probe without a post-setup refresh. If onSetup asynchronously initializes header/footer state (for example, setting a resolved model subtitle), the rendered header stays stale until an unrelated later render (such as a usage probe update or first turn), so startup metadata can be wrong for the entire idle session.
1 issue found across 7 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
<file name="packages/tui/src/agent-tui.ts">
<violation number="1" location="packages/tui/src/agent-tui.ts:931">
P2: When blocking compaction ends you always restore the stashed foreground loader, even if `clearStatus()` already removed it (e.g., once the first stream part arrived). That resurrects a stale “Processing…” spinner and makes the UI look frozen again. Guard the restoration (and clear the stash) when the loader was cleared during the block.</violation>
</file>
The persisted trajectory was tagged 'ATIF-v1.6', which does not exist in Harbor's public spec. The current Harbor ATIF version is v1.4 (see https://www.harborframework.com/docs/agents/trajectory-format): v1.0, v1.1, v1.2, v1.3, v1.4; v1.5 and v1.6 were never released.

The 'v1.6' label was an internal bump accompanying a redesign of the JSONL event types on 2026-04-03; it conflated two distinct surfaces:

- The ATIF trajectory that TrajectoryCollector writes to disk, which must follow Harbor's published schema (v1.4).
- The internal stdout JSONL protocol used by the headless runner, which carries lifecycle annotations (approval, compaction, interrupt, turn-start) that ATIF does not define.

This commit:

- Sets schema_version to 'ATIF-v1.4' in TrajectoryCollector (both the TypeScript type literal and the runtime value).
- Updates the Python validator (test_trajectory.py), Python scorer, CLI help text, and test assertions to expect 'ATIF-v1.4'.
- Rewrites packages/headless/AGENTS.md, README.md, and packages/cea/benchmark/AGENTS.md to separate the ATIF persisted format from the internal JSONL streaming protocol, with a pointer to Harbor's spec page.
- Adds a JSDoc header on TrajectoryEvent types explaining the split.
💡 Codex Review
Reviewed commit: 49cd7261de
```typescript
if (foregroundStatusMessage !== null) {
  foregroundStatusBeforeBlocking = foregroundStatusMessage;
  showLoader("Compacting...");
```
Capture prior spinner text only once per blocking compaction
onBlockingChange can emit multiple blocking: true stages for a single compaction (starting → pruning → compacting from CompactionOrchestrator.blockAtHardLimit/handleOverflow), but this block overwrites foregroundStatusBeforeBlocking every time. After the first stage, the foreground text is already "Compacting...", so later stages replace the original "Processing..."/"Working..." snapshot, and the unblock path restores the wrong label. This breaks the intended status restoration and leaves users with stale compaction wording after blocking ends.
…validator

After correcting the schema_version label to 'ATIF-v1.4', an audit against the official spec (https://www.harborframework.com/docs/agents/trajectory-format) surfaced three remaining gaps. This commit closes them.

- StepMetrics now exposes the full v1.4 optional surface:
  - logprobs: number[]
  - prompt_token_ids: number[] (added in v1.4)
  - completion_token_ids: number[] (added in v1.3)

  These remain undefined unless the provider populates them, so existing output is bit-for-bit identical for callers that don't set the new fields.
- TrajectoryJson.final_metrics now aggregates total_cost_usd across step metrics with the same null-when-absent semantics as the other token totals. The ATIF compliance test suite already expected this field in ATIF_FINAL_METRICS_ALLOWED_FIELDS; the emission side just caught up to the test.
- TrajectoryJson.extra is typed as the approval/compaction/interrupt container intersected with Record<string, unknown> so downstream additions to extra.* do not require breaking type changes.
- packages/cea/benchmark/test_trajectory.py now:
  - Calls harbor.utils.trajectory_validator.TrajectoryValidator when the harbor package is importable (auto-skips when Harbor is not installed so local developer workflows keep working).
  - Enforces per-step metric shapes: numeric fields (prompt_tokens, completion_tokens, cached_tokens, cost_usd) must be numbers; array fields (logprobs, prompt_token_ids, completion_token_ids) must be arrays.
  - Prints total_cost_usd in the summary block and reports whether the Harbor validator was used or skipped.
- New tests:
  - finalize() aggregates total_cost_usd across step metrics
  - finalize() returns null total_cost_usd when no step reports cost
  - finalize() preserves all v1.4 optional metric fields (logprobs, prompt_token_ids, completion_token_ids)

Verified: typecheck, ultracite check, full vitest (569 tests), and turbo build all pass.
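The null-when-absent aggregation for total_cost_usd can be sketched as below. The interfaces are hypothetical, trimmed stand-ins for StepMetrics and final_metrics; the `total_prompt_tokens` / `total_completion_tokens` names are assumptions about the ATIF field names, so check them against Harbor's spec before relying on them.

```typescript
// Hypothetical, trimmed shapes; field names follow the commit message, not the real source.
interface StepMetrics {
  prompt_tokens?: number;
  completion_tokens?: number;
  cached_tokens?: number;
  cost_usd?: number;
  logprobs?: number[];
  prompt_token_ids?: number[];
  completion_token_ids?: number[];
}

interface FinalMetrics {
  total_prompt_tokens: number | null;
  total_completion_tokens: number | null;
  total_cost_usd: number | null;
}

// Sum one optional numeric field across steps; null when no step reported it.
function sumOrNull(
  steps: readonly StepMetrics[],
  pick: (m: StepMetrics) => number | undefined
): number | null {
  let total: number | null = null;
  for (const m of steps) {
    const value = pick(m);
    if (typeof value === "number") {
      total = (total ?? 0) + value;
    }
  }
  return total;
}

function aggregateFinalMetrics(steps: readonly StepMetrics[]): FinalMetrics {
  return {
    total_prompt_tokens: sumOrNull(steps, (m) => m.prompt_tokens),
    total_completion_tokens: sumOrNull(steps, (m) => m.completion_tokens),
    total_cost_usd: sumOrNull(steps, (m) => m.cost_usd),
  };
}
```

Starting the accumulator at `null` rather than `0` is what distinguishes "no step reported cost" from "the total cost was zero".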
💡 Codex Review
Reviewed commit: 52acdd4be9
```typescript
await config.onStreamStart?.(phase);

const stream = await config.agent.stream(
```
Invoke onStreamStart after dispatching agent.stream
onStreamStart is awaited before config.agent.stream(...) is called, so consumers can observe a false "stream started" signal when stream creation fails or stalls. This is especially problematic for telemetry/hooks that treat this callback as the request-dispatch boundary, because it shifts latency measurements earlier and can report an in-flight state even when no request was actually sent.
1 issue found across 5 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
<file name="packages/cea/benchmark/test_trajectory.py">
<violation number="1" location="packages/cea/benchmark/test_trajectory.py:96">
P2: Per-step numeric metric validation also accepts booleans as numbers, allowing invalid metric payloads to pass.</violation>
</file>
Six issues raised across three AI review bots on PR #117:

- BUG (Gemini/Codex/Cubic - Medium): turn-start and onStreamStart fired before agent.stream() was invoked, breaking the documented contract 'fires immediately after agent.stream is invoked'. A stream-creation failure could still emit a false 'stream started' lifecycle event and skew telemetry for consumers that treat turn-start as the network dispatch boundary. Moved the emission/hook in both the headless runner and the TUI's runSingleStreamTurn to run after agent.stream() successfully returns.
- BUG (Gemini/Codex - High): The background startup usage probe raced with the first turn's per-turn probe; both paths wrote to messageHistory.updateActualUsage with no revision guard. A stale startup probe (based on an empty message list) could clobber newer usage data and underestimate context pressure, mis-sizing maxOutputTokens and compaction decisions. Added a generation token on measureUsageIfAvailable: probes whose generation is no longer current silently drop their result.
- BUG (Codex - Medium): onBlockingChange captures the prior spinner text to restore after blocking compaction, but the orchestrator can emit multiple blocking:true stages for a single compaction (starting -> pruning -> compacting). The stash was overwritten at every stage, so by unblock time the stashed value was 'Compacting' itself. The stash now happens only when foregroundStatusBeforeBlocking is null.
- BUG (Cubic - Medium): When the first stream part arrived during a blocking compaction, clearStatus() cleared foregroundStatusMessage, but the unblock path still called showLoader on the stashed value, resurrecting a stale 'Processing...' spinner on what should be a clean UI. The restoration is now guarded with a foregroundStatusMessage null check so a cleared loader stays cleared.
- BUG (Codex - Low): updateHeader() used to run right after onSetup to reflect any async header/footer state the consumer initialised there. When the startup probe became non-blocking, the updateHeader call was accidentally dropped. Restored it so header metadata renders before the idle session begins.
- BUG (Cubic - Low): The Python validator accepted booleans as numbers for per-step and final metric values because isinstance(True, int) is True in Python. Added _is_real_number and _is_real_int helpers that explicitly exclude bool, matching ATIF-v1.4 expectations.

New tests:
- 'does not emit turn-start when agent.stream() rejects before dispatch'
- 'emits turn-start after agent.stream() succeeds (before first chunk)'

Verified: typecheck, ultracite check, 574 tests (+5 headless), build all pass. Every review-flagged issue has been fixed or explicitly addressed in the commit above or in earlier commits on this branch.
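The two compaction-label fixes (stash only once per blocking episode, and never resurrect a cleared loader) can be sketched with a small state holder. `ForegroundStatus` and its members are hypothetical; they mirror the names foregroundStatusMessage, foregroundStatusBeforeBlocking, showLoader, and clearStatus used in the thread, not the actual agent-tui.ts code.

```typescript
// Hypothetical reconstruction of the TUI stash/restore logic described in the commit.
class ForegroundStatus {
  message: string | null = null;
  private beforeBlocking: string | null = null;

  showLoader(text: string): void {
    this.message = text;
  }

  clearStatus(): void {
    this.message = null;
  }

  onBlockingChange(blocking: boolean): void {
    if (blocking) {
      // Stash only on the first blocking stage; later stages would stash "Compacting...".
      if (this.message !== null && this.beforeBlocking === null) {
        this.beforeBlocking = this.message;
      }
      if (this.message !== null) {
        this.showLoader("Compacting...");
      }
      return;
    }
    // Restore only if the loader was not cleared (e.g. by the first stream part) mid-block.
    if (this.message !== null && this.beforeBlocking !== null) {
      this.showLoader(this.beforeBlocking);
    }
    this.beforeBlocking = null;
  }
}
```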
Addressed all automated reviewer feedback (commit 245214f)

Thanks to @gemini-code-assist, @chatgpt-codex-connector, and @cubic-dev-ai for catching these. Every flagged issue has been fixed.

Resolved

Tests added

Verification

Re-request review when ready.
💡 Codex Review
Reviewed commit: 245214f470
```typescript
showLoader("Processing...");

const preparedTurn = await prepareMessages(phase);
```
Guard processing spinner with finally cleanup
showLoader("Processing...") now runs before prepareMessages(phase), but the try/finally that calls clearStatus() starts later. If prepareMessages throws (for example from onBeforeTurn, measureUsage, or compaction checks), the function exits without hitting cleanup, so the foreground spinner remains stuck even though the turn failed and an error message is shown.
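One way to address this is to open the `try` before the first `await` that can throw. A minimal sketch, assuming a hypothetical `Spinner` interface and a `runTurn` function standing in for the TUI's turn runner:

```typescript
// Hypothetical sketch; Spinner and runTurn are illustrative, not agent-tui.ts APIs.
interface Spinner {
  show(text: string): void;
  clear(): void;
}

async function runTurn<T>(
  spinner: Spinner,
  prepareMessages: () => Promise<T>,
  streamTurn: (prepared: T) => Promise<void>
): Promise<void> {
  spinner.show("Processing...");
  // The try opens before the first await that can throw, so finally always runs.
  try {
    const prepared = await prepareMessages(); // onBeforeTurn, measureUsage, compaction checks
    await streamTurn(prepared);
  } finally {
    spinner.clear();
  }
}
```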
@cubic review
1 issue found across 16 files
Prompt for AI agents (unresolved issues)
<file name="packages/tui/src/agent-tui.ts">
<violation number="1" location="packages/tui/src/agent-tui.ts:1561">
P1: `onStreamStart` callback failures currently break the turn flow; guard this hook so observer errors don’t abort a valid stream.</violation>
</file>
An observer that throws from onStreamStart (or onFirstStreamPart) used to propagate the error and abort an otherwise-valid stream. This violates the 'observer-only' intent of the hooks and makes a buggy telemetry/logging callback strong enough to break a production turn.

- harness loop.ts: introduce invokeObserverHook(hook, hookName, ctx). Errors are logged via console.error and swallowed so the stream iteration always proceeds. Applied to both onStreamStart and onFirstStreamPart.
- headless runner.ts: wrap config.onStreamStart in try/catch with the same logging pattern. The turn-start emission itself stays in place, so lifecycle consumers are unaffected.
- tui agent-tui.ts: wrap config.onStreamStart in try/catch; the 'Working...' loader and the rest of runSingleStreamTurn proceed even when the consumer's callback throws.
- LoopHooks docstrings now document the 'observer-only contract': errors are swallowed after being logged. This is a public API guarantee so consumers can safely rely on it (and don't need to defensively wrap their callbacks).
- New headless test: 'continues streaming when onStreamStart throws (observer errors are isolated)' confirms the stream completes, the turn-start event is still emitted, and the thrown error is logged via console.error.

Also repairs a drop of LoopHooks.onToolCall from types.ts that had slipped through an earlier edit; loop.ts still destructured it, so typecheck was surfacing the regression now.

Verified: typecheck, ultracite check, 575 tests (+1 headless), build all pass.
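The headless ordering after this commit (dispatch, then emit turn-start, then call the observer inside try/catch) can be sketched as below. `dispatchTurn`, `RunnerConfig`, `startStream`, and `TurnStartEvent` are hypothetical names; per the snippets above, the real runner calls `config.agent.stream(streamOptions)` and `emitAndCollect`.

```typescript
// Hypothetical, simplified version of the headless ordering described above.
type TurnStartEvent = { type: "turn-start"; phase: string; timestamp: string };

interface RunnerConfig {
  startStream: () => Promise<AsyncIterable<unknown>>;
  onStreamStart?: (phase: string) => void | Promise<void>;
}

async function dispatchTurn(
  config: RunnerConfig,
  phase: string,
  emit: (event: TurnStartEvent) => void
): Promise<AsyncIterable<unknown>> {
  // Dispatch first: if stream creation rejects, no turn-start is emitted.
  const stream = await config.startStream();

  // Once dispatch succeeded, the lifecycle event is emitted unconditionally.
  emit({ type: "turn-start", phase, timestamp: new Date().toISOString() });

  // Observer errors are logged and swallowed so they cannot abort the turn.
  try {
    await config.onStreamStart?.(phase);
  } catch (error) {
    console.error("onStreamStart observer threw:", error);
  }
  return stream;
}
```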
Addressed Cubic P1: observer-hook error isolation (commit d855e36)

Thanks @cubic-dev-ai: a callback that throws from

Fix

New test

Also repaired in the same commit

An earlier edit had accidentally dropped

Verification
@minpeter The implementation in

Key highlights from the fix:

This approach perfectly aligns with the intent of these hooks as non-blocking observers. For more details on managing these types of learnings or custom rules in the future, you can check the Cubic documentation.

@cubic review
Future edits to the headless trajectory pipeline must preserve ATIF v1.4 compliance. Until now the contract lived only in the changeset and the public README, which made it easy to drift from in a hurry. This commit plants the contract directly on the load-bearing code and tests so a future maintainer cannot relax it by accident.

Contract points now documented on-source:

- trajectory-collector.ts module header: enumerates the six invariants (schema_version literal, step_id sequence, steps[*].source set, extra.* persistence rule, final_metrics null-when-absent, SDK-only metrics) and bounds the scope (persisted trajectory only, not the JSONL stream).
- TrajectoryJson / AtifStep interface JSDocs: explain the spec-version bump discipline and that step-level fields require an ATIF v1.4 definition (or belong under extra).
- runner.ts collectTrajectoryEvent: clarifies that the default-case drop is INTENTIONAL (non-ATIF types stay stream-only) and that new cases must ship with a matching extra.* path in finalize().
- runner.ts runHeadless: inline comment pinpoints where the two output surfaces (JSONL stdout vs ATIF trajectory.json) diverge, so the 'headless has two outputs' realisation does not need to be reconstructed every time.
- types.ts TrajectoryEvent union: rewrites the one-line 'complete union' docstring into a contract that forces every new event type to pick 'extra.* persistence' or 'drop' at design time, and forbids promotion to a top-level ATIF field or steps[*].source value.
- packages/headless/AGENTS.md: new 'ATIF v1.4 COMPLIANCE (persisted trajectory.json)' section listing the same invariants as the authoritative engineering contract.
- atif-events.test.ts module header: declares the suite as the executable compliance contract with a pointer to the Python validator that must be updated in lock-step.

No runtime behaviour changes. Verified: typecheck, ultracite check, 575 tests, build all pass.
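The "default-case drop is intentional" rule for collectTrajectoryEvent can be sketched with a trimmed event union. The event shapes, the `Collected` container, and the `extra.compaction` layout are hypothetical; the point is only that stream-only types such as turn-start fall through to a drop and never reach the persisted trajectory.

```typescript
// Hypothetical, trimmed event union; the real TrajectoryEvent has more members.
type TrajectoryEvent =
  | { type: "step"; stepId: number; source: "user" | "agent" | "system" }
  | { type: "compaction"; summary: string }
  | { type: "turn-start"; phase: string; timestamp: string };

interface Collected {
  steps: Array<{ step_id: number; source: string }>;
  extra: { compaction: string[] };
}

function collectTrajectoryEvent(event: TrajectoryEvent, out: Collected): void {
  switch (event.type) {
    case "step":
      out.steps.push({ step_id: event.stepId, source: event.source });
      return;
    case "compaction":
      // Non-ATIF data may persist, but only under extra.*
      out.extra.compaction.push(event.summary);
      return;
    default:
      // Intentional drop: stream-only types such as turn-start never reach trajectory.json.
      return;
  }
}
```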
@cubic review |
Second pass through the changeset against the full commit list surfaced
three small holes and one missing package header:
- 'plugsuits' (packages/cea) was missing from the changeset header. The
PR actually touches its user-facing CLI help text ('--atif' now
mentions ATIF-v1.4) and the benchmark validator pipeline, so cea
needs a patch bump in lock-step with the other packages.
- Added a dedicated CEA bullet describing the --atif wording change
and the Python validator pipeline improvements (Harbor validator
auto-call + stricter shape check + bool rejection).
- Credited the Cubic-reported bool-in-number bug in the 'PR review
feedback' section. The fix was already described under the ATIF v1.4
validator improvements bullet, but the review-attribution list had
dropped the sixth item.
- Noted that the fifth-commit fix quietly restored a dropped
LoopHooks.onToolCall declaration. Consumers that rely on this hook
had been running against a destructure with no matching type; this
is now type-safe again and the changeset mentions it so anyone
diffing public types sees why that line moved.
No code changes and no behaviour changes — the changeset content now
matches the actual commit range on this branch.
1 issue found across 16 files
Prompt for AI agents (unresolved issues)
<file name="packages/harness/src/types.ts">
<violation number="1" location="packages/harness/src/types.ts:173">
P2: Expose the current stream part in `onFirstStreamPart`, or remove the advice about filtering by part type. The callback only receives `LoopContinueContext`, so consumers cannot distinguish framing chunks from visible output as the docs promise.</violation>
</file>
The docstring for LoopHooks.onFirstStreamPart told consumers to filter
on part type (to distinguish framing chunks like 'start' and
'text-start' from visible output), but the callback only received
LoopContinueContext — no part to filter on. The contract was broken at
the type level: the advice was physically impossible to follow.
This commit fixes the hook before adoption. Since onFirstStreamPart
was added earlier in the same PR and has zero consumers anywhere in
the monorepo (verified by exhaustive grep across packages/harness,
packages/tui, packages/headless, packages/cea, packages/minimal-agent,
packages/tgbot, scripts/, and the runtime/session adapter layer), the
signature change is type-only and requires no migration.
Changes:
- types.ts: onFirstStreamPart signature is now
(part: TextStreamPart<ToolSet>, context: LoopContinueContext)
=> void | Promise<void>
Docstring rewritten to describe what the hook actually does: fires
once per iteration on the very first part regardless of visibility;
the consumer decides what to do based on part.type but the hook does
not re-fire on a later visible part.
- types.ts: imports TextStreamPart from 'ai' (alongside the existing
ToolSet / ToolCallPart imports).
- index.ts: re-exports TextStreamPart from '@ai-sdk-tool/harness' so
consumers don't have to import it from 'ai' separately. This matches
how the TUI package already uses the exact type
(TextStreamPart<ToolSet>) for its isVisibleStreamPart filter.
- loop.ts: generalises invokeObserverHook to accept variadic args
(<Args extends readonly unknown[]>), so both (context)-only hooks
(onStreamStart) and (part, context) hooks (onFirstStreamPart) share
the same error-isolation helper. Call site at the for-await entry
now passes the current 'part' alongside the context.
- loop.test.ts: four new regression tests covering:
* first-part single-fire semantics (tool-call in iteration 0)
* per-iteration firing (two iterations, fires twice with correct
iteration numbers)
* empty-stream skip (no parts → hook never fires)
* observer-error isolation (thrown error is logged and swallowed,
stream iteration continues normally)
Oracle consulted before changing the public signature: recommended
Option A ('pre-adoption correction') over Option B (relax docstring)
given the zero-consumer inventory. The justification is that the hook
is one commit old and has not been released, so fixing it now is
materially different from breaking an established API in a patch
release.
Verified: typecheck (6/6), ultracite check (277 files clean), full
test suite (1235 tests, harness +4 new), build (6/6).
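The commit message above describes two mechanics: a variadic, error-isolating `invokeObserverHook`, and an `onFirstStreamPart(part, context)` hook that fires once per iteration on the very first part. Below is a minimal TypeScript sketch of both. It is not the code in loop.ts. `StreamPart`, the one-field `LoopContinueContext`, and the `consumeIteration` loop body are hypothetical simplifications.

```ts
// Hypothetical stand-ins: the real types are LoopContinueContext from
// @ai-sdk-tool/harness and TextStreamPart<ToolSet> from 'ai'.
interface LoopContinueContext {
  iteration: number;
}

interface StreamPart {
  type: string;
}

type ObserverHook<Args extends readonly unknown[]> =
  | ((...args: Args) => void | Promise<void>)
  | undefined;

// Shared error isolation for (context) hooks such as onStreamStart and
// (part, context) hooks such as onFirstStreamPart: a throwing or rejecting
// observer is logged and swallowed so stream iteration continues.
async function invokeObserverHook<Args extends readonly unknown[]>(
  hookName: string,
  hook: ObserverHook<Args>,
  ...args: Args
): Promise<void> {
  if (hook === undefined) {
    return;
  }
  try {
    await hook(...args);
  } catch (error) {
    console.error(`observer hook ${hookName} failed:`, error);
  }
}

type FirstPartHook = (
  part: StreamPart,
  context: LoopContinueContext
) => void | Promise<void>;

// Consumes one loop iteration's stream. onFirstStreamPart fires exactly once,
// on the first part regardless of its type; an empty stream never fires it.
async function consumeIteration(
  parts: AsyncIterable<StreamPart> | Iterable<StreamPart>,
  context: LoopContinueContext,
  onFirstStreamPart?: FirstPartHook
): Promise<StreamPart[]> {
  const consumed: StreamPart[] = [];
  for await (const part of parts) {
    if (consumed.length === 0) {
      await invokeObserverHook("onFirstStreamPart", onFirstStreamPart, part, context);
    }
    consumed.push(part);
  }
  return consumed;
}
```

Passing a plain array works because `for await` also accepts sync iterables. The real loop iterates the AI SDK stream instead.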
Addressed Cubic P2:
@minpeter The update in Key improvements in this fix:
This change completes the "prompt processing" visibility feature by making the underlying hooks actually usable for their intended purpose. For more on Cubic's review standards, you can refer to the Cubic documentation.
@coderabbitai review
@codex review
@cubic-dev-ai review
✅ Actions performed: Review triggered.
@minpeter I have started the AI code review. It will take a few minutes to complete.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91008d0191
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```ts
showLoader("Processing...");

const preparedTurn = await prepareMessages(phase);
```
Move processing spinner init inside cleanup scope
showLoader("Processing...") is invoked before prepareMessages, but clearStatus() only runs in the later try/finally. If prepareMessages throws (for example from onBeforeTurn, usage measurement, or compaction checks), execution exits before the finally, leaving the spinner stuck on screen even though an error is reported in processInput. This creates a persistent false "in progress" UI state after turn-prep failures.
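The fix this comment asks for is to move the loader start inside the guarded region. The sketch below shows only that control flow. `StatusUi` and this three-argument `runSingleStreamTurn` are hypothetical simplifications of the TUI code.

```ts
// Hypothetical status surface; the real TUI helpers are showLoader/clearStatus.
interface StatusUi {
  showLoader(message: string): void;
  clearStatus(): void;
}

// The loader starts inside try, so finally clears it on every exit path,
// including a throw from onBeforeTurn, usage measurement, or compaction.
async function runSingleStreamTurn(
  ui: StatusUi,
  prepareMessages: () => Promise<string[]>,
  streamTurn: (messages: string[]) => Promise<void>
): Promise<void> {
  try {
    ui.showLoader("Processing...");
    const preparedTurn = await prepareMessages();
    await streamTurn(preparedTurn);
  } finally {
    ui.clearStatus();
  }
}
```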
Actionable comments posted: 6
🧹 Nitpick comments (1)
packages/headless/src/__tests__/atif-events.test.ts (1)
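586-687: The new v1.4 compliance tests look good.
Guarding against floating-point error with `toBeCloseTo(0.2, 10)` in the `total_cost_usd` aggregation test is a good call. The `total_cost_usd` null case makes the "unreported cost ≠ 0" semantics explicit. The `logprobs`/`prompt_token_ids`/`completion_token_ids` preservation test accurately verifies the `{ ...rest, metrics }` spread behavior in `toAtifStep` (trajectory-collector.ts:187-198).
Going forward, if new optional fields are added to `metrics`, note that the `ATIF_STEP_ALLOWED_FIELDS` whitelist test does not inspect keys inside `metrics`. If whitelist-level validation of the v1.4 extension fields is needed, you could add a separate Set (optional).
🤖 Prompt for AI Agents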
Verify each finding against the current code and only fix it if needed. In `@packages/headless/src/__tests__/atif-events.test.ts` around lines 586 - 687, The tests for TrajectoryCollector (finalize()) correctly validate total_cost_usd aggregation and null behavior, and preservation of optional ATIF-v1.4 fields; no code change required—leave TrajectoryCollector.finalize, toAtifStep, and the ATIF_STEP_ALLOWED_FIELDS logic in trajectory-collector.ts as-is, but if you later need stricter whitelist checks for nested metrics keys add a separate Set for allowed metric keys and update toAtifStep to validate metrics against that Set.
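The optional nitpick suggests a separate Set for nested `metrics` keys. One way that could look is sketched below. `ATIF_METRICS_ALLOWED_FIELDS` and `unknownMetricKeys` are hypothetical names. Only `logprobs`, `prompt_token_ids`, and `completion_token_ids` come from this PR; the other entries are placeholders, not the ATIF-v1.4 schema.

```ts
// Hypothetical allowlist for keys inside a step's `metrics` object.
// logprobs / prompt_token_ids / completion_token_ids are named in this PR;
// the remaining entries are placeholders to be replaced by the real schema.
const ATIF_METRICS_ALLOWED_FIELDS: ReadonlySet<string> = new Set([
  "prompt_tokens",
  "completion_tokens",
  "cost_usd",
  "logprobs",
  "prompt_token_ids",
  "completion_token_ids",
]);

// Returns the metrics keys that the allowlist does not recognize, so a test
// can assert the list is empty.
function unknownMetricKeys(metrics: Record<string, unknown>): string[] {
  return Object.keys(metrics).filter(
    (key) => !ATIF_METRICS_ALLOWED_FIELDS.has(key)
  );
}
```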
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/cea/benchmark/test_trajectory.py`:
- Around line 13-18: The helper _is_real_number currently only excludes bools so
float('nan') and float('inf') pass; update _is_real_number to also reject
non-finite floats by using math.isfinite (i.e., return isinstance(value, (int,
float)) and not isinstance(value, bool) and (math.isfinite(value) if
isinstance(value, float) else True)) and add the required import for math; leave
_is_real_int unchanged.
In `@packages/headless/AGENTS.md`:
- Around line 16-23: Update the documentation in AGENTS.md to correct the
lifecycle-annotation persistence behavior: change the paragraph so it states
that only `turn-start` and `error` are JSONL-only and dropped from the persisted
trajectory, while `approval`, `compaction`, and `interrupt` annotations are
preserved by the runner and recorded under `extra.approval_events`,
`extra.compaction_events`, and `extra.interrupt_events` (as implemented by
`collectTrajectoryEvent`); remove the blanket claim that those three annotation
types are dropped and add brief note pointing readers to
`collectTrajectoryEvent` for exact field names.
In `@packages/headless/src/trajectory-collector.ts`:
- Around line 75-79: The TrajectoryJson.extra type is too permissive because the
Record<string, unknown> lets arbitrary lifecycle buckets bypass the ATIF
persistence contract; remove the open record and restrict extra to only the
known persisted buckets (approval_events?: ApprovalEvent[]; compaction_events?:
CompactionEvent[]; interrupt_events?: InterruptEvent[]), then update/verify
TrajectoryCollector.finalize() and the TrajectoryCollector implementation to
only write those known buckets and require adding a new collector method
whenever a new lifecycle bucket is introduced.
- Around line 16-20: The doc comment is inconsistent about which lifecycle
events are persisted; update the wording in trajectory-collector.ts to clearly
state that approval/compaction/interrupt lifecycle annotations are persisted
under extra.approval_events, extra.compaction_events and extra.interrupt_events
on the trajectory, while events like turn-start and error are not persisted as
step sources and live only in the JSONL stream; also add the guideline note that
adding a new persisted lifecycle bucket requires implementing a corresponding
collector method.
In `@packages/headless/src/types.ts`:
- Around line 4-13: The comment says the docblock incorrectly states that
lifecycle annotations (`approval`, `compaction`, `interrupt`, `turn-start`) are
JSONL-only; in reality `approval`, `compaction`, and `interrupt` are persisted
to ATIF via TrajectoryCollector as extra.approval_events,
extra.compaction_events, and extra.interrupt_events. Update the comment in
packages/headless/src/types.ts to restrict "JSONL-only" to only transient
signals (e.g., `turn-start`) and explicitly note that `approval`, `compaction`,
and `interrupt` are forwarded by TrajectoryCollector into ATIF `extra.*` fields
while `turn-start` remains transient.
In `@packages/tui/src/agent-tui.ts`:
- Around line 1041-1064: Capture the current history revision at the start of
measureUsageIfAvailable (e.g., const startRev =
config.messageHistory.getRevision()) and after awaiting
config.measureUsage(messages) verify the revision is unchanged before calling
config.messageHistory.updateActualUsage; if startRev !==
config.messageHistory.getRevision() treat the probe as stale and return false.
Do this in the measureUsageIfAvailable path that updates usage (the block using
usageProbeGeneration and calling config.messageHistory.updateActualUsage) and
apply the same revision-capture-and-check to the other similar spot (the second
occurrence around where updateActualUsage is called) so background probes cannot
overwrite newer history.
---
Nitpick comments:
In `@packages/headless/src/__tests__/atif-events.test.ts`:
- Around line 586-687: The tests for TrajectoryCollector (finalize()) correctly
validate total_cost_usd aggregation and null behavior, and preservation of
optional ATIF-v1.4 fields; no code change required—leave
TrajectoryCollector.finalize, toAtifStep, and the ATIF_STEP_ALLOWED_FIELDS logic
in trajectory-collector.ts as-is, but if you later need stricter whitelist
checks for nested metrics keys add a separate Set for allowed metric keys and
update toAtifStep to validate metrics against that Set.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 983e34d5-0965-43c7-81a5-3cfa57925765
📒 Files selected for processing (18)
- .changeset/prompt-processing-indicators.md
- packages/cea/benchmark/AGENTS.md
- packages/cea/benchmark/scorer.py
- packages/cea/benchmark/test_trajectory.py
- packages/cea/src/entrypoints/main.ts
- packages/harness/src/index.ts
- packages/harness/src/loop.test.ts
- packages/harness/src/loop.ts
- packages/harness/src/types.ts
- packages/headless/AGENTS.md
- packages/headless/README.md
- packages/headless/src/__tests__/atif-events.test.ts
- packages/headless/src/runner.test.ts
- packages/headless/src/runner.ts
- packages/headless/src/trajectory-collector.ts
- packages/headless/src/types.ts
- packages/tui/src/agent-tui.ts
- packages/tui/src/stream-handlers.ts
```ts
let usageProbeGeneration = 0;

const measureUsageIfAvailable = async (
  messages: ModelMessage[]
): Promise<boolean> => {
  if (!config.measureUsage) {
    return false;
  }

  usageProbeGeneration += 1;
  const thisGeneration = usageProbeGeneration;

  const measured = normalizeUsageMeasurement(
    await config.measureUsage(messages)
  );
  if (!measured) {
    return false;
  }

  if (thisGeneration !== usageProbeGeneration) {
    return false;
  }

  config.messageHistory.updateActualUsage({
```
Please also invalidate the startup probe result by history revision.
`usageProbeGeneration` only guards against the case where a newer probe has started. Suppose the user submits input and `messageHistory` changes, and the background `measureUsageIfAvailable([])` resolves before the next `measureUsageIfAvailable(messagesForLLM)` starts. In that window, usage measured against an empty message list can be applied to the current history. Capture `getRevision()` so that history changes between probe start and completion are also treated as stale.
🛡️ Suggested fix
```diff
 usageProbeGeneration += 1;
 const thisGeneration = usageProbeGeneration;
+const revisionAtStart = config.messageHistory.getRevision?.();
 const measured = normalizeUsageMeasurement(
   await config.measureUsage(messages)
 );
@@
-if (thisGeneration !== usageProbeGeneration) {
+if (
+  thisGeneration !== usageProbeGeneration ||
+  (revisionAtStart !== undefined &&
+    config.messageHistory.getRevision?.() !== revisionAtStart)
+) {
   return false;
 }
```
Also applies to: 1081-1083
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/tui/src/agent-tui.ts` around lines 1041 - 1064, Capture the current
history revision at the start of measureUsageIfAvailable (e.g., const startRev =
config.messageHistory.getRevision()) and after awaiting
config.measureUsage(messages) verify the revision is unchanged before calling
config.messageHistory.updateActualUsage; if startRev !==
config.messageHistory.getRevision() treat the probe as stale and return false.
Do this in the measureUsageIfAvailable path that updates usage (the block using
usageProbeGeneration and calling config.messageHistory.updateActualUsage) and
apply the same revision-capture-and-check to the other similar spot (the second
occurrence around where updateActualUsage is called) so background probes cannot
overwrite newer history.
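The generation + revision guard can be isolated into a small helper. The sketch below uses a hypothetical `UsageProbeGuard` class and a history object that exposes an optional `getRevision()`. It mirrors the suggested diff's logic rather than reproducing agent-tui.ts.

```ts
// Hypothetical minimal history shape; getRevision is optional, matching the
// headless message-history interface described in this thread.
interface RevisionedHistory {
  getRevision?: () => number;
}

interface ProbeTicket {
  readonly generation: number;
  readonly revision: number | undefined;
}

class UsageProbeGuard {
  private generation = 0;
  private readonly history: RevisionedHistory;

  constructor(history: RevisionedHistory) {
    this.history = history;
  }

  // Take a ticket right before awaiting measureUsage(); starting a new probe
  // supersedes every earlier ticket.
  begin(): ProbeTicket {
    this.generation += 1;
    return { generation: this.generation, revision: this.history.getRevision?.() };
  }

  // Check after measureUsage() resolves: false means a newer probe started or
  // the history changed in the meantime, so the result must be dropped.
  isCurrent(ticket: ProbeTicket): boolean {
    if (ticket.generation !== this.generation) {
      return false;
    }
    return (
      ticket.revision === undefined ||
      this.history.getRevision?.() === ticket.revision
    );
  }
}
```

When the history has no `getRevision`, the guard falls back to the generation check alone, as the suggested diff does.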
3 issues found across 18 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/tui/src/agent-tui.ts">
<violation number="1" location="packages/tui/src/agent-tui.ts:1537">
P2: `Processing...` loader is shown before the guarded `try/finally`, so setup errors can leave a stuck spinner.</violation>
</file>
<file name="packages/headless/README.md">
<violation number="1" location="packages/headless/README.md:172">
P3: Clarify that only transient annotations like `turn-start` are dropped; approval/compaction/interrupt are still persisted under `extra.*` in `trajectory.json`.</violation>
</file>
<file name="packages/headless/AGENTS.md">
<violation number="1" location="packages/headless/AGENTS.md:21">
P3: Clarify that only `turn-start` is dropped; `approval`, `compaction`, and `interrupt` are persisted under `trajectory.extra.*`.</violation>
</file>
Cycle-1 triggered a fresh round of AI reviews. Consolidating every
actionable finding into a single commit:
P1 (Oracle): runHeadless could persist an invalid zero-step ATIF
trajectory when the stream failed before any step was emitted. Harbor's
own validator rejects steps: []. TrajectoryCollector.writeTo now returns
boolean and skips disk writes for zero-step runs instead of producing
an invalid file.
P2 (Codex + Cubic): showLoader('Processing...') lived outside the
runSingleStreamTurn try/finally, so a thrown prepareMessages /
onBeforeTurn / usage probe / compaction check would leave the spinner
stuck on screen. Moved the call into the try block so clearStatus() in
finally cleans it up on every exit path.
P2 (CodeRabbit): the startup usage-probe guard only rejected results from
stale generations, not results computed against a different history.
Added a messageHistory.getRevision() capture-and-compare so a
background probe resolving after user input no longer overwrites the
new turn's real usage with an empty-message baseline.
P2 (CodeRabbit): TrajectoryJson.extra was typed as
'{approval_events?, compaction_events?, interrupt_events?} & Record<string, unknown>',
letting new lifecycle buckets type-check without a matching
TrajectoryCollector method. Dropped the open-record intersection so the
ATIF persistence contract is enforced structurally.
P2 (CodeRabbit): test_trajectory.py's _is_real_number accepted NaN,
Infinity, and -Infinity (all producible via json.loads on non-strict
input). Added an explicit math.isfinite() check.
P3 (CodeRabbit + Cubic): documentation drift — several docstrings and
markdown files claimed 'approval/compaction/interrupt are JSONL-only
and dropped from trajectory.json'. That is wrong: they are persisted
under trajectory.extra.* by the collector. Only turn-start and error
are truly transient. Corrected in packages/headless/AGENTS.md,
packages/headless/README.md, packages/headless/src/types.ts,
packages/headless/src/trajectory-collector.ts module JSDoc, and the
root AGENTS.md.
Regression test added: runner.test.ts now asserts that zero-step runs
with atifOutputPath set do NOT produce a file (does not write an invalid
zero-step trajectory when the stream fails before any step).
Verified: typecheck (6/6), ultracite check (277 files clean),
tests (1236 pass; headless +1 new = 65), build (6/6).
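The closed `extra` contract and the persisted-versus-transient split described above can be expressed with a discriminated union and an exhaustive switch. The payload fields and the `routeLifecycleEvent` function below are hypothetical, loosely modeled on what the review says `collectTrajectoryEvent` does.

```ts
// Hypothetical event payloads; only the `type` discriminants come from the PR.
interface ApprovalEvent { type: "approval"; toolName: string; approved: boolean }
interface CompactionEvent { type: "compaction"; beforeTokens: number; afterTokens: number }
interface InterruptEvent { type: "interrupt"; reason: string }
interface TurnStartEvent { type: "turn-start" }
interface RunErrorEvent { type: "error"; message: string }

type LifecycleEvent =
  | ApprovalEvent
  | CompactionEvent
  | InterruptEvent
  | TurnStartEvent
  | RunErrorEvent;

// Closed record: no index signature, so unknown buckets do not type-check.
interface TrajectoryExtra {
  approval_events?: ApprovalEvent[];
  compaction_events?: CompactionEvent[];
  interrupt_events?: InterruptEvent[];
}

// Three event types persist under extra.*; turn-start and error stay in the
// JSONL stream only.
function routeLifecycleEvent(extra: TrajectoryExtra, event: LifecycleEvent): void {
  switch (event.type) {
    case "approval":
      (extra.approval_events ??= []).push(event);
      return;
    case "compaction":
      (extra.compaction_events ??= []).push(event);
      return;
    case "interrupt":
      (extra.interrupt_events ??= []).push(event);
      return;
    case "turn-start":
    case "error":
      return; // transient: never written to trajectory.json
    default: {
      // Compile error here if a new event type is added without a case.
      const unreachable: never = event;
      throw new Error(`unhandled lifecycle event: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Because `TrajectoryExtra` has no index signature, an object literal such as `{ turn_start_events: [] }` fails TypeScript's excess-property check. A new member of `LifecycleEvent` without a matching `case` fails at the `never` assignment.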
@coderabbitai review
@codex review
@cubic-dev-ai review
✅ Actions performed: Review triggered.
@minpeter I have started the AI code review. It will take a few minutes to complete.
💡 Codex Review
Reviewed commit: 45a810f4dd
```ts
if (this.steps.length === 0) {
  return false;
```
Clear stale trajectory output on zero-step runs
When writeTo returns early on this.steps.length === 0, it leaves any existing outputPath file untouched; with a reused path (e.g., CEA defaults to trajectory.json), a zero-step run (such as an early abort/timeout before any step is emitted) can silently preserve and expose the previous run's trajectory as if it were current output. This is a data-integrity regression introduced by the new skip-write behavior; the zero-step path should delete/truncate the target file or force callers to handle cleanup when false is returned.
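Here is one way to address this comment. `FileSink` is a hypothetical port; in Node it would map to `fs.writeFileSync` and `fs.rmSync(path, { force: true })`. `TrajectoryWriter` is a stripped-down stand-in for `TrajectoryCollector.writeTo`. On a zero-step run it removes whatever file the path already holds, then returns `false`.

```ts
// Hypothetical file-system port so the sketch runs without disk I/O.
interface FileSink {
  write(path: string, contents: string): void;
  // Must not throw when the file is absent (like rmSync with force: true).
  remove(path: string): void;
}

class TrajectoryWriter {
  private readonly steps: unknown[] = [];

  addStep(step: unknown): void {
    this.steps.push(step);
  }

  // Returns false when there is nothing valid to write. In that case any file
  // an earlier run left at outputPath is removed, so downstream tooling cannot
  // mistake it for this run's output.
  writeTo(sink: FileSink, outputPath: string): boolean {
    if (this.steps.length === 0) {
      sink.remove(outputPath);
      return false;
    }
    sink.write(outputPath, JSON.stringify({ steps: this.steps }, null, 2));
    return true;
  }
}
```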
Cycle 2: Cubic reported 'No issues found across 19 files'. Oracle's second audit surfaced three genuine items the bots missed:
1. Headless probe race (Oracle P2): measureUsageIfAvailable in the headless runner lacked the generation + revision guard that the TUI already had. A slow background probe could overwrite fresher post-compaction usage. Mirrored the TUI pattern using messageHistory.getRevision?.() with optional chaining (the headless message history interface declares getRevision as optional).
2. ATIF step source contract drift (Oracle P2): trajectory-collector.ts permits steps[*].source = 'user' | 'agent' | 'system' (Harbor ATIF v1.4 allows all three; system steps support observations since v1.2), but the bundled Python validator rejected 'system' and the benchmark docs documented only 'user | agent'. Aligned all three surfaces.
3. Root README.md drift (Oracle P3): headless event list omitted turn-start and did not point at Harbor's ATIF-v1.4 schema. Updated to match the current package docs.
Verified: typecheck (6/6), ultracite check (277 files clean), 1236 tests pass, build (6/6). No new tests required: the headless probe guard is covered by the same invariants the TUI guard already tests.
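For illustration, here is a TypeScript analogue of the checks these cycles touch: non-empty `steps`, `source` limited to `user | agent | system`, and finite non-boolean numbers. The bundled validator is Python. `validateTrajectory` and `isRealNumber` below are hypothetical and not part of the PR.

```ts
// Step sources allowed by ATIF v1.4, per the commit message.
const STEP_SOURCES: ReadonlySet<string> = new Set(["user", "agent", "system"]);

// Counterpart of the Python _is_real_number + math.isfinite check:
// rejects booleans, NaN, Infinity and -Infinity.
function isRealNumber(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value);
}

// Returns human-readable errors; an empty array means the checks passed.
function validateTrajectory(doc: { steps?: unknown }): string[] {
  const errors: string[] = [];
  if (!Array.isArray(doc.steps) || doc.steps.length === 0) {
    // Harbor rejects `steps: []`, so a zero-step trajectory is invalid.
    errors.push("steps must be a non-empty array");
    return errors;
  }
  doc.steps.forEach((step, index) => {
    const source = (step as { source?: unknown }).source;
    if (typeof source !== "string" || !STEP_SOURCES.has(source)) {
      errors.push(`steps[${index}].source must be user | agent | system`);
    }
  });
  return errors;
}
```

In TypeScript, `typeof value === "number"` already excludes booleans. `Number.isFinite` therefore covers the combined role of the `bool` exclusion and `math.isfinite` on the Python side.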
@coderabbitai review
@codex review
@cubic-dev-ai review
✅ Actions performed: Review triggered.
@minpeter I have started the AI code review. It will take a few minutes to complete.
…ype (cycle 3)
Cycle 3 Oracle audit flagged exactly one remaining issue: the changeset for PR #117 said in its 'features' bullet (line 14) that TrajectoryJson.extra was typed as an open record, then said in its 'cycle 1 follow-ups' bullet (line 30) that it was later narrowed back to the three canonical lifecycle buckets. The shipped code matches the NARROWED version, so the earlier bullet was stale.
Fixed by rewriting line 14 to reflect the final, shipped state: extra is a closed record of exactly approval_events / compaction_events / interrupt_events, and new lifecycle types must extend the interface explicitly.
No runtime changes.
Verified: ultracite check clean, 1236 tests pass, build (6/6 full turbo).
💡 Codex Review
Reviewed commit: 8160c46acb
```ts
if (this.steps.length === 0) {
  return false;
```
Clear stale ATIF output when skipping zero-step writes
Returning early when this.steps.length === 0 leaves any pre-existing file at outputPath untouched. If a run fails before emitting a step (for example, agent.stream() rejects before any user/agent step is recorded) and reuses the same atifOutputPath, downstream tooling will read stale trajectory data from an older run and silently report incorrect metrics. Ensure the path is removed or overwritten when zero-step persistence is skipped.
```ts
const runBackgroundStartupProbe = (): void => {
  measureUsageIfAvailable([]).then(ignore, ignore);
```
Cancel background startup probe on TUI teardown
The startup usage probe is launched fire-and-forget and never tracked or canceled. In sessions where measureUsage performs network I/O, exiting the TUI before that probe resolves can keep the Node process alive until the probe completes, and the late result can still mutate history/header state after shutdown. Tie this probe to teardown (or an abort signal) so early exits are deterministic.
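The sketch below ties the probe to teardown. It assumes a hypothetical `measureUsage(messages, signal)` that accepts an `AbortSignal`; the PR's `measureUsage` takes only messages. `dispose()` aborts the signal, and the resolution handler drops any late result. Pending network I/O is only released if `measureUsage` actually honors the signal.

```ts
// Hypothetical signature: the AbortSignal parameter is an assumption.
type MeasureUsage = (
  messages: readonly unknown[],
  signal: AbortSignal
) => Promise<number | undefined>;

class StartupUsageProbe {
  private readonly controller = new AbortController();
  private readonly measureUsage: MeasureUsage;
  private readonly apply: (tokens: number) => void;

  constructor(measureUsage: MeasureUsage, apply: (tokens: number) => void) {
    this.measureUsage = measureUsage;
    this.apply = apply;
  }

  // Fire-and-forget: startup returns immediately; failures are ignored.
  start(): void {
    const { signal } = this.controller;
    this.measureUsage([], signal).then(
      (tokens) => {
        // A result that lands after teardown must not touch UI/history state.
        if (!signal.aborted && tokens !== undefined) {
          this.apply(tokens);
        }
      },
      () => {}
    );
  }

  // Called from TUI teardown: aborts in-flight work and drops late results.
  dispose(): void {
    this.controller.abort();
  }
}
```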
You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment |
Summary
Users were seeing a frozen UI during two separate silent gaps in the agent lifecycle. This change surfaces both with a loading indicator and makes the startup probe non-blocking.
Changes by package
`@ai-sdk-tool/harness` (patch)
`@ai-sdk-tool/tui` (patch)
`@ai-sdk-tool/headless` (patch)
UX before / after
Before (per-turn prep):
```
[Enter] → [0–500 ms silent freeze] → 'Working...' → first token
└── GAP
```
After (per-turn prep):
```
[Enter] → 'Processing...' → 'Working...' → cleared on first visible part
└── prep (onBeforeTurn, measureUsage, compaction)
└── LLM request in flight (TTFB)
```
Before (first launch):
```
tui.start() → onSetup() → await measureUsage([]) ★ 200ms–3s blocking ★ → editor accepts input
```
After (first launch):
```
tui.start() → onSetup() → runBackgroundStartupProbe() ★ returns immediately ★ → editor accepts input
└── background: footer updates when probe resolves
```
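The two-stage indicator reduces to a small state machine. This sketch uses a hypothetical `TurnLoader` and treats `start` and `start-step` as the framing parts to skip. The PR only states that `text-start` counts as visible, so the exact framing set is an assumption.

```ts
// Assumption: these part types are framing, not visible output.
const FRAMING_PART_TYPES: ReadonlySet<string> = new Set(["start", "start-step"]);

function isVisiblePart(part: { type: string }): boolean {
  return !FRAMING_PART_TYPES.has(part.type);
}

type LoaderLabel = "Processing..." | "Working..." | null;

class TurnLoader {
  private label: LoaderLabel = null;

  currentLabel(): LoaderLabel {
    return this.label;
  }

  // Enter pressed: local prep (onBeforeTurn, measureUsage, compaction).
  beginTurnPrep(): void {
    this.label = "Processing...";
  }

  // onStreamStart: the LLM request is in flight (time to first byte).
  onStreamStart(): void {
    this.label = "Working...";
  }

  // Every stream part passes through here; the first visible one clears it.
  onStreamPart(part: { type: string }): void {
    if (this.label === "Working..." && isVisiblePart(part)) {
      this.label = null;
    }
  }

  // finally-block cleanup so no exit path leaves a spinner behind.
  clear(): void {
    this.label = null;
  }
}
```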
Compatibility
Verification
Changeset
`prompt-processing-indicators.md` — patch bump for `@ai-sdk-tool/harness`, `@ai-sdk-tool/tui`, `@ai-sdk-tool/headless`. No consumer code breaks; purely additive public API.
Summary by cubic
Adds clear “Processing…” → “Working…” indicators and makes TUI startup input non‑blocking. Aligns persisted trajectories with Harbor ATIF‑v1.4 and tightens guards, event ordering, and validation.
New Features
- `@ai-sdk-tool/tui`: Two‑stage loader; non‑blocking startup usage probe; new `onStreamStart`; treats `text-start` as visible.
- `@ai-sdk-tool/harness`: New `LoopHooks.onStreamStart` and `onFirstStreamPart(part, context)`; re‑exports `TextStreamPart`.
- `@ai-sdk-tool/headless`: Emits transient `turn-start` and `onStreamStart`; persists ATIF‑v1.4 with `logprobs`, `prompt_token_ids`, `completion_token_ids`, and `final_metrics.total_cost_usd`; docs clarify JSONL vs ATIF; CEA CLI/validator updated to ATIF‑v1.4 and calls Harbor when available; allows `steps[*].source=user|agent|system`; `TrajectoryJson.extra` is a closed record of `approval_events`/`compaction_events`/`interrupt_events`.
Bug Fixes
- `turn-start` fires once per logical turn and only after `agent.stream()` succeeds; never persisted; zero‑step trajectories are not written; usage probe guarded by generation + history revision.
- `showLoader("Processing...")` moved into the turn try block; blocking compaction shows “Compacting…” and restores correctly; startup probe is fire‑and‑forget with generation + revision guards; `updateHeader()` restored after `onSetup`.
- `onStreamStart`/`onFirstStreamPart` errors are logged and swallowed; restored `LoopHooks.onToolCall` type; `onFirstStreamPart` now receives `(part, context)`.
- `bool` and non‑finite numbers; root docs list `turn-start` and link Harbor ATIF‑v1.4; changeset text corrected to reflect the closed `TrajectoryJson.extra` contract.
Written for commit 8df785b. Summary will update on new commits.
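The first bug-fix bullet pairs two rules: `turn-start` fires once per logical turn, and only after `agent.stream()` succeeds. The minimal sketch below uses a hypothetical `TurnStartGate`. If `onStreamStarted` is called only after the awaited `agent.stream()` call, a rejected call never reaches it.

```ts
// Hypothetical gate: a logical turn can span several loop iterations (for
// example a tool call followed by a follow-up request), but turn-start should
// be emitted once per logical turn.
class TurnStartGate {
  private emitted = false;

  // Call when the user submits a new prompt.
  beginLogicalTurn(): void {
    this.emitted = false;
  }

  // Call only after agent.stream() has returned; a failed stream never gets
  // here, so it emits no turn-start.
  onStreamStarted(emit: () => void): void {
    if (this.emitted) {
      return;
    }
    this.emitted = true;
    emit();
  }
}
```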
Summary by CodeRabbit
Release Notes
New Features
Improvements