
feat: surface prompt-processing state and unblock TUI startup #117

Merged
minpeter merged 12 commits into main from feat/prompt-processing-indicators
Apr 21, 2026
Conversation

@minpeter

@minpeter minpeter commented Apr 21, 2026

Owner

Summary

Users were seeing a frozen UI during two separate silent gaps in the agent lifecycle. This change surfaces both with a loading indicator and makes the startup probe non-blocking.

  • Gap 1 — per-turn prompt processing (user hits Enter → LLM stream starts): Local prep (`onBeforeTurn`, `measureUsage`, compaction) had no spinner at all. Now shows `Processing...` during prep and transitions to `Working...` once the LLM request is in flight.
  • Gap 2 — startup blocking probe (first launch blocks input for 200 ms–3 s): The initial `measureUsageIfAvailable([])` call that measures system-prompt + tool-schema token overhead now runs fire-and-forget. The context-usage footer starts on the estimated value and silently upgrades to the real one once the probe resolves.

Changes by package

`@ai-sdk-tool/harness` (patch)

  • New `LoopHooks.onStreamStart` — fires after `agent.stream()` is dispatched, before `fullStream` iteration begins. Closest hook to "request sent, waiting for first chunk".
  • New `LoopHooks.onFirstStreamPart` — fires exactly once per iteration when any part arrives. Useful for clearing prompt-processing indicators.
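The placement of the two hooks can be sketched roughly as below. This is an illustration under assumptions, not the harness source: `runIteration`, `safeCall`, `StreamPart`, `HookContext`, and the `Agent` shape are hypothetical names, and the log-and-swallow handling mirrors the behavior summarized later in this thread rather than the exact implementation.

```typescript
// Hypothetical shapes for illustration; the real harness types may differ.
type StreamPart = { type: string; text?: string };
type HookContext = { iteration: number };

interface LoopHooks {
  onStreamStart?: (context: HookContext) => void | Promise<void>;
  onFirstStreamPart?: (part: StreamPart, context: HookContext) => void | Promise<void>;
}

interface Agent {
  stream(): { fullStream: AsyncIterable<StreamPart> };
}

// Run an observer hook without letting its failure break the loop.
async function safeCall(name: string, fn: () => void | Promise<void>): Promise<void> {
  try {
    await fn();
  } catch (error) {
    console.warn(`[hook:${name}] failed`, error);
  }
}

async function runIteration(
  agent: Agent,
  hooks: LoopHooks,
  context: HookContext
): Promise<StreamPart[]> {
  const stream = agent.stream(); // request dispatched
  await safeCall("onStreamStart", () => hooks.onStreamStart?.(context));

  const parts: StreamPart[] = [];
  let sawFirstPart = false;
  for await (const part of stream.fullStream) {
    if (!sawFirstPart) {
      sawFirstPart = true; // guarantees a single call per iteration
      await safeCall("onFirstStreamPart", () => hooks.onFirstStreamPart?.(part, context));
    }
    parts.push(part);
  }
  return parts;
}
```

Because both hooks go through `safeCall`, an observer that throws only produces a warning and the stream is still consumed to the end.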

`@ai-sdk-tool/tui` (patch)

  • Two-stage foreground loader: `Processing...` → `Working...` → cleared on first visible stream part.
  • `AgentTUIConfig.onStreamStart` callback exposed so consumers can hook the boundary.
  • Startup `measureUsage([])` probe is now fire-and-forget; editor accepts input immediately on launch.
  • New top-level `ignore` helper + `runBackgroundStartupProbe` to keep the fire-and-forget semantics self-documenting (no empty catch blocks).
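A minimal sketch of the fire-and-forget pattern named above. The `ignore` helper and the `.then(ignore, ignore)` call match the snippet quoted in the review comments; the `footer` object and the body of `measureUsageIfAvailable` are stand-ins invented for this example.

```typescript
// Top-level no-op. Passing it to .then() states "outcome deliberately
// discarded" without an empty catch block.
const ignore = (): void => {};

// Stand-in footer state: null means "still showing the estimate".
const footer: { actualTokens: number | null } = { actualTokens: null };

// Hypothetical probe body; the real one measures system-prompt and
// tool-schema token overhead through the provider.
const measureUsageIfAvailable = async (messages: unknown[]): Promise<void> => {
  await new Promise<void>((resolve) => setTimeout(resolve, 20));
  footer.actualTokens = 1200 + messages.length;
};

const runBackgroundStartupProbe = (): void => {
  // Not awaited: returns immediately. Fulfilment and rejection both go
  // to `ignore`, so a failed probe cannot surface as an unhandled rejection.
  measureUsageIfAvailable([]).then(ignore, ignore);
};

runBackgroundStartupProbe();
console.log(footer.actualTokens); // null: the caller was not blocked
```

The footer value changes only after the timer fires, which is the "starts on the estimate, upgrades later" behavior described in the summary.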

`@ai-sdk-tool/headless` (patch)

  • New `TurnStartEvent` (`type: "turn-start"`) lifecycle annotation emitted between `agent.stream()` dispatch and the first chunk. Carries no `step_id` (consistent with other lifecycle events).
  • New `HeadlessRunnerConfig.onStreamStart` callback mirrors the TUI hook.
  • ATIF consumers unaffected: `TrajectoryCollector` routes unknown event types to `default` and drops `turn-start` from `trajectory.json` (transient UX signal).
  • `packages/headless/AGENTS.md` and `README.md` updated with the new event type, shape, and TypeScript export.
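The split between "streamed on JSONL" and "persisted to `trajectory.json`" could look roughly like this. The `type`, `phase`, and `timestamp` fields follow the emit call quoted in the review thread; the `phase` values, the simplified `StepEvent`, and `MiniCollector` are assumptions made for illustration and are not the real `TrajectoryCollector`.

```typescript
// Lifecycle annotation: no step_id, never persisted.
interface TurnStartEvent {
  type: "turn-start";
  phase: "new-turn" | "intermediate-step"; // assumed values
  timestamp: string; // ISO 8601
}

// Simplified stand-in for a persisted step event.
interface StepEvent {
  type: "step";
  step_id: number;
  source: "user" | "agent" | "system";
  message: string;
}

type TrajectoryEvent = StepEvent | TurnStartEvent;

class MiniCollector {
  private readonly steps: StepEvent[] = [];

  collect(event: TrajectoryEvent): void {
    switch (event.type) {
      case "step":
        this.steps.push(event);
        break;
      default:
        // Transient lifecycle annotations such as turn-start are dropped.
        break;
    }
  }

  finalize(): { schema_version: string; steps: StepEvent[] } {
    return { schema_version: "ATIF-v1.4", steps: [...this.steps] };
  }
}

// Every event is streamed as a JSONL line; only steps reach the collector's output.
const emitted: string[] = [];
const collector = new MiniCollector();
const emitAndCollect = (event: TrajectoryEvent): void => {
  emitted.push(JSON.stringify(event));
  collector.collect(event);
};

emitAndCollect({ type: "step", step_id: 1, source: "user", message: "hi" });
emitAndCollect({ type: "turn-start", phase: "new-turn", timestamp: new Date().toISOString() });
emitAndCollect({ type: "step", step_id: 2, source: "agent", message: "hello" });
```

After these three calls the JSONL buffer holds three lines while `collector.finalize().steps` holds two entries, so a consumer reading only the persisted file never sees `turn-start`.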

UX before / after

Before (per-turn prep):
```
[Enter] → [0–500 ms silent freeze] → 'Working...' → first token
└── GAP
```

After (per-turn prep):
```
[Enter] → 'Processing...' → 'Working...' → cleared on first visible part
└── prep (onBeforeTurn, measureUsage, compaction)
└── LLM request in flight (TTFB)
```
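The loader transitions in the diagram above can be sketched as a small state flow. `showLoader`, `clearStatus`, and `foregroundStatusMessage` are names that appear in the review comments; `runTurn`, `isVisiblePart`, the `labels` log, and the exact set of visible part types are assumptions for this example, as is the `finally` cleanup.

```typescript
type Part = { type: string };

let foregroundStatusMessage: string | null = null;
const labels: (string | null)[] = []; // records every label change, for inspection

const showLoader = (label: string): void => {
  foregroundStatusMessage = label;
  labels.push(label);
};
const clearStatus = (): void => {
  foregroundStatusMessage = null;
  labels.push(null);
};

// Assumed visibility rule: text parts are always visible, reasoning parts
// only when the reasoning view is enabled.
const isVisiblePart = (part: Part, showReasoning: boolean): boolean =>
  part.type === "text-start" ||
  part.type === "text-delta" ||
  (showReasoning && part.type.startsWith("reasoning"));

async function runTurn(
  prepare: () => Promise<void>,
  startStream: () => Promise<AsyncIterable<Part>>,
  showReasoning = false
): Promise<void> {
  try {
    showLoader("Processing..."); // local prep: hooks, usage probe, compaction
    await prepare();
    const stream = await startStream();
    showLoader("Working..."); // request in flight, waiting for the first chunk
    let cleared = false;
    for await (const part of stream) {
      if (!cleared && isVisiblePart(part, showReasoning)) {
        clearStatus();
        cleared = true;
      }
    }
  } finally {
    // Assumed safety net: never leave a spinner behind after the turn ends.
    if (foregroundStatusMessage !== null) clearStatus();
  }
}
```

For a stream that yields a non-visible part followed by `text-start`, the recorded labels are `"Processing..."`, `"Working..."`, then `null`.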

Before (first launch):
```
tui.start() → onSetup() → await measureUsage([]) ★ 200ms–3s blocking ★ → editor accepts input
```

After (first launch):
```
tui.start() → onSetup() → runBackgroundStartupProbe() ★ returns immediately ★ → editor accepts input
└── background: footer updates when probe resolves
```

Compatibility

  • All new hooks/fields are optional; existing consumers compile and run unchanged.
  • `TrajectoryEvent` is a union extension — existing `switch` consumers hit their `default` branch and ignore `turn-start`.
  • `trajectory.json` output is identical (`turn-start` is JSONL-only, not persisted).
  • `session-tui` / `session-headless` adapters pass the new `onStreamStart` through automatically via their existing `Omit` spread pattern.

Verification

  • `pnpm run typecheck` — 6/6 packages pass
  • `pnpm run check` (ultracite) — clean
  • `pnpm run test` — all 566 tests pass; headless `runner.test.ts` output now includes `turn-start` events
  • `pnpm run build` — 6/6 packages succeed

Changeset

`prompt-processing-indicators.md` — patch bump for `@ai-sdk-tool/harness`, `@ai-sdk-tool/tui`, `@ai-sdk-tool/headless`. No consumer code breaks; purely additive public API.


Summary by cubic

Adds clear “Processing…” → “Working…” indicators and makes TUI startup input non‑blocking. Aligns persisted trajectories with Harbor ATIF‑v1.4 and tightens guards, event ordering, and validation.

  • New Features

    • @ai-sdk-tool/tui: Two‑stage loader; non‑blocking startup usage probe; new onStreamStart; treats text-start as visible.
    • @ai-sdk-tool/harness: New LoopHooks.onStreamStart and onFirstStreamPart(part, context); re‑exports TextStreamPart.
    • @ai-sdk-tool/headless: Emits transient turn-start and onStreamStart; persists ATIF‑v1.4 with logprobs, prompt_token_ids, completion_token_ids, and final_metrics.total_cost_usd; docs clarify JSONL vs ATIF; CEA CLI/validator updated to ATIF‑v1.4 and calls Harbor when available; allows steps[*].source = user | agent | system; TrajectoryJson.extra is a closed record of approval_events/compaction_events/interrupt_events.
  • Bug Fixes

    • Headless: turn-start fires once per logical turn and only after agent.stream() succeeds; never persisted; zero‑step trajectories are not written; usage probe guarded by generation + history revision.
    • TUI: showLoader("Processing...") moved into the turn try block; blocking compaction shows “Compacting…” and restores correctly; startup probe is fire‑and‑forget with generation + revision guards; updateHeader() restored after onSetup.
    • Hooks: onStreamStart/onFirstStreamPart errors are logged and swallowed; restored LoopHooks.onToolCall type; onFirstStreamPart now receives (part, context).
    • Validator/docs: Rejects bool and non‑finite numbers; root docs list turn-start and link Harbor ATIF‑v1.4; changeset text corrected to reflect the closed TrajectoryJson.extra contract.

Written for commit 8df785b. Summary will update on new commits.

Summary by CodeRabbit

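Release Notes

  • New Features

    • Added a prompt-processing status indicator (Processing... → Working... transition)
    • Added support for a stream-start callback
  • Improvements

    • Improved cost tracking and token-metric aggregation
    • Added loader-state restoration during blocking operations
    • Strengthened trajectory validation and improved metadata accuracy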

Previously users saw a frozen UI during two separate silent gaps:
1. Between hitting Enter and the LLM stream starting — during local prep
   (onBeforeTurn, measureUsage, compaction) there was no spinner at all.
2. On first launch the startup token probe blocked the input loop for
   200ms–3s while it measured system-prompt + tool-schema token overhead.

This change:
- Adds onStreamStart and onFirstStreamPart hooks to the harness LoopHooks
  so any agent runtime can signal the exact prompt-processing window.
- In the TUI, shows a 'Processing...' loader during turn preparation and
  transitions to 'Working...' once the LLM request is in flight (cleared
  on the first visible stream part).
- In headless, emits a 'turn-start' lifecycle annotation and a matching
  onStreamStart callback; extends the TrajectoryEvent union accordingly.
- Runs the TUI startup measureUsage probe fire-and-forget so the editor
  accepts input immediately; the context-usage footer starts on the
  estimated value and quietly upgrades to the real one when ready.

Documents the new turn-start event in the headless AGENTS.md / README.
@coderabbitai

coderabbitai Bot commented Apr 21, 2026


Important

Review skipped

Auto reviews are disabled on this repository. To trigger a review, include @crb review in the PR description. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

📝 Walkthrough

Walkthrough

Introduces a coordinated patch across several packages that surfaces and fixes prompt-processing and turn-lifecycle behavior that previously looked frozen. It adds stream-start signals through the onStreamStart and onFirstStreamPart callbacks, emits a turn-start lifecycle annotation, migrates the ATIF schema label from v1.6 to v1.4, improves the processing-state display in the TUI, and extends the optional metric fields.

Changes

| Cohort / File(s) | Summary |
|---|---|
| ATIF schema migration: packages/headless/src/trajectory-collector.ts, packages/cea/src/entrypoints/main.ts, packages/cea/benchmark/scorer.py, packages/cea/benchmark/test_trajectory.py | Updates the schema version from ATIF v1.6 to v1.4 (Harbor-compatible), adds the total_cost_usd metric field, and tightens validation so that bool values are rejected in numeric fields. |
| Lifecycle event system: packages/headless/src/types.ts, packages/headless/src/runner.ts, packages/headless/src/__tests__/atif-events.test.ts, packages/headless/src/runner.test.ts | Introduces the turn-start event type, emits it exactly once after the agent.stream() call, keeps the event on the JSONL stream only, and excludes it from the persisted trajectory.json. |
| Harness observer hooks: packages/harness/src/types.ts, packages/harness/src/loop.ts, packages/harness/src/index.ts, packages/harness/src/loop.test.ts | Adds the onStreamStart and onFirstStreamPart callbacks with error isolation, and re-exports the TextStreamPart type. |
| TUI processing-indicator improvements: packages/tui/src/agent-tui.ts, packages/tui/src/stream-handlers.ts | Switches the loader from "Processing..." to "Working..." around stream start, makes the usage probe non-blocking, manages loader state during compaction, and treats text-start as visible. |
| Documentation and validation improvements: packages/headless/AGENTS.md, packages/headless/README.md, packages/cea/benchmark/AGENTS.md | Updates the documentation to clarify ATIF v1.4 compliance requirements, the path that turn-start events take, and the stream protocol. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Harness as Harness<br/>(runAgentLoop)
    participant Stream as Agent<br/>Stream
    participant Observer as Observer<br/>(onStreamStart,<br/>onFirstStreamPart)
    participant ProcessLoop as Stream<br/>Processing Loop

    Client->>Harness: runAgentLoop(...)
    Harness->>Stream: agent.stream()
    Stream-->>Harness: stream created
    Harness->>Observer: onStreamStart(context)
    Observer-->>Harness: [hook completes]
    Harness->>ProcessLoop: start fullStream iteration
    ProcessLoop->>Stream: await first chunk
    Stream-->>ProcessLoop: part[0]
    ProcessLoop->>Observer: onFirstStreamPart(part[0], context)
    Observer-->>ProcessLoop: [hook completes]
    ProcessLoop->>ProcessLoop: continue iteration
    ProcessLoop->>Stream: await next chunk
    Stream-->>ProcessLoop: part[1..n]
    ProcessLoop->>ProcessLoop: process all parts
    ProcessLoop-->>Harness: iteration complete
    Harness-->>Client: loop result
sequenceDiagram
    participant Client
    participant Headless as Headless<br/>(runHeadless)
    participant Collector as Trajectory<br/>Collector
    participant Stream as Agent<br/>Stream
    participant JSONL as JSONL<br/>Emitter
    participant File as trajectory.json<br/>(ATIF v1.4)

    Client->>Headless: runHeadless(config)
    Headless->>Stream: agent.stream()
    Stream-->>Headless: stream created
    Headless->>JSONL: emit TurnStartEvent
    JSONL->>Collector: collect turn-start
    Collector->>Collector: [DROP - transient]
    JSONL-->>Client: stdout JSONL
    Headless->>Headless: processStream()
    Headless->>JSONL: emit step events
    JSONL->>Collector: collect steps
    Collector->>Collector: [PERSIST]
    Headless->>Collector: finalize()
    Collector->>Collector: aggregate metrics<br/>(total_cost_usd)
    Collector->>File: write trajectory.json<br/>(schema_version: v1.4)
    File-->>Client: persisted ATIF v1.4

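Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🐰 Catching the moment a turn begins,
sending a signal before the stream flows,
costs are summed and dance to v1.4,
observers weather their errors well,
our agent mid-processing, what a wonderful journey!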

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately reflects the PR's main changes. Its two core goals, surfacing prompt-processing state and unblocking TUI startup, are expressed concisely and clearly. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces "prompt processing" indicators across the harness, TUI, and headless packages to improve user feedback during LLM request latency. It adds onStreamStart and onFirstStreamPart hooks to the agent loop and a new turn-start event for headless mode. In the TUI, the startup token probe is now non-blocking. Feedback includes addressing a race condition in the TUI's background startup probe and ensuring that the onStreamStart hook and related UI states are triggered after the agent stream is dispatched for consistency and accurate latency signaling.

};

const runBackgroundStartupProbe = (): void => {
measureUsageIfAvailable([]).then(ignore, ignore);


high

Running measureUsageIfAvailable([]) as a fire-and-forget background task at startup introduces a race condition. If the user quickly starts a turn and that turn's usage probe (which is awaited) finishes before this background probe, the background probe will eventually resolve and overwrite the messageHistory with stale usage data for an empty message list. This would result in an incorrect token count being displayed in the UI footer.

Consider adding a tracking mechanism (e.g., a sequential ID or a timestamp) to measureUsageIfAvailable to ensure that only the result of the most recently requested probe is applied to the state.
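One way the suggested tracking mechanism might look is a generation counter, sketched below. `measureUsageGuarded`, `usageState`, and the `measure` callback are hypothetical names; later commits in this PR describe the guard as generation plus history revision, and only the generation half is shown here.

```typescript
type Usage = { totalTokens: number };

// Shared probe state. `generation` counts how many probes have been requested.
const usageState: { generation: number; actual: Usage | null } = {
  generation: 0,
  actual: null,
};

// `measure` stands in for the provider round-trip (assumption).
async function measureUsageGuarded(measure: () => Promise<Usage>): Promise<boolean> {
  const generation = ++usageState.generation; // take a ticket before awaiting
  const usage = await measure();
  if (generation !== usageState.generation) {
    return false; // a newer probe was requested in the meantime: drop this result
  }
  usageState.actual = usage;
  return true;
}
```

If a slow startup probe is requested first and a faster per-turn probe second, the startup probe resolves last but finds its ticket outdated, so the per-turn measurement stays in place. A trade-off of this simple form is that an older result is also dropped when the newer probe fails.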

Comment thread packages/headless/src/runner.ts Outdated
Comment on lines 673 to 682
emitAndCollect({
type: "turn-start",
phase,
timestamp: new Date().toISOString(),
});
await config.onStreamStart?.(phase);

const streamPromise = Promise.resolve(
config.agent.stream(streamOptions)
);


medium

The onStreamStart hook and turn-start event are currently emitted before the agent stream is dispatched. According to the LoopHooks documentation and the implementation in runAgentLoop, this hook should fire immediately after agent.stream() is invoked. This ensures that the "Working..." or "turn-start" state accurately reflects the period where the request is in flight, and avoids triggering the hook if agent.stream() itself throws an error during initialization.

Suggested change
- emitAndCollect({
-   type: "turn-start",
-   phase,
-   timestamp: new Date().toISOString(),
- });
- await config.onStreamStart?.(phase);
- const streamPromise = Promise.resolve(
-   config.agent.stream(streamOptions)
- );
+ const streamPromise = Promise.resolve(
+   config.agent.stream(streamOptions)
+ );
+ emitAndCollect({
+   type: "turn-start",
+   phase,
+   timestamp: new Date().toISOString(),
+ });
+ await config.onStreamStart?.(phase);

Comment thread packages/tui/src/agent-tui.ts Outdated
Comment on lines +1523 to +1524
showLoader("Working...");
await config.onStreamStart?.(phase);


medium

The onStreamStart hook and the "Working..." loader state are triggered before the agent stream is dispatched. For consistency with the harness loop and to accurately signal the start of the LLM request latency gap (TTFB), these should be moved after the config.agent.stream call. This also prevents the "Working..." state from being entered if the stream creation itself fails.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94712f580d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

};

const runBackgroundStartupProbe = (): void => {
measureUsageIfAvailable([]).then(ignore, ignore);


P2 Badge Prevent stale startup probe from clobbering usage state

Launching measureUsageIfAvailable([]) in the background creates a race with the first real turn’s measureUsageIfAvailable(messages) call, and both paths write to messageHistory.updateActualUsage(...) without any revision guard. If the startup probe resolves last, it can overwrite newer token usage with the empty-message baseline, which underestimates context pressure and can mis-size maxOutputTokens/compaction decisions until another measurement occurs.


Comment thread packages/headless/src/runner.ts Outdated
Comment on lines +674 to +678
type: "turn-start",
phase,
timestamp: new Date().toISOString(),
});
await config.onStreamStart?.(phase);


P2 Badge Emit turn-start after dispatching the stream request

turn-start is emitted (and onStreamStart awaited) before config.agent.stream(streamOptions) is invoked, so failures during stream creation still produce a false "request started" lifecycle event. This breaks the event contract documented as "after dispatch" and can skew downstream latency/phase metrics for consumers that treat turn-start as the network dispatch boundary.


@cubic-dev-ai cubic-dev-ai Bot left a comment


2 issues found across 8 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/headless/src/runner.ts">

<violation number="1" location="packages/headless/src/runner.ts:673">
P2: `turn-start` is emitted before `agent.stream()` dispatch, so lifecycle events can claim a stream started when it hasn't.</violation>
</file>

<file name="packages/headless/README.md">

<violation number="1" location="packages/headless/README.md:185">
P3: Document the actual sequencing: `turn-start` is emitted before `agent.stream()` is called, not after it is dispatched.</violation>
</file>


Comment thread packages/headless/src/runner.ts Outdated
Comment thread packages/headless/README.md
Post-implementation audit surfaced four issues and two gaps; all are
addressed here to keep the PR self-contained.

- BUG A (headless): turn-start was re-emitted on every retry, so a
  single logical turn could produce 2-4 turn-start events (overflow
  retry + up to 3 no-output retries). Added a hasEmittedTurnStart
  flag in runSingleTurn scope so the event fires at most once per
  logical turn; flag resets naturally on the next turn.
- BUG B (TUI): during a blocking compaction the foreground loader
  stayed on 'Processing...' while the real wait was the compaction
  LLM call. The onBlockingChange callback now swaps the foreground
  label to 'Compacting...' on block entry and restores the previous
  label on block exit, so users see the real reason for the wait.
- BUG C (TUI): text-start stream parts were not counted as visible,
  leaving the loader spinning after the empty AssistantStreamView
  had already mounted. text-start is now visible; reasoning-start
  follows the existing showReasoning flag.
- BUG D (harness): documented that LoopHooks.onStreamStart fires
  only under runAgentLoop; the TUI has its own independent hook
  on AgentTUIConfig with a different signature.
- GAP 1 (tests): added retry-path assertions that turn-start fires
  exactly once, plus a new test that confirms normal-path ordering
  (metadata -> user step -> turn-start -> agent step) and an
  intermediate-step phase test for tool-continuation turns.
- GAP 1b (tests): added a test that turn-start events never appear
  in the persisted trajectory.json even though they stream on JSONL.
- GAP 2 (docs): updated packages/cea/benchmark/AGENTS.md event
  table and flow diagram to include turn-start and note that it is
  not persisted.
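The once-per-logical-turn guard from BUG A might be sketched like this. `hasEmittedTurnStart` and `runSingleTurn` are named in the commit message above; the `Attempt` signature, the empty-string "no output" convention, and the simplified retry loop are assumptions for this example.

```typescript
// One attempt = one stream dispatch plus consumption. The attempt calls
// `onDispatched` after its stream has been created (assumed contract).
type Attempt = (onDispatched: () => void) => Promise<string>;

async function runSingleTurn(
  attempt: Attempt,
  emitTurnStart: () => void,
  maxNoOutputRetries = 3
): Promise<string> {
  let hasEmittedTurnStart = false; // scoped to this logical turn, so it resets per turn

  const onDispatched = (): void => {
    if (hasEmittedTurnStart) return; // retries stay silent
    hasEmittedTurnStart = true;
    emitTurnStart();
  };

  let output = "";
  for (let tries = 0; tries <= maxNoOutputRetries; tries++) {
    output = await attempt(onDispatched);
    if (output !== "") break; // got output, stop retrying
  }
  return output;
}
```

With an attempt that returns empty output twice before succeeding, `emitTurnStart` runs once even though the stream is dispatched three times. An attempt that rejects before calling `onDispatched` emits nothing at all.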

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87450f1082


Comment on lines 1853 to 1856
await config.onSetup?.();
await measureUsageIfAvailable([]);
updateHeader();

runBackgroundStartupProbe();


P2 Badge Refresh header after onSetup completes

updateHeader() now runs only before onSetup, and this block immediately starts the background probe without a post-setup refresh. If onSetup asynchronously initializes header/footer state (for example, setting a resolved model subtitle), the rendered header stays stale until an unrelated later render (such as a usage probe update or first turn), so startup metadata can be wrong for the entire idle session.


@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 7 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/tui/src/agent-tui.ts">

<violation number="1" location="packages/tui/src/agent-tui.ts:931">
P2: When blocking compaction ends you always restore the stashed foreground loader, even if `clearStatus()` already removed it (e.g., once the first stream part arrived). That resurrects a stale “Processing…” spinner and makes the UI look frozen again. Guard the restoration (and clear the stash) when the loader was cleared during the block.</violation>
</file>


Comment thread packages/tui/src/agent-tui.ts
The persisted trajectory was tagged 'ATIF-v1.6', which does not exist
on Harbor's public spec. The current Harbor ATIF version is v1.4
(see https://www.harborframework.com/docs/agents/trajectory-format):
v1.0, v1.1, v1.2, v1.3, v1.4 — v1.5 and v1.6 were never released.

The 'v1.6' label was an internal bump accompanying a redesign of the
JSONL event types on 2026-04-03; it conflated two distinct surfaces:

- The ATIF trajectory that TrajectoryCollector writes to disk, which
  must follow Harbor's published schema (v1.4).
- The internal stdout JSONL protocol used by the headless runner,
  which carries lifecycle annotations (approval, compaction,
  interrupt, turn-start) that ATIF does not define.

This commit:

- Sets schema_version to 'ATIF-v1.4' in TrajectoryCollector (both the
  TypeScript type literal and the runtime value).
- Updates the Python validator (test_trajectory.py), Python scorer,
  CLI help text, and test assertions to expect 'ATIF-v1.4'.
- Rewrites packages/headless/AGENTS.md, README.md, and
  packages/cea/benchmark/AGENTS.md to separate the ATIF persisted
  format from the internal JSONL streaming protocol, with a pointer
  to Harbor's spec page.
- Adds a JSDoc header on TrajectoryEvent types explaining the split.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 49cd7261de


Comment thread packages/tui/src/agent-tui.ts Outdated
Comment on lines +921 to +923
if (foregroundStatusMessage !== null) {
foregroundStatusBeforeBlocking = foregroundStatusMessage;
showLoader("Compacting...");


P2 Badge Capture prior spinner text only once per blocking compaction

onBlockingChange can emit multiple blocking: true stages for a single compaction (starting → pruning → compacting, from CompactionOrchestrator.blockAtHardLimit/handleOverflow), but this block overwrites foregroundStatusBeforeBlocking every time. After the first stage, the foreground text is already "Compacting...", so later stages replace the original "Processing..."/"Working..." snapshot, and the unblock path restores the wrong label. This breaks the intended status restoration and leaves users with stale compaction wording after blocking ends.
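A sketch of a stash that survives multi-stage blocking and also respects a loader cleared mid-block. The variable and function names come from the quoted diff and review comments; the boolean-only `onBlockingChange` signature and the initial `"Processing..."` state are simplifying assumptions.

```typescript
let foregroundStatusMessage: string | null = "Processing...";
let foregroundStatusBeforeBlocking: string | null = null;

const showLoader = (label: string): void => {
  foregroundStatusMessage = label;
};
const clearStatus = (): void => {
  foregroundStatusMessage = null;
};

function onBlockingChange(blocking: boolean): void {
  if (blocking) {
    if (foregroundStatusMessage === null) return; // no loader showing, nothing to relabel
    if (foregroundStatusBeforeBlocking === null) {
      // Stash only on the first blocking stage of this compaction.
      foregroundStatusBeforeBlocking = foregroundStatusMessage;
    }
    showLoader("Compacting...");
    return;
  }

  const stashed = foregroundStatusBeforeBlocking;
  foregroundStatusBeforeBlocking = null; // always drop the stash on unblock
  // Restore only if a loader is still showing; a cleared loader stays cleared.
  if (stashed !== null && foregroundStatusMessage !== null) {
    showLoader(stashed);
  }
}
```

Three consecutive `onBlockingChange(true)` calls followed by `onBlockingChange(false)` bring the label back to `"Processing..."`. If `clearStatus()` runs while blocked, the unblock leaves the loader cleared.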


…validator

After correcting the schema_version label to 'ATIF-v1.4', audit against
the official spec (https://www.harborframework.com/docs/agents/trajectory-format)
surfaced three remaining gaps. This commit closes them.

- StepMetrics now exposes the full v1.4 optional surface:
  - logprobs: number[]
  - prompt_token_ids: number[]    (added in v1.4)
  - completion_token_ids: number[] (added in v1.3)
  These remain undefined unless the provider populates them, so
  existing output is bit-for-bit identical for callers that don't
  set the new fields.

- TrajectoryJson.final_metrics now aggregates total_cost_usd across
  step metrics with the same null-when-absent semantics as the other
  token totals. The ATIF compliance test suite already expected this
  field in ATIF_FINAL_METRICS_ALLOWED_FIELDS; the emission side just
  caught up to the test.

- TrajectoryJson.extra is typed as the approval/compaction/interrupt
  container intersected with Record<string, unknown> so downstream
  additions to extra.* do not require breaking type changes.

- packages/cea/benchmark/test_trajectory.py now:
  - Calls harbor.utils.trajectory_validator.TrajectoryValidator when
    the harbor package is importable (auto-skips when Harbor is not
    installed so local developer workflows keep working).
  - Enforces per-step metric shapes: numeric fields (prompt_tokens,
    completion_tokens, cached_tokens, cost_usd) must be numbers;
    array fields (logprobs, prompt_token_ids, completion_token_ids)
    must be arrays.
  - Prints total_cost_usd in the summary block and reports whether
    the Harbor validator was used or skipped.

- New tests:
  - finalize() aggregates total_cost_usd across step metrics
  - finalize() returns null total_cost_usd when no step reports cost
  - finalize() preserves all v1.4 optional metric fields (logprobs,
    prompt_token_ids, completion_token_ids)
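The null-when-absent aggregation described for `total_cost_usd` can be sketched as follows. The per-step field names come from the commit message above; `sumOrNull`, `finalMetrics`, and the `total_prompt_tokens` / `total_completion_tokens` output names are assumptions for this example and are not copied from `TrajectoryCollector`.

```typescript
interface StepMetrics {
  prompt_tokens?: number;
  completion_tokens?: number;
  cost_usd?: number;
}

// Sum one optional metric across steps; null when no step reported it.
function sumOrNull(
  steps: StepMetrics[],
  pick: (metrics: StepMetrics) => number | undefined
): number | null {
  let total: number | null = null;
  for (const metrics of steps) {
    const value = pick(metrics);
    if (value !== undefined) {
      total = (total ?? 0) + value;
    }
  }
  return total;
}

const finalMetrics = (steps: StepMetrics[]) => ({
  total_prompt_tokens: sumOrNull(steps, (m) => m.prompt_tokens),
  total_completion_tokens: sumOrNull(steps, (m) => m.completion_tokens),
  total_cost_usd: sumOrNull(steps, (m) => m.cost_usd),
});
```

Keeping `null` distinct from `0` lets a consumer tell "no provider reported cost" apart from "the run cost nothing".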

Verified: typecheck, ultracite check, full vitest (569 tests), and
turbo build all pass.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52acdd4be9


Comment thread packages/tui/src/agent-tui.ts Outdated
Comment on lines 1536 to 1538
await config.onStreamStart?.(phase);

const stream = await config.agent.stream(


P2 Badge Invoke onStreamStart after dispatching agent.stream

onStreamStart is awaited before config.agent.stream(...) is called, so consumers can observe a false "stream started" signal when stream creation fails or stalls. This is especially problematic for telemetry/hooks that treat this callback as the request-dispatch boundary, because it shifts latency measurements earlier and can report an in-flight state even when no request was actually sent.


@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/cea/benchmark/test_trajectory.py">

<violation number="1" location="packages/cea/benchmark/test_trajectory.py:96">
P2: Per-step numeric metric validation also accepts booleans as numbers, allowing invalid metric payloads to pass.</violation>
</file>


Comment thread packages/cea/benchmark/test_trajectory.py Outdated
Six issues raised across three AI review bots on PR #117:

BUG (Gemini/Codex/Cubic - Medium): turn-start and onStreamStart fired
  before agent.stream() was invoked, breaking the documented contract
  'fires immediately after agent.stream is invoked'. A stream-creation
  failure could still emit a false 'stream started' lifecycle event
  and skew telemetry for consumers that treat turn-start as the
  network dispatch boundary. Moved the emission/hook in both the
  headless runner and the TUI's runSingleStreamTurn to run after
  agent.stream() successfully returns.

BUG (Gemini/Codex - High): Background startup usage probe raced with
  the first turn's per-turn probe; both paths wrote to
  messageHistory.updateActualUsage with no revision guard. A stale
  startup probe (based on an empty message list) could clobber newer
  usage data and underestimate context pressure, mis-sizing
  maxOutputTokens and compaction decisions. Added a generation token
  on measureUsageIfAvailable: probes whose generation is no longer
  current silently drop their result.

BUG (Codex - Medium): onBlockingChange captures prior spinner text
  to restore after blocking compaction, but the orchestrator can emit
  multiple blocking:true stages for a single compaction
  (starting -> pruning -> compacting). The stash was being overwritten
  every stage, so by unblock time the stashed value was 'Compacting'
  itself. Only stash when foregroundStatusBeforeBlocking is null.

BUG (Cubic - Medium): When the first stream part arrived during a
  blocking compaction, clearStatus() cleared foregroundStatusMessage,
  but the unblock path still called showLoader on the stashed value,
  resurrecting a stale 'Processing...' spinner on what should be a
  clean UI. Guard the restoration with a foregroundStatusMessage null
  check so a cleared loader stays cleared.

BUG (Codex - Low): updateHeader() used to run right after onSetup to
  reflect any async header/footer state the consumer initialised there.
  When the startup probe became non-blocking, the updateHeader call
  was accidentally dropped. Restored it so header metadata renders
  before the idle session begins.

BUG (Cubic - Low): Python validator accepted booleans as numbers for
  per-step and final metric values because isinstance(True, int) is
  True in Python. Added _is_real_number and _is_real_int helpers that
  explicitly exclude bool, matching ATIF-v1.4 expectations.
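
The generation-token guard described for the startup-probe race can be sketched roughly like this. It is an illustration under assumptions, not the code in agent-tui.ts: `createGuardedProbe`, `measure`, and `apply` are hypothetical names standing in for measureUsageIfAvailable, config.measureUsage, and messageHistory.updateActualUsage.

```typescript
// Hypothetical helper: wraps an async measurement so that only the most
// recently started probe is allowed to apply its result.
function createGuardedProbe<T>(
  measure: (input: T) => Promise<number>,
  apply: (usage: number) => void
): (input: T) => Promise<boolean> {
  let generation = 0;

  return async (input: T): Promise<boolean> => {
    generation += 1;
    const thisGeneration = generation;

    const usage = await measure(input);

    // A newer probe started while this one was in flight: drop the stale result.
    if (thisGeneration !== generation) {
      return false;
    }

    apply(usage);
    return true;
  };
}
```

A slow startup probe over an empty message list would return `false` here if a per-turn probe began before it resolved, so it cannot overwrite the newer measurement.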

New tests:
- 'does not emit turn-start when agent.stream() rejects before dispatch'
- 'emits turn-start after agent.stream() succeeds (before first chunk)'

Verified: typecheck, ultracite check, 574 tests (+5 headless), build
all pass. Every review-flagged issue has been fixed or explicitly
addressed in the commit above or earlier commits on this branch.
@minpeter

Owner Author

Addressed all automated reviewer feedback (commit 245214f)

Thanks to @gemini-code-assist, @chatgpt-codex-connector, and @cubic-dev-ai for catching these. Every flagged issue has been fixed.

Resolved

| Issue | Severity | Reviewers | Fix |
| --- | --- | --- | --- |
| turn-start / onStreamStart fires before agent.stream() dispatch | Medium | Gemini, Codex, Cubic | Moved emission and hook invocation to strictly after agent.stream() resolves, in both runner.ts and runSingleStreamTurn. Stream-creation failures no longer produce a false lifecycle signal. |
| Background startup probe races with per-turn probe | High | Gemini, Codex | Added a generation token on measureUsageIfAvailable; a stale probe drops its result when its generation is no longer current. |
| Stale "Processing…" spinner resurrecting after blocking compaction | Medium | Cubic | Guard the unblock-path showLoader with a foregroundStatusMessage !== null check so a cleared loader stays cleared. |
| Multi-stage blocking compaction overwrites the stashed label | Medium | Codex | Only stash the original label on first blocking: true entry (foregroundStatusBeforeBlocking === null), never on subsequent stages. |
| updateHeader() dropped after onSetup when probe became non-blocking | Low | Codex | Restored the call right after await config.onSetup?.(). |
| Python validator accepted bool as number | Low | Cubic | Added _is_real_number / _is_real_int helpers that exclude bool. |
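
The two blocking-compaction spinner fixes interact, so a simplified model may help. The class below is a hypothetical sketch: `LoaderModel`, `visible`, and the `blocked` flag are assumptions made for illustration, and it tracks the first blocking entry with a boolean instead of the null check the PR describes. Only `showLoader`, `clearStatus`, `onBlockingChange`, and `foregroundStatusMessage` are names taken from the thread.

```typescript
// Hypothetical, simplified model of the TUI's foreground loader state.
class LoaderModel {
  // What the spinner currently renders (assumed field, for testing only).
  visible: string | null = null;
  // The turn's own status label, if any.
  foregroundStatusMessage: string | null = null;

  private stashed: string | null = null;
  private blocked = false;

  showLoader(message: string): void {
    this.foregroundStatusMessage = message;
    this.visible = message;
  }

  clearStatus(): void {
    this.foregroundStatusMessage = null;
    if (!this.blocked) {
      this.visible = null;
    }
  }

  onBlockingChange(blocking: boolean, stageLabel = "Compacting..."): void {
    if (blocking) {
      // Stash only on the first blocking stage so later stages
      // (pruning, compacting) cannot overwrite the original label.
      if (!this.blocked) {
        this.stashed = this.foregroundStatusMessage;
        this.blocked = true;
      }
      this.visible = stageLabel;
      return;
    }

    const restore = this.stashed;
    this.stashed = null;
    this.blocked = false;

    // Restore only if the turn still has a status; a loader that was
    // cleared during the blocking phase stays cleared.
    this.visible =
      restore !== null && this.foregroundStatusMessage !== null ? restore : null;
  }
}
```

In this model, three blocking stages followed by an unblock bring back "Processing...", while a `clearStatus()` call in the middle of blocking leaves the spinner empty afterwards.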

Tests added

  • does not emit turn-start when agent.stream() rejects before dispatch
  • emits turn-start after agent.stream() succeeds (before first chunk)

Verification

  • pnpm run typecheck — 6/6 packages pass
  • pnpm run check (ultracite) — clean
  • pnpm run test — 574 tests pass (headless +5 new)
  • pnpm run build — 6/6 packages succeed

Re-request review when ready.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 245214f470

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/tui/src/agent-tui.ts Outdated
Comment on lines 1537 to 1539
showLoader("Processing...");

const preparedTurn = await prepareMessages(phase);


P2 Badge Guard processing spinner with finally cleanup

showLoader("Processing...") now runs before prepareMessages(phase), but the try/finally that calls clearStatus() starts later. If prepareMessages throws (for example from onBeforeTurn, measureUsage, or compaction checks), the function exits without hitting cleanup, so the foreground spinner remains stuck even though the turn failed and an error message is shown.
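
One way to pair the spinner with its cleanup is to start it inside the same try block. The sketch below assumes hypothetical `ui`, `prepareMessages`, and `streamTurn` parameters; the actual runSingleStreamTurn has a different shape.

```typescript
// Hypothetical stand-ins for the TUI helpers named in the comment above.
interface SpinnerUi {
  showLoader(message: string): void;
  clearStatus(): void;
}

async function runTurnWithSpinner(
  ui: SpinnerUi,
  prepareMessages: () => Promise<string[]>,
  streamTurn: (messages: string[]) => Promise<void>
): Promise<void> {
  try {
    // Show the spinner inside the try so the finally below always pairs with it,
    // even when turn preparation throws.
    ui.showLoader("Processing...");
    const prepared = await prepareMessages();

    ui.showLoader("Working...");
    await streamTurn(prepared);
  } finally {
    ui.clearStatus();
  }
}
```

The error from `prepareMessages` still propagates to the caller; the only change is that the spinner is cleared on the way out.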


@minpeter

Owner Author

@cubic review

@cubic-dev-ai

cubic-dev-ai Bot commented Apr 21, 2026


@cubic review

@minpeter I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 16 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/tui/src/agent-tui.ts">

<violation number="1" location="packages/tui/src/agent-tui.ts:1561">
P1: `onStreamStart` callback failures currently break the turn flow; guard this hook so observer errors don’t abort a valid stream.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread packages/tui/src/agent-tui.ts Outdated
An observer that throws from onStreamStart (or onFirstStreamPart)
used to propagate the error and abort an otherwise-valid stream. This
violates the 'observer-only' intent of the hooks and makes a buggy
telemetry/logging callback strong enough to break a production turn.

- harness loop.ts: introduce invokeObserverHook(hook, hookName, ctx).
  Errors are console.error-logged and swallowed so the stream iteration
  always proceeds. Applied to both onStreamStart and onFirstStreamPart.
- headless runner.ts: wrap config.onStreamStart in try/catch with the
  same logging pattern. The turn-start emission itself stays in place,
  so lifecycle consumers are unaffected.
- tui agent-tui.ts: wrap config.onStreamStart in try/catch; the
  'Working...' loader and the rest of runSingleStreamTurn proceed
  even when the consumer's callback throws.
- LoopHooks docstrings now document the 'observer-only contract':
  errors are swallowed after being logged. This is a public API
  guarantee so consumers can safely rely on it (and don't need to
  defensively wrap their callbacks).
- New headless test: 'continues streaming when onStreamStart throws
  (observer errors are isolated)' confirms the stream completes, the
  turn-start event is still emitted, and the thrown error is logged
  via console.error.

Also repairs a drop of LoopHooks.onToolCall from types.ts that had
slipped through an earlier edit — loop.ts still destructured it, so
typecheck was surfacing the regression now.

Verified: typecheck, ultracite check, 575 tests (+1 headless), build
all pass.
@minpeter

Owner Author

Addressed Cubic P1: observer-hook error isolation (commit d855e36)

Thanks @cubic-dev-ai — a callback that throws from onStreamStart (or its sibling onFirstStreamPart) was indeed strong enough to abort an otherwise-valid stream. That violates the observer-only intent of these hooks.

Fix

  • packages/harness/src/loop.ts: added an invokeObserverHook(hook, hookName, ctx) helper that try/catch-es the call, logs via console.error with a [harness] prefix, and proceeds. Applied to both onStreamStart and onFirstStreamPart in runAgentLoop.
  • packages/headless/src/runner.ts: wrapped the onStreamStart call in try/catch with the same log-and-continue pattern; the turn-start lifecycle event still fires.
  • packages/tui/src/agent-tui.ts: wrapped the onStreamStart call in runSingleStreamTurn; the Working... loader and the subsequent stream render proceed.
  • LoopHooks docstrings: now explicitly document the "observer-only contract" so consumers can rely on it without defensively wrapping their callbacks.
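
A helper of the kind described for loop.ts might look like the sketch below. The parameter order follows the `invokeObserverHook(hook, hookName, ctx)` form quoted above; the `ObserverHook` type alias and the exact log wording are assumptions, since the thread only specifies a `[harness]` prefix and `console.error`.

```typescript
type ObserverHook<Ctx> = (context: Ctx) => void | Promise<void>;

// Sketch: run an observer-only hook, log any failure, and never rethrow,
// so a buggy telemetry callback cannot abort the stream.
async function invokeObserverHook<Ctx>(
  hook: ObserverHook<Ctx> | undefined,
  hookName: string,
  context: Ctx
): Promise<void> {
  if (!hook) {
    return;
  }

  try {
    await hook(context);
  } catch (error) {
    // Log wording is an assumption; only the [harness] prefix comes from the PR.
    console.error(`[harness] ${hookName} hook threw; continuing`, error);
  }
}
```

Because the helper awaits the hook inside the try block, it isolates both synchronous throws and rejected promises.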

New test

continues streaming when onStreamStart throws (observer errors are isolated) — confirms stream completion, turn-start emission, and the console.error log path.

Also repaired in the same commit

An earlier edit had accidentally dropped LoopHooks.onToolCall from types.ts while deduplicating an onFirstStreamPart docstring. loop.ts still destructured onToolCall, so tsc correctly flagged the regression after this edit exposed it. Restored the field with its original signature.

Verification

  • pnpm run typecheck — 6/6 packages
  • pnpm run check — clean
  • pnpm run test — 575 tests (headless +1)
  • pnpm run build — 6/6 packages

@cubic-dev-ai

cubic-dev-ai Bot commented Apr 21, 2026


Addressed Cubic P1: observer-hook error isolation (commit d855e36)

Thanks @cubic-dev-ai — a callback that throws from onStreamStart (or its sibling onFirstStreamPart) was indeed strong enough to abort an otherwise-valid stream. That violates the observer-only intent of these hooks.

Fix

...

@minpeter The implementation in d855e36 correctly isolates observer hooks and resolves the P1 issue. By wrapping these calls in try/catch blocks and logging errors via console.error instead of propagating them, you've ensured that a failure in a telemetry or UI callback won't crash the agent loop or TUI session.

Key highlights from the fix:

  • Harness Abstraction: The invokeObserverHook helper in packages/harness/src/loop.ts provides a clean, reusable pattern for safely invoking onStreamStart and onFirstStreamPart.
  • Contract Documentation: Updating the LoopHooks docstrings to explicitly state the "observer-only contract" is a great touch for future-proofing the API.
  • Robust Testing: The new headless test case successfully verifies that the stream completes and the turn-start event is still emitted even when a hook throws.
  • Regression Fix: Restoring the accidentally dropped onToolCall field in types.ts ensures the public API remains intact.

This approach perfectly aligns with the intent of these hooks as non-blocking observers. For more details on managing these types of learnings or custom rules in the future, you can check the Cubic documentation.

@minpeter

Owner Author

@cubic review

@cubic-dev-ai

cubic-dev-ai Bot commented Apr 21, 2026


@cubic review

@minpeter I have started the AI code review. It will take a few minutes to complete.

Future edits to the headless trajectory pipeline must preserve ATIF v1.4
compliance. Until now the contract lived only in the changeset and the
public README — easy to drift on in a hurry. This commit plants the
contract directly on the load-bearing code and tests so a future
maintainer cannot relax it by accident.

Contract points now documented on-source:

- trajectory-collector.ts module header: enumerates the six invariants
  (schema_version literal, step_id sequence, steps[*].source set,
  extra.* persistence rule, final_metrics null-when-absent, SDK-only
  metrics) and bounds the scope (persisted trajectory only — not the
  JSONL stream).

- TrajectoryJson / AtifStep interface JSDocs: explain the spec-version
  bump discipline and that step-level fields require an ATIF v1.4
  definition (or belong under extra).

- runner.ts collectTrajectoryEvent: clarifies that the default-case
  drop is INTENTIONAL (non-ATIF types stay stream-only) and that new
  cases must ship with a matching extra.* path in finalize().

- runner.ts runHeadless: inline comment pinpoints where the two output
  surfaces (JSONL stdout vs ATIF trajectory.json) diverge, so the
  'headless has two outputs' realisation does not need to be
  reconstructed every time.

- types.ts TrajectoryEvent union: rewrites the one-line 'complete
  union' docstring into a contract that forces every new event type
  to pick 'extra.* persistence' or 'drop' at design time, and forbids
  promotion to a top-level ATIF field or steps[*].source value.

- packages/headless/AGENTS.md: new 'ATIF v1.4 COMPLIANCE (persisted
  trajectory.json)' section listing the same invariants as the
  authoritative engineering contract.

- atif-events.test.ts module header: declares the suite as the
  executable compliance contract with a pointer to the Python
  validator that must be updated in lock-step.

No runtime behaviour changes. Verified: typecheck, ultracite check,
575 tests, build all pass.
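
The "intentional default-case drop" can be pictured with a small routing sketch. Everything here is hypothetical except the event type names and the `extra.*_events` bucket names, which appear elsewhere in this thread; the event payload fields and the `LifecycleCollector` class are invented for illustration and are not the real collectTrajectoryEvent.

```typescript
// Hypothetical event shapes; only the `type` discriminants come from the thread.
type StreamEvent =
  | { type: "approval"; toolName: string }
  | { type: "compaction"; tokensFreed: number }
  | { type: "interrupt"; reason: string }
  | { type: "turn-start" }
  | { type: "error"; message: string };

type ExtraBuckets = {
  approval_events: StreamEvent[];
  compaction_events: StreamEvent[];
  interrupt_events: StreamEvent[];
};

class LifecycleCollector {
  private readonly extra: ExtraBuckets = {
    approval_events: [],
    compaction_events: [],
    interrupt_events: [],
  };

  // Returns true when the event is persisted, false when it stays stream-only.
  collect(event: StreamEvent): boolean {
    switch (event.type) {
      case "approval":
        this.extra.approval_events.push(event);
        return true;
      case "compaction":
        this.extra.compaction_events.push(event);
        return true;
      case "interrupt":
        this.extra.interrupt_events.push(event);
        return true;
      default:
        // Intentional drop: turn-start and error are emitted on the JSONL
        // stream but never written to the persisted trajectory.
        return false;
    }
  }

  finalize(): { extra: ExtraBuckets } {
    return { extra: this.extra };
  }
}
```

Adding a new persisted event type in this shape means adding both a `case` and a bucket, which mirrors the rule that new cases ship with a matching `extra.*` path.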
@minpeter

Owner Author

@cubic review

@cubic-dev-ai

cubic-dev-ai Bot commented Apr 21, 2026


@cubic review

@minpeter I have started the AI code review. It will take a few minutes to complete.

Second pass through the changeset against the full commit list surfaced
three small holes and one missing package header:

- 'plugsuits' (packages/cea) was missing from the changeset header. The
  PR actually touches its user-facing CLI help text ('--atif' now
  mentions ATIF-v1.4) and the benchmark validator pipeline, so cea
  needs a patch bump in lock-step with the other packages.

- Added a dedicated CEA bullet describing the --atif wording change
  and the Python validator pipeline improvements (Harbor validator
  auto-call + stricter shape check + bool rejection).

- Credited the Cubic-reported bool-in-number bug in the 'PR review
  feedback' section. The fix was already described under the ATIF v1.4
  validator improvements bullet, but the review-attribution list had
  dropped the sixth item.

- Noted that the fifth-commit fix quietly restored a dropped
  LoopHooks.onToolCall declaration. Consumers that rely on this hook
  had been running against a destructure with no matching type; this
  is now type-safe again and the changeset mentions it so anyone
  diffing public types sees why that line moved.

No code changes and no behaviour changes — the changeset content now
matches the actual commit range on this branch.

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 16 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/harness/src/types.ts">

<violation number="1" location="packages/harness/src/types.ts:173">
P2: Expose the current stream part in `onFirstStreamPart`, or remove the advice about filtering by part type. The callback only receives `LoopContinueContext`, so consumers cannot distinguish framing chunks from visible output as the docs promise.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread packages/harness/src/types.ts Outdated
The docstring for LoopHooks.onFirstStreamPart told consumers to filter
on part type (to distinguish framing chunks like 'start' and
'text-start' from visible output), but the callback only received
LoopContinueContext — no part to filter on. The contract was broken at
the type level: the advice was physically impossible to follow.

This commit fixes the hook before adoption. Since onFirstStreamPart
was added earlier in the same PR and has zero consumers anywhere in
the monorepo (verified by exhaustive grep across packages/harness,
packages/tui, packages/headless, packages/cea, packages/minimal-agent,
packages/tgbot, scripts/, and the runtime/session adapter layer), the
signature change is type-only and requires no migration.

Changes:
- types.ts: onFirstStreamPart signature is now
    (part: TextStreamPart<ToolSet>, context: LoopContinueContext)
    => void | Promise<void>
  Docstring rewritten to describe what the hook actually does: fires
  once per iteration on the very first part regardless of visibility;
  the consumer decides what to do based on part.type but the hook does
  not re-fire on a later visible part.
- types.ts: imports TextStreamPart from 'ai' (alongside the existing
  ToolSet / ToolCallPart imports).
- index.ts: re-exports TextStreamPart from '@ai-sdk-tool/harness' so
  consumers don't have to import it from 'ai' separately. This matches
  how the TUI package already uses the exact type
  (TextStreamPart<ToolSet>) for its isVisibleStreamPart filter.
- loop.ts: generalises invokeObserverHook to accept variadic args
  (<Args extends readonly unknown[]>), so both (context)-only hooks
  (onStreamStart) and (part, context) hooks (onFirstStreamPart) share
  the same error-isolation helper. Call site at the for-await entry
  now passes the current 'part' alongside the context.
- loop.test.ts: four new regression tests covering:
    * first-part single-fire semantics (tool-call in iteration 0)
    * per-iteration firing (two iterations, fires twice with correct
      iteration numbers)
    * empty-stream skip (no parts → hook never fires)
    * observer-error isolation (thrown error is logged and swallowed,
      stream iteration continues normally)

Oracle consulted before changing the public signature: recommended
Option A ('pre-adoption correction') over Option B (relax docstring)
given the zero-consumer inventory. The justification is that the hook
is one commit old and has not been released, so fixing it now is
materially different from breaking an established API in a patch
release.

Verified: typecheck (6/6), ultracite check (277 files clean), full
test suite (1235 tests, harness +4 new), build (6/6).
@minpeter

Owner Author

Addressed Cubic P2: onFirstStreamPart signature corrected (commit 91008d0)

Thanks @cubic-dev-ai — the docstring on LoopHooks.onFirstStreamPart told consumers to filter on part type to distinguish framing chunks from visible output, but the callback never received the part. The contract was broken at the type level.

Decision

Consulted Oracle on whether to (A) fix the signature or (B) relax the docstring. Oracle recommended Option A: pre-adoption correction based on:

  • The hook was introduced earlier in this same PR — one commit old, not yet released.
  • Exhaustive consumer inventory found zero callers across packages/harness, packages/tui, packages/headless, packages/cea, packages/minimal-agent, packages/tgbot, scripts/, and all runtime/session adapter layers. The hook existed only in its type declaration and one internal call site.
  • Option B would leave the hook's primary purpose (clearing a prompt-processing indicator when real output starts) measurably weaker.

Fix

New signature:

onFirstStreamPart?: (
  part: TextStreamPart<ToolSet>,
  context: LoopContinueContext
) => void | Promise<void>;
  • Uses the exact type that flows through stream.fullStream (TextStreamPart<ToolSet>) from the Vercel AI SDK. The TUI package already uses the same type for isVisibleStreamPart.
  • TextStreamPart is now re-exported from @ai-sdk-tool/harness root so consumers don't need to pull it from ai directly.
  • invokeObserverHook generalised to <Args extends readonly unknown[]> so both (context) and (part, context) hooks share the same error-isolation helper.
  • Docstring rewritten: clarifies that the hook fires on the very first part regardless of visibility; filtering inside the callback lets consumers decide what to do, it does NOT cause the hook to re-fire on a later visible part. This was an explicit Oracle recommendation to avoid a subtle semantic trap.
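
The fire-once semantics and the variadic helper can be sketched together. To stay self-contained, the block uses a hypothetical `PartLike` in place of `TextStreamPart<ToolSet>` and a hypothetical `ContextLike` in place of `LoopContinueContext`; `consumeStream` is an invented name and not the real loop in loop.ts.

```typescript
// Stand-ins for TextStreamPart<ToolSet> and LoopContinueContext (assumed shapes).
interface PartLike {
  type: string;
}

interface ContextLike {
  iteration: number;
}

// Variadic form of the error-isolating helper, so (context) hooks and
// (part, context) hooks can share it.
async function invokeObserverHook<Args extends readonly unknown[]>(
  hook: ((...args: Args) => void | Promise<void>) | undefined,
  hookName: string,
  ...args: Args
): Promise<void> {
  if (!hook) {
    return;
  }
  try {
    await hook(...args);
  } catch (error) {
    console.error(`[harness] ${hookName} hook threw; continuing`, error);
  }
}

async function consumeStream(
  parts: AsyncIterable<PartLike>,
  context: ContextLike,
  onFirstStreamPart?: (part: PartLike, context: ContextLike) => void | Promise<void>
): Promise<number> {
  let seen = 0;
  for await (const part of parts) {
    if (seen === 0) {
      // Fires once, on the very first part, whether or not it is visible.
      await invokeObserverHook(onFirstStreamPart, "onFirstStreamPart", part, context);
    }
    seen += 1;
  }
  return seen;
}
```

A consumer that only cares about visible output can inspect `part.type` inside the callback, but if the first part is a framing chunk such as `start`, the hook will not fire again for the later visible part.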

Tests added (4 new)

In loop.test.ts:

  • invokes onFirstStreamPart exactly once with the first emitted part
  • invokes onFirstStreamPart on each iteration when the stream has content
  • skips onFirstStreamPart when the stream yields no parts
  • isolates onFirstStreamPart observer errors from the stream flow

Verification

  • pnpm run typecheck — 6/6 packages (confirms TextStreamPart<ToolSet> propagates correctly)
  • pnpm run check (ultracite) — 277 files clean
  • pnpm run test: 1235 tests pass (harness +4 new for a total of 641)
  • pnpm run build — 6/6 packages

@cubic-dev-ai

cubic-dev-ai Bot commented Apr 21, 2026


Addressed Cubic P2: onFirstStreamPart signature corrected (commit 91008d0)

Thanks @cubic-dev-ai — the docstring on LoopHooks.onFirstStreamPart told consumers to filter on part type to distinguish framing chunks from visible output, but the callback never received the part. The contract was broken at the type level.

Decision

...

@minpeter The update in 91008d0 effectively resolves the P2 issue by aligning the onFirstStreamPart signature with its documented purpose. Fixing this now as a "pre-adoption correction" was the right call—providing the part directly is essential for consumers to distinguish between framing events (like start) and the actual content that should clear a "Processing..." indicator.

Key improvements in this fix:

  • Type Safety: Using TextStreamPart<ToolSet> and re-exporting it from @ai-sdk-tool/harness makes the hook easy to use without requiring consumers to manage extra dependencies on ai.
  • Variadic Helper: Generalizing invokeObserverHook in loop.ts to support variadic arguments is a clean way to maintain error isolation across different hook signatures (onStreamStart vs onFirstStreamPart).
  • Semantic Clarity: The updated docstring correctly warns about the "first part" behavior, ensuring developers understand it won't re-fire on the first visible part if a framing part arrived first.
  • Test Coverage: The new cases in loop.test.ts (especially for empty streams and per-iteration firing) provide solid verification for the loop logic.

This change completes the "prompt processing" visibility feature by making the underlying hooks actually usable for their intended purpose. For more on Cubic's review standards, you can refer to the Cubic documentation.

@minpeter

Owner Author

@coderabbitai review

@minpeter

Owner Author

@codex review

@minpeter

Owner Author

@cubic-dev-ai review

@coderabbitai

coderabbitai Bot commented Apr 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cubic-dev-ai

cubic-dev-ai Bot commented Apr 21, 2026


@cubic-dev-ai review

@minpeter I have started the AI code review. It will take a few minutes to complete.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91008d0191


Comment thread packages/tui/src/agent-tui.ts Outdated
Comment on lines 1537 to 1539
showLoader("Processing...");

const preparedTurn = await prepareMessages(phase);


P2 Badge Move processing spinner init inside cleanup scope

showLoader("Processing...") is invoked before prepareMessages, but clearStatus() only runs in the later try/finally. If prepareMessages throws (for example from onBeforeTurn, usage measurement, or compaction checks), execution exits before the finally, leaving the spinner stuck on screen even though an error is reported in processInput. This creates a persistent false "in progress" UI state after turn-prep failures.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (1)
packages/headless/src/__tests__/atif-events.test.ts (1)

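586-687: The new v1.4 compliance tests look appropriate.

  • In the total_cost_usd summation test, guarding against floating-point error with toBeCloseTo(0.2, 10) is good.
  • The total_cost_usd null case makes the "cost not reported ≠ 0" semantics explicit.
  • The preservation tests for logprobs/prompt_token_ids/completion_token_ids accurately verify the { ...rest, metrics } spread behavior of toAtifStep (trajectory-collector.ts:187-198).

If new optional fields are added to metrics later, note that the ATIF_STEP_ALLOWED_FIELDS whitelist test does not inspect keys inside metrics. If whitelist-level validation of the v1.4 extension fields is needed, a separate Set could be added (optional).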

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/headless/src/__tests__/atif-events.test.ts` around lines 586 - 687,
The tests for TrajectoryCollector (finalize()) correctly validate total_cost_usd
aggregation and null behavior, and preservation of optional ATIF-v1.4 fields; no
code change required—leave TrajectoryCollector.finalize, toAtifStep, and the
ATIF_STEP_ALLOWED_FIELDS logic in trajectory-collector.ts as-is, but if you
later need stricter whitelist checks for nested metrics keys add a separate Set
for allowed metric keys and update toAtifStep to validate metrics against that
Set.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/cea/benchmark/test_trajectory.py`:
- Around line 13-18: The helper _is_real_number currently only excludes bools so
float('nan') and float('inf') pass; update _is_real_number to also reject
non-finite floats by using math.isfinite (i.e., return isinstance(value, (int,
float)) and not isinstance(value, bool) and (math.isfinite(value) if
isinstance(value, float) else True)) and add the required import for math; leave
_is_real_int unchanged.

In `@packages/headless/AGENTS.md`:
- Around line 16-23: Update the documentation in AGENTS.md to correct the
lifecycle-annotation persistence behavior: change the paragraph so it states
that only `turn-start` and `error` are JSONL-only and dropped from the persisted
trajectory, while `approval`, `compaction`, and `interrupt` annotations are
preserved by the runner and recorded under `extra.approval_events`,
`extra.compaction_events`, and `extra.interrupt_events` (as implemented by
`collectTrajectoryEvent`); remove the blanket claim that those three annotation
types are dropped and add a brief note pointing readers to
`collectTrajectoryEvent` for exact field names.

In `@packages/headless/src/trajectory-collector.ts`:
- Around line 75-79: The TrajectoryJson.extra type is too permissive because the
Record<string, unknown> lets arbitrary lifecycle buckets bypass the ATIF
persistence contract; remove the open record and restrict extra to only the
known persisted buckets (approval_events?: ApprovalEvent[]; compaction_events?:
CompactionEvent[]; interrupt_events?: InterruptEvent[]), then update/verify
TrajectoryCollector.finalize() and the TrajectoryCollector implementation to
only write those known buckets and require adding a new collector method
whenever a new lifecycle bucket is introduced.
- Around line 16-20: The doc comment is inconsistent about which lifecycle
events are persisted; update the wording in trajectory-collector.ts to clearly
state that approval/compaction/interrupt lifecycle annotations are persisted
under extra.approval_events, extra.compaction_events and extra.interrupt_events
on the trajectory, while events like turn-start and error are not persisted as
step sources and live only in the JSONL stream; also add the guideline note that
adding a new persisted lifecycle bucket requires implementing a corresponding
collector method.
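
A closed `extra` type along the lines suggested for trajectory-collector.ts could look like the sketch below. The three bucket names come from the comment; the event interfaces are empty placeholders and `appendToExtra` is a hypothetical helper, so this illustrates the idea and does not reproduce the real TrajectoryJson definition.

```typescript
// Placeholder event shapes; the real definitions live in the headless package.
interface ApprovalEvent {
  type: "approval";
}
interface CompactionEvent {
  type: "compaction";
}
interface InterruptEvent {
  type: "interrupt";
}

// Closed set of persisted buckets: no open Record<string, unknown> escape hatch.
interface TrajectoryExtra {
  approval_events?: ApprovalEvent[];
  compaction_events?: CompactionEvent[];
  interrupt_events?: InterruptEvent[];
}

type PersistedEvent = ApprovalEvent | CompactionEvent | InterruptEvent;

// Hypothetical helper: returns a new extra object with the event appended
// to its matching bucket.
function appendToExtra(extra: TrajectoryExtra, event: PersistedEvent): TrajectoryExtra {
  switch (event.type) {
    case "approval":
      return { ...extra, approval_events: [...(extra.approval_events ?? []), event] };
    case "compaction":
      return { ...extra, compaction_events: [...(extra.compaction_events ?? []), event] };
    case "interrupt":
      return { ...extra, interrupt_events: [...(extra.interrupt_events ?? []), event] };
    default: {
      // Compile-time exhaustiveness check: adding a new PersistedEvent member
      // without a case above makes this assignment fail to type-check.
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```

With this shape, introducing a new lifecycle bucket forces a change to both the interface and the switch, which is the kind of friction the comment asks for.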

In `@packages/headless/src/types.ts`:
- Around line 4-13: The comment says the docblock incorrectly states that
lifecycle annotations (`approval`, `compaction`, `interrupt`, `turn-start`) are
JSONL-only; in reality `approval`, `compaction`, and `interrupt` are persisted
to ATIF via TrajectoryCollector as extra.approval_events,
extra.compaction_events, and extra.interrupt_events. Update the comment in
packages/headless/src/types.ts to restrict "JSONL-only" to only transient
signals (e.g., `turn-start`) and explicitly note that `approval`, `compaction`,
and `interrupt` are forwarded by TrajectoryCollector into ATIF `extra.*` fields
while `turn-start` remains transient.

In `@packages/tui/src/agent-tui.ts`:
- Around line 1041-1064: Capture the current history revision at the start of
measureUsageIfAvailable (e.g., const startRev =
config.messageHistory.getRevision()) and after awaiting
config.measureUsage(messages) verify the revision is unchanged before calling
config.messageHistory.updateActualUsage; if startRev !==
config.messageHistory.getRevision() treat the probe as stale and return false.
Do this in the measureUsageIfAvailable path that updates usage (the block using
usageProbeGeneration and calling config.messageHistory.updateActualUsage) and
apply the same revision-capture-and-check to the other similar spot (the second
occurrence around where updateActualUsage is called) so background probes cannot
overwrite newer history.
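
The revision check proposed for agent-tui.ts can be sketched in isolation. `HistoryLike` and `applyUsageIfFresh` are hypothetical names, and the numeric revision type is an assumption; `getRevision` and `updateActualUsage` are the method names used in the comment.

```typescript
// Assumed minimal surface of the message history; the revision type is a guess.
interface HistoryLike {
  getRevision(): number;
  updateActualUsage(usage: number): void;
}

// Applies a measurement only if the history did not change while it was in flight.
async function applyUsageIfFresh(
  history: HistoryLike,
  measure: () => Promise<number>
): Promise<boolean> {
  const revisionAtStart = history.getRevision();

  const usage = await measure();

  // History moved on while the probe was running: treat the result as stale.
  if (history.getRevision() !== revisionAtStart) {
    return false;
  }

  history.updateActualUsage(usage);
  return true;
}
```

This complements a generation counter: the counter catches a newer probe starting, while the revision check catches the history changing before any newer probe has begun.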

---

Nitpick comments:
In `@packages/headless/src/__tests__/atif-events.test.ts`:
- Around line 586-687: The tests for TrajectoryCollector (finalize()) correctly
validate total_cost_usd aggregation and null behavior, and preservation of
optional ATIF-v1.4 fields; no code change required—leave
TrajectoryCollector.finalize, toAtifStep, and the ATIF_STEP_ALLOWED_FIELDS logic
in trajectory-collector.ts as-is, but if you later need stricter whitelist
checks for nested metrics keys add a separate Set for allowed metric keys and
update toAtifStep to validate metrics against that Set.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 983e34d5-0965-43c7-81a5-3cfa57925765

📥 Commits

Reviewing files that changed from the base of the PR and between 88b7197 and 91008d0.

📒 Files selected for processing (18)
  • .changeset/prompt-processing-indicators.md
  • packages/cea/benchmark/AGENTS.md
  • packages/cea/benchmark/scorer.py
  • packages/cea/benchmark/test_trajectory.py
  • packages/cea/src/entrypoints/main.ts
  • packages/harness/src/index.ts
  • packages/harness/src/loop.test.ts
  • packages/harness/src/loop.ts
  • packages/harness/src/types.ts
  • packages/headless/AGENTS.md
  • packages/headless/README.md
  • packages/headless/src/__tests__/atif-events.test.ts
  • packages/headless/src/runner.test.ts
  • packages/headless/src/runner.ts
  • packages/headless/src/trajectory-collector.ts
  • packages/headless/src/types.ts
  • packages/tui/src/agent-tui.ts
  • packages/tui/src/stream-handlers.ts

Comment thread packages/cea/benchmark/test_trajectory.py
Comment thread packages/headless/AGENTS.md Outdated
Comment thread packages/headless/src/trajectory-collector.ts Outdated
Comment thread packages/headless/src/trajectory-collector.ts Outdated
Comment thread packages/headless/src/types.ts Outdated
Comment on lines +1041 to 1064
let usageProbeGeneration = 0;

const measureUsageIfAvailable = async (
  messages: ModelMessage[]
): Promise<boolean> => {
  if (!config.measureUsage) {
    return false;
  }

  usageProbeGeneration += 1;
  const thisGeneration = usageProbeGeneration;

  const measured = normalizeUsageMeasurement(
    await config.measureUsage(messages)
  );
  if (!measured) {
    return false;
  }

  if (thisGeneration !== usageProbeGeneration) {
    return false;
  }

  config.messageHistory.updateActualUsage({

⚠️ Potential issue | 🟠 Major

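Please also invalidate startup probe results by history revision.

usageProbeGeneration only guards against the case where a newer probe has started. If the background measureUsageIfAvailable([]) resolves right after user input has changed messageHistory, but before the next measureUsageIfAvailable(messagesForLLM) starts, usage measured against an empty message list can be applied to the current history. Capture getRevision() and also treat any history change between probe start and completion as stale.

🛡️ Suggested fix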
     usageProbeGeneration += 1;
     const thisGeneration = usageProbeGeneration;
+    const revisionAtStart = config.messageHistory.getRevision?.();
 
     const measured = normalizeUsageMeasurement(
       await config.measureUsage(messages)
     );
@@
-    if (thisGeneration !== usageProbeGeneration) {
+    if (
+      thisGeneration !== usageProbeGeneration ||
+      (revisionAtStart !== undefined &&
+        config.messageHistory.getRevision?.() !== revisionAtStart)
+    ) {
       return false;
     }

Also applies to: 1081-1083

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/tui/src/agent-tui.ts` around lines 1041 - 1064, Capture the current
history revision at the start of measureUsageIfAvailable (e.g., const startRev =
config.messageHistory.getRevision()) and after awaiting
config.measureUsage(messages) verify the revision is unchanged before calling
config.messageHistory.updateActualUsage; if startRev !==
config.messageHistory.getRevision() treat the probe as stale and return false.
Do this in the measureUsageIfAvailable path that updates usage (the block using
usageProbeGeneration and calling config.messageHistory.updateActualUsage) and
apply the same revision-capture-and-check to the other similar spot (the second
occurrence around where updateActualUsage is called) so background probes cannot
overwrite newer history.
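
The revision-capture-and-check described above can be separated from the async probe and sketched on its own. This is a minimal illustration, not the code in agent-tui.ts: `ProbeGuard`, `ProbeToken`, `RevisionedHistory`, and `ToyHistory` are hypothetical names, and only `getRevision()` is taken from the review text.

```typescript
// Hypothetical minimal view of the message history; only getRevision() is
// named in the review, the rest of this sketch is illustrative.
interface RevisionedHistory {
  getRevision(): number;
}

// Snapshot taken when a probe starts.
interface ProbeToken {
  generation: number;
  revision: number;
}

class ProbeGuard {
  private generation = 0;
  private readonly history: RevisionedHistory;

  constructor(history: RevisionedHistory) {
    this.history = history;
  }

  // Call immediately before awaiting the measurement.
  begin(): ProbeToken {
    this.generation += 1;
    return {
      generation: this.generation,
      revision: this.history.getRevision(),
    };
  }

  // Call after the await. False means a newer probe started or the history
  // changed in the meantime, so the result should be discarded.
  isFresh(token: ProbeToken): boolean {
    return (
      token.generation === this.generation &&
      token.revision === this.history.getRevision()
    );
  }
}

// Toy history used only to exercise the guard: every append bumps the revision.
class ToyHistory implements RevisionedHistory {
  private revision = 0;

  append(_message: string): void {
    this.revision += 1;
  }

  getRevision(): number {
    return this.revision;
  }
}
```

In a probe function, `begin()` would run before `await config.measureUsage(messages)` and `isFresh(token)` would gate the call to `updateActualUsage`. The generation check alone misses the case where the history changes but no second probe has started yet, which is why both values are compared.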

@cubic-dev-ai cubic-dev-ai Bot left a comment

3 issues found across 18 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/tui/src/agent-tui.ts">

<violation number="1" location="packages/tui/src/agent-tui.ts:1537">
P2: `Processing...` loader is shown before the guarded `try/finally`, so setup errors can leave a stuck spinner.</violation>
</file>

<file name="packages/headless/README.md">

<violation number="1" location="packages/headless/README.md:172">
P3: Clarify that only transient annotations like `turn-start` are dropped; approval/compaction/interrupt are still persisted under `extra.*` in `trajectory.json`.</violation>
</file>

<file name="packages/headless/AGENTS.md">

<violation number="1" location="packages/headless/AGENTS.md:21">
P3: Clarify that only `turn-start` is dropped; `approval`, `compaction`, and `interrupt` are persisted under `trajectory.extra.*`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
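
The stuck-spinner issue reported for agent-tui.ts comes down to where the loader call sits relative to `try/finally`. The following is a small sketch of the safe ordering, written synchronously for brevity. `showLoader` and `clearStatus` are names taken from the follow-up commit message, but the `StatusSurface` interface, `runTurn`, and `createRecordingStatus` are assumptions made for this example.

```typescript
// Hypothetical status surface; the method signatures here are assumed.
interface StatusSurface {
  showLoader(label: string): void;
  clearStatus(): void;
}

// Synchronous for brevity; the real turn is asynchronous.
function runTurn(status: StatusSurface, prepare: () => void): void {
  try {
    // Inside the try block, so a throwing prepare() still reaches finally.
    status.showLoader("Processing...");
    prepare();
    status.showLoader("Working...");
  } finally {
    status.clearStatus();
  }
}

// Records calls so the ordering can be inspected.
function createRecordingStatus(): { status: StatusSurface; calls: string[] } {
  const calls: string[] = [];
  const status: StatusSurface = {
    showLoader(label: string): void {
      calls.push("show:" + label);
    },
    clearStatus(): void {
      calls.push("clear");
    },
  };
  return { status, calls };
}
```

If the first `showLoader` call sat above the `try`, an exception thrown between it and the `try` would skip `finally` and leave the loader visible.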

Comment thread packages/tui/src/agent-tui.ts Outdated
Comment thread packages/headless/README.md Outdated
Comment thread packages/headless/AGENTS.md Outdated
Cycle-1 triggered a fresh round of AI reviews. Consolidating every
actionable finding into a single commit:

P1 (Oracle): runHeadless could persist an invalid zero-step ATIF
  trajectory when the stream failed before any step was emitted. Harbor's
  own validator rejects steps: []. TrajectoryCollector.writeTo now returns
  boolean and skips disk writes for zero-step runs instead of producing
  an invalid file.

P2 (Codex + Cubic): showLoader('Processing...') lived outside the
  runSingleStreamTurn try/finally, so a thrown prepareMessages /
  onBeforeTurn / usage probe / compaction check would leave the spinner
  stuck on screen. Moved the call into the try block so clearStatus() in
  finally cleans it up on every exit path.

P2 (CodeRabbit): the startup usage-probe guard only rejected results from
  stale generations, not results computed against a different history.
  Added a messageHistory.getRevision() capture-and-compare so a
  background probe resolving after user input no longer overwrites the
  new turn's real usage with the empty-message baseline.

P2 (CodeRabbit): TrajectoryJson.extra was typed as
  '{approval_events?, compaction_events?, interrupt_events?} & Record<string, unknown>',
  letting new lifecycle buckets type-check without a matching
  TrajectoryCollector method. Dropped the open-record intersection so the
  ATIF persistence contract is enforced structurally.

P2 (CodeRabbit): test_trajectory.py's _is_real_number accepted NaN,
  Infinity, and -Infinity (all producible via json.loads on non-strict
  input). Added an explicit math.isfinite() check.

P3 (CodeRabbit + Cubic): documentation drift — several docstrings and
  markdown files claimed 'approval/compaction/interrupt are JSONL-only
  and dropped from trajectory.json'. That is wrong: they are persisted
  under trajectory.extra.* by the collector. Only turn-start and error
  are truly transient. Corrected in packages/headless/AGENTS.md,
  packages/headless/README.md, packages/headless/src/types.ts,
  packages/headless/src/trajectory-collector.ts module JSDoc, and the
  root AGENTS.md.

Regression test added: runner.test.ts now asserts that zero-step runs
with atifOutputPath set do NOT produce a file (does not write an invalid
zero-step trajectory when the stream fails before any step).

Verified: typecheck (6/6), ultracite check (277 files clean),
tests (1236 pass; headless +1 new = 65), build (6/6).
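
To illustrate the TrajectoryJson.extra item above: intersecting a shape with an open record lets arbitrary keys type-check, while the closed shape rejects them. This is a sketch under assumptions, not the types in packages/headless/src/types.ts. `LifecycleEvent` is a placeholder payload, `retry_events` is a made-up bucket, and `unexpectedExtraKeys` is a hypothetical runtime counterpart.

```typescript
// Placeholder payload type; the real event shapes are not shown on this page.
type LifecycleEvent = { [key: string]: unknown };

// Closed shape: exactly the three buckets named in the commit message.
interface ClosedExtra {
  approval_events?: LifecycleEvent[];
  compaction_events?: LifecycleEvent[];
  interrupt_events?: LifecycleEvent[];
}

// Open shape, as before the fix: the intersection admits any additional key.
type OpenExtra = ClosedExtra & { [key: string]: unknown };

// Type-checks even though no collector method produces retry_events
// (a made-up bucket name for this example).
const openExtra: OpenExtra = { retry_events: [] };

// The same literal is rejected against the closed shape (excess property):
// const closedExtra: ClosedExtra = { retry_events: [] };

// Runtime counterpart for data parsed from JSON, where types are erased.
const ALLOWED_EXTRA_KEYS: string[] = [
  "approval_events",
  "compaction_events",
  "interrupt_events",
];

function unexpectedExtraKeys(extra: { [key: string]: unknown }): string[] {
  return Object.keys(extra).filter(
    (key) => ALLOWED_EXTRA_KEYS.indexOf(key) === -1
  );
}
```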
@minpeter

Owner Author

@coderabbitai review

@minpeter

Owner Author

@codex review

@minpeter

Owner Author

@cubic-dev-ai review

@coderabbitai

coderabbitai Bot commented Apr 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cubic-dev-ai

cubic-dev-ai Bot commented Apr 21, 2026

@cubic-dev-ai review

@minpeter I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 19 files

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45a810f4dd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +245 to +246
if (this.steps.length === 0) {
  return false;

P2 Badge Clear stale trajectory output on zero-step runs

When writeTo returns early on this.steps.length === 0, it leaves any existing outputPath file untouched; with a reused path (e.g., CEA defaults to trajectory.json), a zero-step run (such as an early abort/timeout before any step is emitted) can silently preserve and expose the previous run's trajectory as if it were current output. This is a data-integrity regression introduced by the new skip-write behavior; the zero-step path should delete/truncate the target file or force callers to handle cleanup when false is returned.
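
One way to address the stale-file concern is to remove the target before returning `false`. The sketch below is an assumption-laden illustration, not TrajectoryCollector itself: `SketchCollector`, `FileOps`, and `MemoryFiles` are hypothetical, and file access is injected so the example runs without touching disk. On Node, `remove` could map to `fs.rmSync(path, { force: true })`.

```typescript
// Minimal file operations the sketch needs. remove() is expected not to throw
// when the path is absent.
interface FileOps {
  write(path: string, data: string): void;
  remove(path: string): void;
}

class SketchCollector {
  private readonly steps: unknown[] = [];
  private readonly files: FileOps;

  constructor(files: FileOps) {
    this.files = files;
  }

  addStep(step: unknown): void {
    this.steps.push(step);
  }

  // Returns true when a trajectory was written, false when the run had no steps.
  writeTo(outputPath: string): boolean {
    if (this.steps.length === 0) {
      // Drop any file left by an earlier run so readers never see stale data.
      this.files.remove(outputPath);
      return false;
    }
    this.files.write(outputPath, JSON.stringify({ steps: this.steps }, null, 2));
    return true;
  }
}

// In-memory stand-in for the file system, used only to exercise the sketch.
class MemoryFiles implements FileOps {
  readonly entries: { [path: string]: string } = {};

  write(path: string, data: string): void {
    this.entries[path] = data;
  }

  remove(path: string): void {
    delete this.entries[path];
  }
}
```

Deleting is the stricter choice. The alternative the review mentions, leaving cleanup to callers that receive `false`, keeps `writeTo` free of destructive side effects but relies on every caller remembering to do it.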

Cycle 2: Cubic reported 'No issues found across 19 files'. Oracle's
second audit surfaced three genuine items the bots missed:

1. Headless probe race (Oracle P2): measureUsageIfAvailable in the
   headless runner lacked the generation + revision guard that the TUI
   already had. A slow background probe could overwrite fresher
   post-compaction usage. Mirrored the TUI pattern using
   messageHistory.getRevision?.() with optional chaining (the headless
   message history interface declares getRevision as optional).

2. ATIF step source contract drift (Oracle P2): trajectory-collector.ts
   permits steps[*].source = 'user' | 'agent' | 'system' (Harbor ATIF
   v1.4 allows all three; system steps support observations since v1.2),
   but the bundled Python validator rejected 'system' and the benchmark
   docs documented only 'user | agent'. Aligned all three surfaces.

3. Root README.md drift (Oracle P3): headless event list omitted
   turn-start and did not point at Harbor's ATIF-v1.4 schema. Updated
   to match the current package docs.

Verified: typecheck (6/6), ultracite check (277 files clean), 1236
tests pass, build (6/6). No new tests required — the headless probe
guard is covered by the same invariants the TUI guard already tests.
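
As an illustration of item 2 above, a single source of truth for the allowed `source` values keeps the type and the runtime check from drifting apart. This is a hypothetical sketch: `STEP_SOURCES`, `isStepSource`, and `invalidSourceIndices` are not names from trajectory-collector.ts, and only the three literal values come from the commit message.

```typescript
// The three values the commit message says ATIF v1.4 allows.
const STEP_SOURCES = ["user", "agent", "system"] as const;
type StepSource = (typeof STEP_SOURCES)[number];

// Runtime check derived from the same list as the type.
function isStepSource(value: unknown): value is StepSource {
  return (
    typeof value === "string" &&
    (STEP_SOURCES as readonly string[]).indexOf(value) !== -1
  );
}

// Returns the indices of steps whose source is missing or not allowed.
function invalidSourceIndices(steps: Array<{ source?: unknown }>): number[] {
  const invalid: number[] = [];
  steps.forEach((step, index) => {
    if (!isStepSource(step.source)) {
      invalid.push(index);
    }
  });
  return invalid;
}
```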
@minpeter

Owner Author

@coderabbitai review

@minpeter

Owner Author

@codex review

@minpeter

Owner Author

@cubic-dev-ai review

@coderabbitai

coderabbitai Bot commented Apr 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cubic-dev-ai

cubic-dev-ai Bot commented Apr 21, 2026

@cubic-dev-ai review

@minpeter I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 20 files

…ype (cycle 3)

Cycle 3 Oracle audit flagged exactly one remaining issue: the changeset
for PR #117 said in its 'features' bullet (line 14) that
TrajectoryJson.extra was typed as an open record, then said in its
'cycle 1 follow-ups' bullet (line 30) that it was later narrowed back
to the three canonical lifecycle buckets. The shipped code matches the
NARROWED version, so the earlier bullet was stale.

Fixed by rewriting line 14 to reflect the final, shipped state: extra
is a closed record of exactly approval_events / compaction_events /
interrupt_events, and new lifecycle types must extend the interface
explicitly.

No runtime changes. Verified: ultracite check clean, 1236 tests pass,
build (6/6 full turbo).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8160c46acb

Comment on lines +245 to +246
if (this.steps.length === 0) {
  return false;

P1 Badge Clear stale ATIF output when skipping zero-step writes

Returning early when this.steps.length === 0 leaves any pre-existing file at outputPath untouched. If a run fails before emitting a step (for example, agent.stream() rejects before any user/agent step is recorded) and reuses the same atifOutputPath, downstream tooling will read stale trajectory data from an older run and silently report incorrect metrics. Ensure the path is removed or overwritten when zero-step persistence is skipped.

Comment on lines +1088 to +1089
const runBackgroundStartupProbe = (): void => {
  measureUsageIfAvailable([]).then(ignore, ignore);

P2 Badge Cancel background startup probe on TUI teardown

The startup usage probe is launched fire-and-forget and never tracked or canceled. In sessions where measureUsage performs network I/O, exiting the TUI before that probe resolves can keep the Node process alive until the probe completes, and the late result can still mutate history/header state after shutdown. Tie this probe to teardown (or an abort signal) so early exits are deterministic.
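
The late-mutation half of this concern can be sketched with a small gate that teardown closes. Everything here is hypothetical: `TeardownGate` and `runBackgroundProbe` do not exist in the PR, and `measure` and `apply` stand in for the usage probe and the state update.

```typescript
// Hypothetical gate that teardown closes; nothing with this name exists in the PR.
class TeardownGate {
  private closed = false;

  // Runs the effect only while the TUI is still alive; reports whether it ran.
  runIfOpen(effect: () => void): boolean {
    if (this.closed) {
      return false;
    }
    effect();
    return true;
  }

  // Call from the TUI teardown path.
  close(): void {
    this.closed = true;
  }
}

// Fire-and-forget probe whose result is dropped once the gate is closed.
// measure and apply are placeholders for the usage probe and the state update.
function runBackgroundProbe(
  measure: () => Promise<number>,
  gate: TeardownGate,
  apply: (tokens: number) => void
): void {
  measure().then(
    (tokens) => {
      gate.runIfOpen(() => apply(tokens));
    },
    () => {
      // Probe failures are deliberately ignored in this sketch.
    }
  );
}
```

A gate like this only stops state changes after shutdown. It does not cancel in-flight network I/O, so it would not by itself stop a pending probe from keeping the process alive. That would need `measureUsage` to accept something like an abort signal, which this page does not confirm.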

@cubic-dev-ai

cubic-dev-ai Bot commented Apr 21, 2026

You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment @cubic-dev-ai review.

@minpeter minpeter merged commit 496ffdb into main Apr 21, 2026
6 checks passed
@minpeter minpeter deleted the feat/prompt-processing-indicators branch April 22, 2026 09:59