fix(gemini): resolve response truncation for file upload queries#79

Open
dsebban wants to merge 49 commits into nicobailon:main from dsebban:main

Conversation

@dsebban

@dsebban dsebban commented Mar 29, 2026

Summary

Fixes response truncation when uploading files via surf gemini --file <path>.

Problem

File uploads with Gemini emit multiple progressive candidate snapshots in the StreamGenerate response. The previous parser stopped at the first snapshot, which often contained only a partial prefix like "This image is a" instead of the full response.

Solution

Modified parseGeminiStreamGenerateResponse() to:

  • Collect all candidate snapshots from the response stream
  • Select the last non-empty snapshot (handles progressive updates)
  • Fall back to the last snapshot if all texts are empty
  • Maintain backward compatibility with text-only queries
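The selection rules above can be sketched as follows (snapshot shape assumed for illustration; the actual parser is parseGeminiStreamGenerateResponse() in the native client):

```javascript
// Sketch of the snapshot-selection fix. Each snapshot is a progressive
// candidate from StreamGenerate; later snapshots supersede earlier ones.
function pickFinalSnapshot(snapshots) {
  if (snapshots.length === 0) return null;
  // Prefer the last snapshot that actually carries text...
  for (let i = snapshots.length - 1; i >= 0; i--) {
    const text = snapshots[i].text;
    if (text && text.trim() !== "") return snapshots[i];
  }
  // ...but fall back to the last snapshot if every text is empty.
  return snapshots[snapshots.length - 1];
}
```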

Testing

Tested with:

  • gemini-2.5-flash
  • gemini-3.1-flash-lite-preview
  • gemini-3.1-pro-preview

All models now return full responses for image uploads instead of truncated prefixes.

Example

Before fix:

$ surf gemini "describe this" --file image.jpg
This image is a

After fix:

$ surf gemini "describe this" --file image.jpg
This image is a technical analysis stock chart for Aflac Inc (AFL) showing...
[full detailed description]

Adds an opt-in headless execution path for `surf gemini` via Bun WebView
(SURF_USE_BUN_GEMINI=1), replacing the Chrome extension for Gemini queries.
Chrome profile cookie injection handles auth (macOS, --profile <email>).
Legacy extension path is preserved as fallback.

New files:
- native/gemini-bun-worker.ts      — headless Chrome driver (Bun.WebView)
- native/gemini-bun-bridge.cjs     — Node↔Bun IPC spawn/protocol
- native/gemini-bun-profile-auth.ts — CDP cookie injection from Chrome profile
- native/chrome-profile-utils.cjs  — macOS Chrome profile + AES cookie decrypt
- native/gemini-common.cjs         — shared Gemini helpers extracted from client
- native/bun-webview.d.ts          — TypeScript declarations for Bun.WebView

Changes to existing files:
- native/cli.cjs          — SURF_USE_BUN_GEMINI gate, --profile flag
- native/gemini-client.cjs — imports shared helpers from gemini-common.cjs

Model selection:
- resolveGeminiModelForUI() passes unknown model IDs through as-is
- trySelectModel() maps IDs → UI mode keywords (Pro/Fast/Thinking)
  e.g. gemini-3.1-pro-preview → Pro, gemini-3.1-flash-lite-preview → Fast
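A rough sketch of that ID-to-keyword mapping (the substring rules here are an assumption inferred from the two examples above; the real logic is in trySelectModel()):

```javascript
// Illustrative only: map a Gemini model ID to the UI mode keyword.
// Unknown IDs pass through as-is (resolveGeminiModelForUI behavior).
function uiKeywordForModel(modelId) {
  const id = modelId.toLowerCase();
  if (id.includes("pro")) return "Pro";
  // Assumption: any flash variant (incl. flash-lite) selects Fast mode.
  if (id.includes("flash")) return "Fast";
  if (id.includes("thinking")) return "Thinking";
  return modelId;
}
```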

Hardening (gemini-client.cjs legacy path):
- Direct HTTP first with AT cache, model-header fallback chain
- Browser UI rescue path for failures
- ClipboardEvent paste for Angular/Quill state updates
- Response truncation fix for file upload queries

Tests (293→294):
- test/unit/gemini-common.test.ts
- test/unit/gemini-bun-bridge.test.ts
- test/unit/chrome-profile-utils.test.ts
- test/unit/gemini-client.test.ts
dsebban added 28 commits March 30, 2026 13:06
Lint (biome):
- Update biome schema to 2.4.9 (was 2.3.11, CI uses ^2.4.4)
- Remove noUnusedExpressions nursery rule (not in biome 2.4.x)
- Fix test files: useNodejsImportProtocol, useBlockStatements,
  noUnusedImports, noUnusedFunctionParameters, import sort

Typecheck:
- Add @types/node dev dep; add "node" to tsconfig types array
  (fixes require/Buffer/process errors in our new test files)
- src/cdp/controller.ts: object→Record<string,unknown> for sendCommand
- src/service-worker/index.ts:
  - tab?.url/title optional chaining (tab may be undefined)
  - chrome.windows.create result renamed to avoid shadowing window
  - chrome.tabGroups.ColorEnum→Color, windowStateEnum→WindowState
  - TabChangeInfo→OnUpdatedInfo
  - tabs.group/ungroup: cast any[] to [number,...number[]]
- test/mocks/chrome.ts: add frozen:false, restore selected:true (required)

294 tests, 0 lint errors, 0 typecheck errors.
…ling)

- activateCreateImageTool: fix already-active detection (was matching
  static suggestion buttons, causing false early return); now checks
  for active dismiss-chip with close button only
- Tools button: aria-label='Tools' found via direct selector (was falling
  through to scoped scan due to false-positive already-active)
- Menu item: match 'images'/'image' first-word (Gemini uses 'Images New'
  not 'Create image'); exclude video/music/canvas explicitly
- pollResponseState: replace img[src*=gg-dl] with structured ImageCandidate
  collection — handles blob:, picture srcset, <a> download links, any img
  with naturalWidth>=256 from latest model-response subtree
- waitForResponse: new signature (baseline: PollState, expectsImage bool);
  image mode waits for stable new display candidates, not text stability
- saveGeneratedImage: accepts ImageCandidate[]; for blob: URLs draws img
  to canvas.toDataURL() to capture before revocation; HTTP URLs fetched
  with credentials; tries candidates in priority order
- E2E verified: cat image generated and saved to /tmp/test-cat.png (31.2s)
- pollResponseState: scope loading detection to last model-response
  subtree only; unrelated page spinners no longer block completion
- activateCreateImageTool: unified isImageItem() matcher across
  alreadyActive check, menu click, and verification — was checking
  'create image' (never matched) vs 'images new' (Gemini label)
- activateCreateImageTool: detect aria-checked=true before clicking
  to avoid toggling tool OFF when already selected
- activateCreateImageTool: poll up to 4s for Tools button before
  attempting to click (toolbar renders async, caused intermittent miss)
- saveGeneratedImage(blob): canvas failure now falls back to fetch
  before rejecting candidate

E2E verified: 'glowing red sphere' → sphere.png saved in 81.6s
New files:
- native/chatgpt-bun-bridge.cjs — env flag, eligibility, worker spawn
- native/chatgpt-bun-worker.ts — headless worker with stealth patches
- native/chatgpt-bun-profile-auth.ts — Chrome profile cookie injection
- native/chatgpt-bun-worker-logic.ts — model mapping + text stability
- native/cdp-stealth.cjs — shared CDP anti-detection patches
- test/unit/chatgpt-bun-bridge.test.ts — bridge tests (17 tests)
- test/unit/chatgpt-bun-worker-logic.test.ts — model/stability tests (16)
- test/unit/cdp-stealth.test.ts — stealth helper tests (18 tests)

Modified:
- native/cli.cjs — SURF_USE_BUN_CHATGPT routing gate
- native/gemini-bun-worker.ts — stealth patches applied

Working: auth, stealth (bypasses Cloudflare), model selection
  (Instant/Thinking/Pro picker via Radix dispatch), turn detection
  (.sr-only labels), stream capture (fetch hook pre-navigation)

WIP: response text extraction — ChatGPT DOM uses empty p[data-start]
  placeholders, actual text not in DOM. SSE fetch returns 403;
  conversation likely streams via WebSocket (wss://ws.chatgpt.com).
  Need WebSocket message interception for text extraction.

345 tests passing (16 files)
- Switch stream capture from fetch hook to WebSocket interception
  (ChatGPT uses wss://ws.chatgpt.com for conversation streaming)
- Fix login timing: increase post-navigation delay to 3s
- Move WebSocket hook to Page.addScriptToEvaluateOnNewDocument
- Add investigation docs: docs/chatgpt-headless-investigation.md

Status: auth ✅, stealth ✅, model selection ✅, turn detection ✅
Text extraction: DOM fallback works but includes UI noise.
WebSocket capture registered but may miss frames (WS created
before hook in SPA navigation). Need to either:
1. Clean DOM text by stripping known UI patterns
2. Fix WebSocket interception timing
3. Use CDP Network.webSocketFrameReceived events directly

345 tests passing
Root cause: pre-navigation fetch hook caused 403 (auth tokens not ready).
Thinking/Pro models use stream_handoff → conduit transport (not direct SSE).

Changes:
- Replace WebSocket monkey-patch with post-load fetch hook (pageEval)
- Add delta_encoding v1 parser (single op, batch ops, legacy message)
- Add DOM text sanitizer (strips 18 exact UI-noise line patterns)
- Add chooseBestText() arbitration (stream primary, DOM fallback)
- Handle stream_handoff: empty stream → DOM fallback after render

New exports in chatgpt-bun-worker-logic.ts:
- createEmptyChatGptStreamState / applyChatGptFramePayload
- sanitizeChatGptAssistantText / chooseBestText
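The arbitration could look roughly like this (a minimal sketch; the real chooseBestText() in chatgpt-bun-worker-logic.ts also coordinates with the sanitizer and stream state):

```javascript
// Sketch: stream capture is the primary source; sanitized DOM text is
// the fallback. stream_handoff cases arrive with an empty stream.
function chooseBestText(streamText, domText) {
  const stream = (streamText || "").trim();
  const dom = (domText || "").trim();
  if (stream.length === 0) return { text: dom, source: "dom" };
  return { text: stream, source: "stream" };
}
```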

E2E validated:
- Instant: 'what is 2+2?' → '4' (11.7s)
- Default: 'capital of France?' → 'Paris' (10.1s)
- Thinking: 'what is 7*8?' → '56' (19.6s, stream_handoff + DOM)

377 tests passing (16 files)
…nsion

- Add ⏳ progress feedback during thinking/responding phases
  (detects 'Thinking', 'Analyzing image', 'Searching' labels from DOM)
- Fix stream_handoff: only trust stream.done when stream.text is
  non-empty (Thinking/Pro models send [DONE] with empty text)
- Add more UI noise patterns: 'analyzing image', 'searching the web',
  'generating image', etc.
- Reduce poll interval from 600ms to 400ms for better phase detection

E2E validated:
- Instant: '2+2' → '4' ✅
- Thinking: 'quantum entanglement' → full explanation ✅ (21.9s)
- File: sphere.png → correct description ✅ (36.4s)
- Progress: '⏳ Thinking' and '⏳ Responding' shown during wait

377 tests passing
…, model selection

- chatgpt-cloak-worker.mjs: ESM worker using CloakBrowser (33 C++ stealth patches)
- chatgpt-cloak-profile-auth.mjs: Chrome cookie extraction via node:sqlite
- chatgpt-cloak-bridge.cjs: process bridge with JSON-lines protocol
- cli.cjs: route to CloakBrowser when SURF_USE_CLOAK_CHATGPT=1

Features: cookie auth, isolated/persistent profiles, model selection,
file upload, humanized input, DOM stability polling, signal cleanup.

E2E: instant/thinking/file-upload all validated. 377 tests passing.
- --help now documents SURF_USE_CLOAK_CHATGPT=1 as default ChatGPT headless
  backend with Bun fallback, model aliases, and Gemini headless examples
- add 'surf skills' command: prints SKILL.md from package or ~/.pi/~/.agents
- sync skills/surf/SKILL.md with updated agent skill (CLOAK as default,
  model table, backend comparison table, constraints)
- native/session-store.cjs: slug-based session IDs, ~/.surf/sessions/
  <tool>-<prompt-slug>_<YYYYMMDD-HHmmss>/ with meta.json + output.log
- surf session: list/view/clear commands with status table
- Wire sessions into all 3 headless AI paths (gemini-bun, chatgpt-bun,
  chatgpt-cloak) via stderr intercept — captures every progress step
- surf --help: surf session entry in More Help section
- docs/investigation-gemini-file-upload.md: root cause analysis for
  --file attachment silent failure (wrong input targeted, no chooser intercept)
Worker (chatgpt-cloak-worker.mjs):
- DETECT_PHASE_JS now returns {phase, isThinking} instead of raw string
- Extracts actual ChatGPT thinking label (e.g. 'Thinking for 15 seconds')
  by cloning last assistant turn + stripping .sr-only/.markdown/buttons
- Emits {type:'trace', phase, isThinking} on every phase change
- Stability check: requiredStableCycles=2 + minStableMs=1200 (matches bun)

Bridge (chatgpt-cloak-bridge.cjs):
- New case 'trace': forwards to onProgress callback

CLI (cli.cjs):
- progress.type==='trace' → writes '[cloak-chatgpt] ⏳ <phase>' to stderr
  identical to bun worker's '[bun-chatgpt] ⏳ Thinking' behavior

Result: thinking/pro models show live phase labels during reasoning,
matching the bun headless path UX exactly.
cli.cjs:
- P1-1: Cloak path - move sess/_origWrite outside try so catch+finally
  can always call sess.fail() and restore process.stderr.write
- P1-2: Bun chatgpt/gemini fallback-to-legacy now calls sess.fail() with
  code='fallback' before startLegacySocketPath() (no more stuck 'running')
- P1-4: surf session --clear --hours validates input; rejects NaN/0/negative
  with a clear error instead of silently deleting everything
- P1-4: surf session list --hours also sanitizes to positive finite number

session-store.cjs:
- P1-3: makeSessionId uses ms precision + pid suffix (YYYYMMDD_HHmmss.mmm_PPPP)
  prevents same-second collision from concurrent runs
- P1-4: deleteSessions throws on invalid hours (NaN/0/negative) instead of
  silently treating it as delete-all
- P1-5: loadSession prefix search sorts candidates by meta.json createdAt
  (not by dir name string) so newest match is always returned
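The P1-3 session ID format can be sketched like this (illustrative; the real makeSessionId lives in session-store.cjs):

```javascript
// Sketch of the collision-resistant ID: YYYYMMDD_HHmmss.mmm_PPPP.
// Millisecond precision plus the pid suffix prevents same-second
// collisions from concurrent runs.
function makeSessionId(now = new Date(), pid = process.pid) {
  const p = (n, w) => String(n).padStart(w, "0");
  const date = `${now.getFullYear()}${p(now.getMonth() + 1, 2)}${p(now.getDate(), 2)}`;
  const time = `${p(now.getHours(), 2)}${p(now.getMinutes(), 2)}${p(now.getSeconds(), 2)}`;
  return `${date}_${time}.${p(now.getMilliseconds(), 3)}_${pid}`;
}
```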
Root cause: uploadFileViaCDP used DOM.querySelectorAll('input[type=file]')
which found a pre-existing drag-and-drop input, not the one activated by
the upload menu. Setting files on the wrong input showed a UI chip but
never uploaded bytes to Google servers.

Fix: Use Page.setInterceptFileChooserDialog + Page.fileChooserOpened CDP
event to intercept the OS file chooser, then DOM.setFileInputFiles with
the backendNodeId from the event (exact input the app activated).

Changes to gemini-bun-worker.ts uploadFileViaCDP():
- Enable Page.setInterceptFileChooserDialog before clicking
- Listen for Page.fileChooserOpened via Bun.WebView addEventListener
- Extract backendNodeId from event.data for precise input targeting
- 3-attempt retry with escalating timeouts (10s/15s/20s)
- Menu item click result now logged + validated (throws on no-item)
- Always disable interception in finally block
- Extracted waitForUploadChip() helper for chip readiness polling

Modeled after the working extension implementation (UPLOAD_FILE_TO_TAB
handler in service-worker/index.ts) adapted for Bun.WebView CDP API.
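The intercept flow above can be sketched as follows. The `cdp(method, params)` sender and `onCdpEvent(name, handler)` listener are stand-ins for the Bun.WebView CDP API, not its actual surface:

```javascript
// Sketch of Page.setInterceptFileChooserDialog + Page.fileChooserOpened
// targeting the exact input the app activated (via backendNodeId).
async function uploadViaFileChooser(cdp, onCdpEvent, filePath, clickUploadMenu) {
  await cdp("Page.setInterceptFileChooserDialog", { enabled: true });
  try {
    const opened = new Promise((resolve) => {
      onCdpEvent("Page.fileChooserOpened", (params) => resolve(params));
    });
    await clickUploadMenu(); // triggers the chooser the app opened
    const { backendNodeId } = await opened;
    // Hardened check from the follow-up commit: typeof, not truthiness.
    if (typeof backendNodeId !== "number") {
      throw new Error("fileChooserOpened event missing backendNodeId");
    }
    await cdp("DOM.setFileInputFiles", { files: [filePath], backendNodeId });
  } finally {
    // Always disable interception, even on failure.
    await cdp("Page.setInterceptFileChooserDialog", { enabled: false });
  }
}
```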
gemini-bun-worker.ts:
- P1-1: createFileChooserWaiter returns {promise, cancel} so stale
  listeners are always cleaned up in the catch block — no leaked
  handlers from failed clickUploadSequence() calls on retry
- P1-2: waitForUploadChip moved outside retry loop — only the
  chooser open + file set phase retries, not chip detection
- P2: backendNodeId extraction hardened with typeof === 'number'
  check instead of truthiness (catches 0 edge case and missing field)

bun-webview.d.ts:
- Document CDP event dispatch contract (MessageEvent with event.data
  containing parsed CDP params) so worker assumptions are explicit
- Type addEventListener handler param as { data?: any } instead of
  ...args: unknown[]
- Add EXTRACT_THINKING_TRACE_JS: walks React fiber tree from 'Thought for'
  button to find allMessages prop with content_type=thoughts
- Extract thoughts array (summary + content per step), duration, recap text
- Pass thinkingTrace through bridge mapSuccess to CLI
- CLI outputs thinkingTrace in JSON, prints summary to stderr
- Works fully headless (no flyout click needed)
- Validated: 3s thinking captures duration/recap; 12s captures full trace

Architecture findings documented:
- ChatGPT now uses WebSocket conduit for streaming (not HTTP SSE)
- Thinking trace button is disabled=true in headless (flyout never renders)
- React fiber state persists thoughts after response completes
Review fixes:
- P1: Scope button search to current assistant turn (targetTurn.querySelectorAll)
  instead of global document scan. Prevents returning stale trace on chatgpt.reply.
- P1: Cap payload: max 100 thoughts, 2000 chars/content, 200 chars/summary.
  Prevents large JSONL payloads on long Pro sessions.
- P1: Sanitize turnId before injection into JS template (alphanumeric/hyphen only).
- P2: Add bridge tests for thinkingTrace pass-through and omission.
- P2: Use conditional spread in mapSuccess to avoid undefined key in result.
During Pro model thinking, the DOM shows thinking-summary text in a
.markdown div. The stability check would fire on this stable text and
capture it as the response, breaking out before the actual answer.

Fix: while isThinkingPhase (stop button visible + phase.isThinking),
override isStreaming=true and finished=false in advanceTextStability
to prevent shouldComplete from triggering.

Validated: Pro model 'What is 2+2?' → 'Four' captured correctly after
9s thinking + 384s total. Previously would have captured thinking-phase
text and exited with partial=true.
… pi TUI

- /surf-chats command + Ctrl+Shift+G shortcut
- Two-pane overlay: conversation list ↔ detail viewer
- Search conversations via / key
- Load detail with Enter, inject into pi context on second Enter
- Export to markdown with e key
- Reuses native chatgpt-chats-formatter.cjs for normalization/markdown
- Shells out to surf chatgpt.chats via pi.exec()
- Handles CloakBrowser latency with loading states
- Stale request protection via monotonic requestId
- Add explicit 'implements Component, Focusable' (matches nicobailon refs)
- Import Component + Focusable types from pi-tui
- Pass done() directly to overlay constructor (reference pattern)
- Close on Escape calls done() directly, not via action callback
- Remove 'close' OverlayAction variant (dead code)
- Proper dispose() clears focused flag
- invalidate() has doc comment
- Pagination: j/↓ at last item triggers load_more (+30 items); list
  auto-scrolls to keep selection visible
- Arrow keys (↑↓): scroll detail pane when conversation is loaded;
  fall back to list navigation when detail is not cached
- j/k: always navigate list (clear role separation)
- J/K / PgUp/PgDn: unchanged, larger jumps (10 lines)
- Persistent cache: module-level detailCache + listCache survive
  overlay close/reopen within the same pi session; list shows
  instantly from cache, background-refreshes silently (<5 min TTL)
- Footer hint updated: 'j/k list • ↑↓ scroll detail • ...'
- Load-more hint shown at bottom of list when hasMore=true
P1-1: cli.cjs — route export message to stderr when --json is set,
      keeping stdout as valid JSON
P1-2: chatgpt-chats-formatter.cjs — walkConversationMessages now
      follows active path (root → current_node) by default, excluding
      abandoned/regenerated branches; 'full' mode opt-in for DFS
P1-3: chatgpt-cloak-chats-worker.mjs — searchConversations catches
      backend search failures and falls back to local paginated scan
      with backendSearchFailed + partial flags
P1-4: chatgpt-cloak-chats-worker.mjs — file download uses
      page.request.fetch() to stream directly to disk (Node-side),
      removing 10MB base64 cap and stdout shuttling
P2-5: chatgpt-chats-cache.cjs — atomic writes via temp file + rename,
      preventing torn cache files from concurrent processes
P2-6: tests — active-path vs full-tree walk, fallback on missing
      current_node, markdown excludes abandoned branches, cache
      atomicity (no .tmp residue), predicate invalidation
…ation

- Press d or Delete key on selected conversation → confirmation banner
- Banner shows: ⚠ Delete "<title>"? y/n
- y confirms → calls surf chatgpt.chats <id> --delete --json
- Any other key cancels → back to idle
- On success: removes from list, evicts from detail + list caches,
  clamps selection, shows notification
- Footer hints updated: 'd del' shown in normal mode,
  'y confirm delete • any other key cancel' during confirmation
- surf-client: added deleteConversation() method
Worker refactor (chatgpt-cloak-chats-worker.mjs):
- Replace page.evaluate(fetch()) with context.request API (Playwright
  HTTP client) — no page navigation, no waitForReady, no DOM polling
- Eliminates ~10-30s of goto + Cloudflare + readiness check per call
- All CRUD ops (list/search/get/delete/rename/download) use direct
  HTTP via context.request with the context's cookie jar
- Add bulk_delete action: uses /backend-api/conversations/delete for
  multi-ID, falls back to bounded-parallel (4) individual deletes
- fetchAccessToken() gets session token via context.request.get()

CLI (cli.cjs):
- Add --delete-ids flag for comma-separated bulk delete
- Route bulk_delete through bridge → worker with conversationIds array
- Print summary: 'Deleted N conversations (M failed)'
- Cache invalidation covers bulk_delete

Pi extension (pi-surf-chats):
- Multi-select with Space key (● mark indicator, auto-advance)
- d/Delete on marked items → single confirmation → one CLI call
- Uses bulkDeleteConversations() → surf --delete-ids id1,id2,...
- One CloakBrowser launch for all deletes instead of N launches
- Footer: 'Space mark' added to hints
…ther

Root cause: handleAction() called activeAbort?.abort() on every action,
killing in-flight deletes when a second delete (or any action) fired.

Fix: separate delete processing into independent queue that runs
alongside load/search/export without mutual cancellation.

- Delete queue: Array<{ids, titles}> processed sequentially
- processDeleteQueue() drains queue one batch at a time
- Non-exclusive actions (toggle_mark, delete, confirm/cancel) skip
  the activeAbort?.abort() path
- Exclusive actions (load_list, search, export, load_detail) still
  cancel each other as before
- Overlay close aborts both activeAbort and deleteAbort, clears queue
- Errors show notification instead of blocking the UI
Replace broken single-abort + requestId pattern with typed operation
lanes that handle concurrency correctly:

LIST lane (mutually exclusive, last wins):
  - load_list, search, load_more
  - Own AbortController + generation token
  - Only lane that replaces state.items
  - New search cancels old search, not detail/export/delete

BACKGROUND lanes (independent FIFO queues):
  - detail runner: per-conversation load with dedupe
  - export runner: file export with dedupe
  - delete runner: bulk delete with queue draining
  - Each has own abort, queue, and state
  - Never cancel each other or LIST lane
  - Continue draining after per-job failure

SYNC lane (immediate, no async):
  - toggle_mark, inject, delete prompt, cancel_delete

State model rewrite (types.ts):
  - Remove global phase/statusMessage/lastError/pendingDeleteId
  - Add per-lane state: ListLaneState, DetailLaneState,
    ExportLaneState, DeleteLaneState
  - Add typed DeletePromptState (no more comma-joined string hacks)
  - Add derived StatusBarState (recomputed after every lane mutation)
  - Status bar priority: active work > errors > info notices

Overlay updates (overlay.ts):
  - Confirmation reads deletePrompt directly (typed arrays)
  - Status bar renders from statusBar.level/message
  - Detail pane uses per-conversation error/loading from detailLane
  - List footer uses listLane.isRunning (not global phase)
  - load_more hint restricted to recent mode only

All 401 unit + 42 CLI tests pass.
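The LIST lane's "last wins" pattern (generation token + per-lane AbortController) can be sketched like this — names are illustrative, not the extension's actual API:

```javascript
// A lane that cancels only its own in-flight work: a new run bumps the
// generation token and aborts the previous task; stale results are
// dropped instead of overwriting newer state.
function createListLane() {
  let generation = 0;
  let controller = null;
  return {
    async run(task, apply) {
      const gen = ++generation;      // newest call wins
      controller?.abort();           // cancel only this lane's work
      controller = new AbortController();
      const result = await task(controller.signal);
      if (gen !== generation) return; // a newer run superseded us
      apply(result);                  // only the LIST lane mutates items
    },
  };
}
```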
Detail, export, and delete runners were adding items to tracking
arrays (queuedConversationIds, queuedRequests) on enqueue but never
removing them when the runner started processing. This caused:

1. UI stuck showing 'Queued…' permanently (status bar derived from
   non-empty queuedConversationIds)
2. Retry blocked by dedupe check (includes() always true)

Fix: each runner's process() now splices the item from the tracking
array before processing begins.
Three-tier Escape: search edit → detail view → close overlay.

When viewing a conversation detail, Esc clears loadedConversationId
and returns to list browsing mode. The cached detail persists in
memory — pressing Enter on the same item instantly re-shows it
(second Enter injects). Arrow keys revert to list navigation when
detail is dismissed.
dsebban added 20 commits April 4, 2026 23:26
Three improvements for conversation detail reading:

Disk cache (~/.surf/cache/chatgpt-md/):
  - getConversation() checks {id}.md + {id}.meta.json before network
  - First load: ~3s (network). Subsequent loads: instant (disk read)
  - Atomic writes (tmp + rename). Evicted on delete.
  - Survives pi restart (unlike in-memory persistentDetailCache)

Message jumping (n/p keys):
  - n = jump to next message, p = jump to previous message
  - renderDetail() records line offsets per message boundary
  - Scroll snaps to message start line
  - Way faster than arrow-scrolling through long conversations

Scroll position indicator:
  - Title bar shows 'msg 3/15 · 42%' when viewing detail
  - Computed from detailScrollOffset vs messageLineOffsets
  - Updates on every scroll/render

Footer hints update:
  - Detail mode: '↑↓ scroll • n/p next/prev msg • J/K page • Enter inject • Esc back'
  - List mode: unchanged
P1-1: Disk cache now validates update_time — if the list item's
update_time is newer than cached, cache is bypassed and re-fetched.
Stores updateTime in meta.json and DetailRecord. Prevents stale
conversation content after continuing a chat elsewhere.

P1-2: messageLineOffsets and messageCount are now cleared at the
top of renderDetail(), not just inside the cached-detail branch.
Prevents stale scroll indicator / n/p offsets when navigating away
from a loaded conversation without entering another detail view.
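The P1-1 freshness check reduces to a small predicate (field names assumed from the commit text):

```javascript
// Use the cached markdown only when its stored updateTime is at least
// as new as the list item's update_time; otherwise bypass and re-fetch.
function cacheIsFresh(meta, listItemUpdateTime) {
  if (!meta || typeof meta.updateTime !== "number") return false;
  return meta.updateTime >= listItemUpdateTime;
}
```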
Root cause: CloakBrowser crash leaves SingletonLock symlink in
~/.surf/cloak-profile/, blocking all subsequent launches with
'Failed to create ProcessSingleton'. Worker exits 0, bridge sees
no result → unhelpful 'worker exited without result' error.

Fix: sharedProfileDir() now checks if SingletonLock's target PID
is still alive. If dead → removes stale lock automatically.
Prevents the most common CloakBrowser launch failure.

Also: improved error classification in surf-client.ts for
profile lock and worker crash scenarios.
Previously, CLI only showed brief phase labels during thinking
("⏳ Thinking", first 80 chars of label). The full thinking
content was only extracted post-completion from React fiber.

Now: polls React fiber state every 500ms during thinking phase,
reads thought objects including partial chunks[] array, computes
deltas, and emits rich trace events with full thought text.

Worker (chatgpt-cloak-worker.mjs):
  - extractThinkingTrace() now reads t.chunks[] fallback when
    t.content is empty (streaming partial thoughts)
  - New helpers: formatThinkingTraceText(), buildThinkingProgressPayload()
  - Response loop polls fiber during isThinkingPhase, emits
    {type:'trace', traceType:'thinking_text', thoughtDelta, ...}
  - Final extraction falls back to live-captured trace

Bridge (chatgpt-cloak-bridge.cjs):
  - Forwards traceType, thoughtText, thoughtDelta, thoughtCount,
    durationSec, recapText through both stdout handlers

CLI (cli.cjs):
  - traceType=thinking_text → prints 🧠 prefix with delta lines
  - Regular trace → prints ⏳ prefix (unchanged)

Test:
  - New bridge test: verifies rich trace payload passthrough
  - 402 unit + 42 CLI tests pass
The React fiber (extractThinkingTrace) doesn't populate thoughts
during streaming — only after completion. But the DOM already
contains the live thinking text in the assistant turn's textContent.

Changes:
- DETECT_PHASE_JS now returns thinkingText (raw.slice(0, 8000))
  alongside the 80-char phase label
- Response loop emits traceType:'thinking_text' from DOM text
  with delta computation (avoids re-sending unchanged text)
- Fiber extraction runs as secondary enrichment (may populate
  structured thoughts later in thinking phase)
- Final trace falls back to live-captured fiber data

Verified live: 🧠 lines appear during Pro model thinking with
the actual thought content, not just 'Thinking' label.

402 unit + 42 CLI tests pass.
P1-1: Fiber extraction no longer gated by DOM thinkingText.
  extractThinkingTrace() runs under `if (isThinkingPhase)`
  regardless of DOM content, ensuring liveThinkingTrace is
  populated even when DOM is temporarily empty.

P1-2: Timer line stripped before delta computation.
  DETECT_PHASE_JS regex strips 'Thought for Ns' / 'Thinking for Ns'
  prefix, preventing every timer tick from invalidating the
  startsWith() check and re-emitting the entire snapshot.

P1-3: Removed 8k source cap; cap emitted payloads instead.
  thinkingText returned uncapped from DOM. thoughtText capped
  at 8k and thoughtDelta at 4k on emit, so new content remains
  streamable past 8k while avoiding flooding the CLI.

P2-4: Removed dead helpers formatThinkingTraceText() and
  buildThinkingProgressPayload() — no longer used in live path.

Verified live: 8 clean delta lines during Pro model thinking,
no timer prefix contamination, no duplicate re-emission.
402 unit + 42 CLI tests pass.
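The P1-2/P1-3 delta logic, sketched (illustrative only; the real code is in chatgpt-cloak-worker.mjs and the timer-prefix regex is an approximation of the one described):

```javascript
// Strip the volatile timer line before comparing, then emit only the new
// suffix when the snapshot extends the previous text.
const TIMER_PREFIX = /^(Thought|Thinking) for \d+ seconds?\s*/;

function computeThoughtDelta(prevText, rawSnapshot) {
  const text = rawSnapshot.replace(TIMER_PREFIX, "");
  if (text === prevText) return { text, delta: "" };    // nothing new
  if (text.startsWith(prevText)) {
    return { text, delta: text.slice(prevText.length) }; // streamed suffix
  }
  return { text, delta: text };                          // re-emit snapshot
}
```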
- biome format fixes across test files
- Added native/chatgpt-chats-search.d.mts for type declarations
- Fixed implicit 'any' params in search test with explicit types
- All gates pass: lint, typecheck, 402 unit, 42 CLI
The test hardcoded 'MacIntel', but CI runs on Linux. It now derives the
expected platform from process.platform, matching the module logic.
…k poll, live conversationId

- session-store.cjs: store pid, conversationId, baselineAssistantMessageId, reconcile in meta.json
  on createSession(); lazy SESSIONS_DIR via getSessionsDir() to support env overrides in tests;
  add Session.update(patch) for mid-run meta updates; add updateSession(id, patch)
- session-reconciler.cjs (new): defaultPidIsAlive(), isChatGptCloakSession(),
  resolveConversationId(), inspectConversation(), reconcileSessions() with local PID
  check + optional network poll via GET /backend-api/conversation/{id}
- chatgpt-cloak-bridge.cjs: forward meta_update progress events from worker
- chatgpt-cloak-worker.mjs: emit meta_update with conversationId when URL changes to /c/{id}
  (new conversations); emit conversationId+baselineAssistantMessageId after send for continuations
- cli.cjs: handle meta_update in runChatGptCloakQueryDirect (stores conversationId mid-run);
  surf session --reconcile [--network] command; auto-reconcile (PID-only) on surf session list;
  ✗ orphaned / ! stale / ? running status labels in list view
- test/unit/session-reconciler.test.ts: 22 tests covering all reconcile paths
- cli-tests.sh: 3 new session reconcile tests
P1-1: baselineAssistantMessageId now uses DOM data-message-id (not data-testid)
  - chatgpt-cloak-worker.mjs: extract baseline.messageId from EXTRACT_TEXT_JS
  - matches backend current_node for accurate reconcile comparison

P1-2: never orphan sessions with alive PID, even if old
  - session-reconciler.cjs: if pidAlive, only annotate as 'stale' (not 'error')
  - sessions with alive PID but past MAX_RUNNING_AGE stay 'running' with reconcile.state='stale'
  - prevents race where list auto-reconcile terminates legitimate long jobs

P1-3: --network gate now uses isCloakBrowserAvailable()
  - cli.cjs: check isCloakBrowserAvailable() before enabling pollNetwork
  - skip network poll gracefully instead of failing

P1-4: PID reuse defense simplified (alive check is primary defense)
  - keeping sessions alive when PID alive prevents false stale-orphan

P2-5: session directory/file permissions hardened
  - session-store.cjs: create dirs with 0700, files with 0600
  - protects persisted conversation IDs, profile args, reconcile state

Tests:
  - session-reconciler.test.ts: updated 'too old' test to expect 'stale' not 'orphaned'
  - session-reconciler.test.ts: added realistic baseline mismatch regression test
  - cli-tests.sh: added 'stale not orphaned' test for alive PID + old session
- Add 'reason' field to all reconcileSessions result entries
- Fix --hours N value being parsed as session ID (skip flag values)
- Add CLI test for --hours arg handling
- Clean up test sessions
…kill

- --prompt-file reads file content as prompt text (for large exported contexts)
- Unlike --file which uploads as attachment, --prompt-file uses content as query
- Works with chatgpt and chatgpt.reply commands
- Added CLI tests for missing/empty prompt file
- Updated SKILL.md with --prompt-file docs
- New rp-surf-oracle skill: rp-cli export → surf chatgpt pipeline