
Partial transcription, in-place history rewrite, live UI polish #14

Merged
etiennechabert merged 2 commits into main from claude/streaming-partial-transcription on Apr 21, 2026

Conversation

@etiennechabert

Owner

Summary

Streaming-style transcription: while a speaker keeps talking, Polyglot re-runs Whisper + translation on the growing buffer every 10 s and emits a payload with the same utterance_id. The frontend upserts by utterance_id, so the same row updates in place; when the final batch flushes (speaker switch or 60 s cap) it replaces the partial with the polished version.

What ships

Backend

  • partial_transcribe_and_emit() runs Whisper + translations on a buffer snapshot, uses _current_bot_speaker as the speaker label, and skips diarization, the transcript-file write, and summarization accumulation.
  • Shares transcription_lock with the final so partial + final Whisper calls serialise on the GPU.
  • Mints a fresh utterance_id on the first chunk of every batch (in both the is_processing=True and =False branches of process_audio) and resets it when the final flushes, so every partial + final for one turn shares the same key.
  • Final WS payloads now carry utterance_id + is_partial:false; partials carry utterance_id + is_partial:true.
  • Admin auth replays the current bot_status (including the meeting URL) + meet_roster so a late-arriving admin tab reflects the active bot.
  • Cache-Control: no-store headers on the viewer + admin routes so browsers always fetch fresh templates.
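
The utterance_id lifecycle and the shared lock described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `TurnState`, `build_payload`, and the `text` field are hypothetical names, and the Whisper + translation step is replaced by a placeholder. Only `transcription_lock`, `utterance_id`, and `is_partial` come from the description.

```python
import threading
import uuid

# Shared by partial and final runs so model calls never overlap on the GPU.
transcription_lock = threading.Lock()


class TurnState:
    """Hypothetical holder for the id shared by one turn's partials and final."""

    def __init__(self):
        self.utterance_id = None

    def on_chunk(self):
        # Mint a fresh id on the first chunk of a batch, reuse it afterwards.
        if self.utterance_id is None:
            self.utterance_id = uuid.uuid4().hex
        return self.utterance_id

    def on_final_flushed(self):
        # Reset so the next turn starts with a new id.
        self.utterance_id = None


def build_payload(state, text, is_partial):
    """Build a WS payload; the lock serialises the (placeholder) model call."""
    with transcription_lock:
        # Assumption: the real code would run Whisper + translation here.
        transcript = text.strip()
    return {
        "utterance_id": state.utterance_id,
        "is_partial": is_partial,
        "text": transcript,
    }


state = TurnState()
first_id = state.on_chunk()
partial = build_payload(state, " hello ", is_partial=True)
final = build_payload(state, "hello world", is_partial=False)
state.on_final_flushed()
next_id = state.on_chunk()
```

In this sketch the partial and the final carry the same `utterance_id`, and the id only changes once the final has flushed.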

Bot

  • Emits a bot_info event on connect with the meeting URL so the admin panel shows which call the bot is attached to, even when the bot was started from the CLI.
  • nameFromTile() now strips the "'s Presentation" suffix so a screen-sharing participant's tile doesn't label them "X's Presentation".
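
The suffix handling can be modelled in a few lines. The sketch below is written in Python for consistency with the other examples here; the bot's own nameFromTile() is not shown in this PR, so `strip_presentation_suffix` is a hypothetical stand-in. Accepting a typographic apostrophe as well as a plain one is an assumption.

```python
# Assumption: tiles may use either a plain or a typographic apostrophe.
PRESENTATION_SUFFIXES = ("'s Presentation", "\u2019s Presentation")


def strip_presentation_suffix(tile_label: str) -> str:
    """Return the participant name without a trailing presentation suffix."""
    label = tile_label.strip()
    for suffix in PRESENTATION_SUFFIXES:
        if label.endswith(suffix):
            return label[: -len(suffix)]
    return label


presenter = strip_presentation_suffix("Etienne Chabert's Presentation")
regular = strip_presentation_suffix("Etienne Chabert")
```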

Frontend (admin + viewer)

  • Transcript is keyed by utterance_id. Partial arrivals upsert the matching row; finals replace it. Timestamps are preserved across the swap so rows don't reshuffle.
  • Partials get a dashed left border + "● live" tag + 0.75 opacity; finals get a solid border + full opacity.
  • Viewer row segments render in a vertical body column instead of the flex-row that was making multi-segment utterances render horizontally.
  • Viewer keeps the ?p=<password> query param across refreshes so the user doesn't have to re-enter the passphrase on every reload.
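
The upsert-by-key behaviour can be illustrated with a small model. This is a sketch under stated assumptions, written in Python rather than the frontend's own language: the `Transcript` class and the `timestamp` and `text` fields are hypothetical, and only `utterance_id` and `is_partial` are taken from the description.

```python
class Transcript:
    """Hypothetical in-memory model of the transcript rows."""

    def __init__(self):
        # Python dicts keep insertion order, so row order stays stable.
        self.rows = {}

    def upsert(self, payload):
        """Insert a new row, or replace the row that has the same utterance_id."""
        key = payload["utterance_id"]
        row = dict(payload)
        existing = self.rows.get(key)
        if existing is not None:
            # Keep the first-seen timestamp so the row does not move.
            row["timestamp"] = existing["timestamp"]
        self.rows[key] = row
        return row


transcript = Transcript()
transcript.upsert({"utterance_id": "a", "is_partial": True, "timestamp": 1, "text": "hel"})
transcript.upsert({"utterance_id": "b", "is_partial": True, "timestamp": 2, "text": "ok"})
transcript.upsert({"utterance_id": "a", "is_partial": False, "timestamp": 9, "text": "hello"})
```

After the third call, row "a" holds the final text but keeps its original position and timestamp, which is what keeps rows from reshuffling when a final replaces a partial.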

Test plan

  • Validated live against a real Meet call: partials fire every 10 s, carry the same utterance_id, final at 60 s cap flips the bubble from dashed to solid and mints a new id for the next turn.
  • The log line [BOT] Resolved SPEAKER_XX → Etienne Chabert's Presentation shows that Phase 5 still works end-to-end on finals.
  • Diagnostic socket.io client confirms partials reach admin + lang_en rooms when viewers are joined.
  • Admin bot-status + URL replay on auth verified with a manual refresh after bot was already running.

🤖 Generated with Claude Code

etiennechabert and others added 2 commits April 21, 2026 14:10
Streaming-style transcription: while a speaker keeps talking, Polyglot
re-runs Whisper + translation on the growing buffer every 10 seconds
and emits a payload with the same utterance_id. The frontend upserts by
utterance_id so the same row updates in place; when the final batch
flushes (speaker switch or 60 s cap) it replaces the partial with the
polished version.

Backend
- New partial_transcribe_and_emit() runs Whisper + translations on a
  buffer snapshot, uses _current_bot_speaker as the speaker label, skips
  diarization, transcript-file write, and summarization accumulation.
- Shares the existing transcription_lock so partial + final Whisper
  calls serialise on the GPU.
- Mints a fresh utterance_id on the first chunk of every batch (both
  the is_processing=True and =False branches of process_audio) and
  resets it when the final flushes, so every partial + final for one
  turn shares the same key.
- Final WS payloads now carry utterance_id + is_partial=false; partials
  carry utterance_id + is_partial=true. Admin auth replays the current
  bot_status (including the meeting URL) + meet_roster so a late-
  arriving admin tab reflects the active bot.
- Cache-Control: no-store headers on the viewer + admin routes so
  browsers always fetch fresh templates.

Bot
- Emits a `bot_info` event on connect with the meeting URL so the admin
  panel shows which call the bot is attached to, even when the bot was
  started from the CLI.
- nameFromTile() now strips the "'s Presentation" suffix so a
  screen-sharing participant's tile doesn't label them
  "X's Presentation".

Frontend (admin + viewer)
- Transcript is now keyed by utterance_id. Partial arrivals upsert the
  matching row; finals replace it. Timestamps are preserved across the
  swap so rows don't reshuffle.
- In-place styling distinguishes the two states: partials get a dashed
  left border + "● live" tag + 0.75 opacity, finals get a solid border
  + full opacity.
- Viewer row segments render in a vertical body column instead of the
  flex-row that was making multi-segment utterances render horizontally.
- Viewer keeps the ?p=<password> query param across refreshes so the
  user doesn't have to re-enter the passphrase on every reload.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Partial transcription interval is now a runtime-mutable threshold
  (audio_thresholds["partial_interval_sec"]), default 5 s (was 10 s).
  Settable via /api/thresholds, overridable via PARTIAL_INTERVAL_SEC
  env var. New admin Settings slider: "Partial Transcription Interval
  (seconds)" — 2..30 s, step 1.
- Admin auth replays the current bot_status (with meeting URL) and
  meet_roster so a late-joining admin tab immediately reflects the
  active bot instead of showing Start-bot while a bot is already live.
- Admin + viewer transcript renderers now group consecutive same-
  speaker segments: speaker name shown once at the top, each sentence
  rendered as a bulleted line with its own :SS timestamp (the second
  of the minute when the sentence started). Cuts the repetition when
  one speaker says several sentences in one turn.
- Viewer keeps the ?p=<password> query param across refreshes so the
  passphrase survives reloads.
- Remove the "Speaking: X" banner that toggled on/off every ~1.5 s as
  captions paused. Partial bubbles already carry the speaker name
  inline, so the banner was redundant and visually noisy.
- Cache-Control: no-store on /, /viewer, and /admin so browsers always
  pull fresh templates (no more hard-refresh needed during development).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
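
The grouping of consecutive same-speaker segments described in the second commit can be sketched as follows. This is an illustration in Python, not the renderer from this PR: `group_segments` and the `speaker`, `start`, and `text` fields are hypothetical, and the real renderers produce markup rather than plain strings.

```python
from datetime import datetime
from itertools import groupby


def group_segments(segments):
    """Render consecutive same-speaker segments under one speaker heading."""
    lines = []
    # groupby only merges adjacent items, which matches "consecutive" turns.
    for speaker, group in groupby(segments, key=lambda seg: seg["speaker"]):
        lines.append(speaker)
        for seg in group:
            # :SS is the second of the minute when the sentence started.
            lines.append(f"  • :{seg['start'].second:02d} {seg['text']}")
    return lines


segments = [
    {"speaker": "Alice", "start": datetime(2026, 4, 21, 14, 10, 5), "text": "Hello."},
    {"speaker": "Alice", "start": datetime(2026, 4, 21, 14, 10, 12), "text": "How are you?"},
    {"speaker": "Bob", "start": datetime(2026, 4, 21, 14, 10, 30), "text": "Fine."},
]
rendered = group_segments(segments)
```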
@etiennechabert etiennechabert merged commit 2e54eb6 into main Apr 21, 2026
2 of 6 checks passed
@etiennechabert etiennechabert deleted the claude/streaming-partial-transcription branch April 21, 2026 12:37