Partial transcription, in-place history rewrite, live UI polish #14
Merged
Streaming-style transcription: while a speaker keeps talking, Polyglot re-runs Whisper + translation on the growing buffer every 10 seconds and emits a payload with the same utterance_id. The frontend upserts by utterance_id so the same row updates in place; when the final batch flushes (speaker switch or 60 s cap) it replaces the partial with the polished version.

Backend
- New partial_transcribe_and_emit() runs Whisper + translations on a buffer snapshot, uses _current_bot_speaker as the speaker label, skips diarization, transcript-file write, and summarization accumulation.
- Shares the existing transcription_lock so partial + final Whisper calls serialise on the GPU.
- Mints a fresh utterance_id on the first chunk of every batch (both the is_processing=True and =False branches of process_audio) and resets it when the final flushes, so every partial + final for one turn shares the same key.
- Final WS payloads now carry utterance_id + is_partial=false; partials carry utterance_id + is_partial=true.
- Admin auth replays the current bot_status (including the meeting URL) + meet_roster so a late-arriving admin tab reflects the active bot.
- Cache-Control: no-store headers on the viewer + admin routes so browsers always fetch fresh templates.

Bot
- Emits a `bot_info` event on connect with the meeting URL so the admin panel shows which call the bot is attached to, even when the bot was started from the CLI.
- nameFromTile() now strips the "'s Presentation" suffix so a screen-sharing participant's tile doesn't label them "X's Presentation".

Frontend (admin + viewer)
- Transcript is now keyed by utterance_id. Partial arrivals upsert the matching row; finals replace it. Timestamps are preserved across the swap so rows don't reshuffle.
- In-place styling distinguishes the two states: partials get a dashed left border + "● live" tag + 0.75 opacity, finals get a solid border + full opacity.
- Viewer row segments render in a vertical body column instead of the flex-row that was making multi-segment utterances render horizontally.
- Viewer keeps the ?p=<password> query param across refreshes so the user doesn't have to re-enter the passphrase on every reload.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
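
The upsert-by-utterance_id behaviour described in this commit can be modelled in a few lines. This is only a sketch of the logic, not the project's frontend code (which presumably runs in the browser): the `Transcript` and `Row` classes and the `text` and `timestamp` payload fields are assumptions made for illustration; only `utterance_id` and `is_partial` come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Row:
    utterance_id: str
    text: str
    is_partial: bool
    timestamp: float  # when the row first appeared; kept across updates


class Transcript:
    """Hypothetical model of a transcript keyed by utterance_id."""

    def __init__(self) -> None:
        # dicts preserve insertion order, so updating an existing key
        # leaves the row in its original position.
        self.rows: dict[str, Row] = {}

    def apply(self, payload: dict) -> Row:
        uid = payload["utterance_id"]
        existing = self.rows.get(uid)
        if existing is None:
            # First payload for this turn: create the row.
            row = Row(uid, payload["text"], payload["is_partial"], payload["timestamp"])
            self.rows[uid] = row
            return row
        # Later partial or final: update in place, keep the original timestamp.
        existing.text = payload["text"]
        existing.is_partial = payload["is_partial"]
        return existing

    def css_class(self, uid: str) -> str:
        # Partials and finals are styled differently (dashed vs solid border).
        return "partial" if self.rows[uid].is_partial else "final"


transcript = Transcript()
transcript.apply({"utterance_id": "u1", "text": "Hello", "is_partial": True, "timestamp": 10.0})
transcript.apply({"utterance_id": "u2", "text": "Hi", "is_partial": True, "timestamp": 12.0})
# The final for u1 arrives later but reuses the row created by the partial.
transcript.apply({"utterance_id": "u1", "text": "Hello everyone.", "is_partial": False, "timestamp": 25.0})
```

Because the final reuses the row and its first timestamp, the order of rows stays `u1`, `u2` even though the final for `u1` arrived last.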
- Partial transcription interval is now a runtime-mutable threshold (audio_thresholds["partial_interval_sec"]), default 5 s (was 10 s). Settable via /api/thresholds, overridable via PARTIAL_INTERVAL_SEC env var. New admin Settings slider: "Partial Transcription Interval (seconds)" (2..30 s, step 1).
- Admin auth replays the current bot_status (with meeting URL) and meet_roster so a late-joining admin tab immediately reflects the active bot instead of showing Start-bot while a bot is already live.
- Admin + viewer transcript renderers now group consecutive same-speaker segments: speaker name shown once at the top, each sentence rendered as a bulleted line with its own :SS timestamp (the second of the minute when the sentence started). Cuts the repetition when one speaker says several sentences in one turn.
- Viewer keeps the ?p=<password> query param across refreshes so the passphrase survives reloads.
- Remove the "Speaking: X" banner that toggled on/off every ~1.5 s as captions paused. Partial bubbles already carry the speaker name inline, so the banner was redundant and visually noisy.
- Cache-Control: no-store on /, /viewer, and /admin so browsers always pull fresh templates (no more hard-refresh needed during development).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
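
The same-speaker grouping described in this commit might look roughly like the sketch below. The segment shape (`speaker`, `start`, `text`) and the `render_grouped` function are assumptions for illustration; the actual renderers are part of the admin and viewer frontends and may be structured differently.

```python
from datetime import datetime
from itertools import groupby


def render_grouped(segments: list[dict]) -> list[str]:
    """Group consecutive same-speaker segments under one speaker heading.

    Each segment is assumed to be a dict with 'speaker', 'start' (datetime)
    and 'text' keys. Only adjacent segments are merged, so a speaker who
    talks again after someone else gets a new heading.
    """
    lines: list[str] = []
    for speaker, group in groupby(segments, key=lambda seg: seg["speaker"]):
        lines.append(speaker)
        for seg in group:
            # :SS is the second of the minute at which the sentence started.
            lines.append(f"  - :{seg['start'].second:02d} {seg['text']}")
    return lines


segments = [
    {"speaker": "Alice", "start": datetime(2024, 1, 1, 9, 30, 5), "text": "Good morning."},
    {"speaker": "Alice", "start": datetime(2024, 1, 1, 9, 30, 12), "text": "Let's begin."},
    {"speaker": "Bob", "start": datetime(2024, 1, 1, 9, 30, 47), "text": "Sounds good."},
    {"speaker": "Alice", "start": datetime(2024, 1, 1, 9, 31, 3), "text": "First item."},
]
print("\n".join(render_grouped(segments)))
```

`itertools.groupby` only merges adjacent items with the same key, which matches the "consecutive same-speaker" rule without any extra bookkeeping.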
Summary
Streaming-style transcription: while a speaker keeps talking, Polyglot re-runs Whisper + translation on the growing buffer every 10 s and emits a payload with the same `utterance_id`. The frontend upserts by `utterance_id`, so the same row updates in place; when the final batch flushes (speaker switch or 60 s cap) it replaces the partial with the polished version.

What ships
Backend
- New `partial_transcribe_and_emit()` runs Whisper + translations on a buffer snapshot, uses `_current_bot_speaker` as the speaker label, skips diarization / transcript-file write / summarization accumulation.
- Shares `transcription_lock` with the final so partial + final Whisper calls serialise on the GPU.
- Mints a fresh `utterance_id` on the first chunk of every batch (in both the `is_processing=True` and `=False` branches of `process_audio`) and resets it when the final flushes, so every partial + final for one turn shares the same key.
- Final WS payloads now carry `utterance_id` + `is_partial:false`; partials carry `utterance_id` + `is_partial:true`.
- Admin auth replays the current `bot_status` (including the meeting URL) + `meet_roster` so a late-arriving admin tab reflects the active bot.
- `Cache-Control: no-store` headers on the viewer + admin routes so browsers always fetch fresh templates.

Bot
- Emits a `bot_info` event on connect with the meeting URL so the admin panel shows which call the bot is attached to, even when the bot was started from the CLI.
- `nameFromTile()` now strips the "'s Presentation" suffix so a screen-sharing participant's tile doesn't label them "X's Presentation".

Frontend (admin + viewer)
- Transcript is now keyed by `utterance_id`. Partial arrivals upsert the matching row; finals replace it. Timestamps are preserved across the swap so rows don't reshuffle.
- Viewer keeps the `?p=<password>` query param across refreshes so the user doesn't have to re-enter the passphrase on every reload.

Test plan
- […] `utterance_id`, final at 60 s cap flips the bubble from dashed to solid and mints a new id for the next turn.
- […] `[BOT] Resolved SPEAKER_XX → Etienne Chabert's Presentation` proves Phase 5 still works end-to-end on finals.
- […] `admin` + `lang_en` rooms when viewers are joined.

🤖 Generated with Claude Code
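
The `utterance_id` lifecycle described in this PR (mint on the first chunk, reuse for every partial and the final, reset when the final flushes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code that shipped: `TurnState`, its methods, the `text` payload field, and the `transcribe` callable (standing in for the Whisper + translation call) are all hypothetical.

```python
import threading
import uuid
from typing import Callable, Optional


class TurnState:
    """Hypothetical helper tracking the utterance_id of the current turn."""

    def __init__(self) -> None:
        self.utterance_id: Optional[str] = None
        # One lock shared by partial and final passes so that the
        # (GPU-bound) transcription calls never overlap.
        self.transcription_lock = threading.Lock()

    def on_chunk(self) -> str:
        # Mint an id on the first chunk of a batch; later chunks reuse it.
        if self.utterance_id is None:
            self.utterance_id = uuid.uuid4().hex
        return self.utterance_id

    def emit(self, audio: bytes, transcribe: Callable[[bytes], str], *, final: bool) -> dict:
        uid = self.on_chunk()
        with self.transcription_lock:
            text = transcribe(audio)  # stand-in for Whisper + translation
        payload = {"utterance_id": uid, "is_partial": not final, "text": text}
        if final:
            # Reset so the next turn mints a fresh id.
            self.utterance_id = None
        return payload


state = TurnState()
partial = state.emit(b"chunk-1", lambda audio: "hello every", final=False)
final = state.emit(b"chunk-1+2", lambda audio: "Hello everyone.", final=True)
next_turn = state.emit(b"chunk-3", lambda audio: "thanks", final=False)
```

Here `partial` and `final` carry the same `utterance_id`, so a client that upserts by that key replaces the partial row with the final one, while `next_turn` gets a new id and therefore a new row.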