[FEAT]: In-Browser Voice Recorder with Live Whisper Transcription #232

@krrishrastogi05

πŸ“ Description

FireForm supports voice memos as input but currently requires the user to record audio externally and upload a file. For field use (for example, a firefighter finishing a shift in a truck), this friction is unacceptable. We should capture audio directly in the browser and display a live transcription as the user speaks.

💡 Rationale

Eliminating the "record externally → find the file → upload" flow is the single biggest UX improvement for field responders. The MediaRecorder API is available in all modern browsers and requires no native app. Combined with a local Whisper endpoint (already in the FireForm stack via Ollama), this can be done entirely on-device with no cloud dependency.

πŸ› οΈ Proposed Solution

Frontend:

  • Use the MediaRecorder API to capture microphone audio in the browser
  • Stream audio chunks to the backend via WebSocket or chunked POST
  • Display live transcription text as Whisper processes each chunk

Backend:

  • Add a /api/v1/transcribe/stream WebSocket endpoint
  • Pipe incoming audio chunks to the local Whisper model
  • Return partial transcription tokens in real time

✅ Acceptance Criteria

  • User can record audio directly in the browser without any file upload step
  • Live transcription text appears on screen while recording
  • Final transcription is passed into the existing LLM extraction pipeline seamlessly
  • Works in Chrome and Firefox
  • Graceful fallback to file upload if MediaRecorder is unavailable
  • Works within the existing Docker container setup
