Description
FireForm supports voice memos as input, but currently requires the user to record audio externally and upload a file. For field use (for example, a firefighter finishing a shift in a truck) this friction is unacceptable. We should capture audio directly in the browser and display a live transcription as the user speaks.
Rationale
Eliminating the "record externally → find the file → upload" flow is the single biggest UX improvement for field responders. The MediaRecorder API is available in all modern browsers and requires no native app. Combined with a local Whisper endpoint (already in the FireForm stack via Ollama), this can be done entirely on-device with no cloud dependency.
Proposed Solution
Frontend:
- Use the browser `MediaRecorder` API to capture microphone audio in-browser
- Stream audio chunks to the backend via WebSocket or chunked POST
- Display live transcription text as Whisper processes each chunk
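
To make the capture-and-stream step concrete, here is a minimal sketch of the chunk-forwarding logic, assuming the WebSocket variant. The names `makeChunkForwarder`, `ChunkSink`, and `SizedChunk` are hypothetical, and the URL, host, and one-second timeslice in the wiring comment are assumptions rather than anything FireForm defines today.

```typescript
// Structural types keep the sketch independent of DOM typings;
// a browser Blob and WebSocket satisfy them.
interface SizedChunk {
  size: number;
}

interface ChunkSink<C> {
  readyState: number;
  send(data: C): void;
}

const SOCKET_OPEN = 1; // numeric value of WebSocket.OPEN

// Returns a handler that forwards non-empty chunks while the socket is open
// and reports whether each chunk was actually sent.
function makeChunkForwarder<C extends SizedChunk>(
  sink: ChunkSink<C>
): (chunk: C) => boolean {
  return (chunk) => {
    if (chunk.size === 0 || sink.readyState !== SOCKET_OPEN) {
      return false; // dropped: empty chunk or socket not ready
    }
    sink.send(chunk);
    return true;
  };
}

// Browser wiring (not executed here; URL and timeslice are assumptions):
//   const media = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const recorder = new MediaRecorder(media);
//   const socket = new WebSocket("ws://localhost:8000/api/v1/transcribe/stream");
//   const forward = makeChunkForwarder<Blob>(socket);
//   recorder.ondataavailable = (event) => forward(event.data);
//   recorder.start(1000); // emit a chunk roughly every second
```

This sketch silently drops chunks that arrive while the socket is not open. A real implementation would probably buffer them or surface an error, since lost audio means lost transcript text.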
Backend:
- Add a `/api/v1/transcribe/stream` WebSocket endpoint
- Pipe incoming audio chunks to the local Whisper model
- Return partial transcription tokens in real time
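
The wire format for partial transcription results is not specified above. As one possible shape, the sketch below assumes the endpoint sends JSON messages tagged `partial` or `final`, and shows how the client could fold them into the text it displays. The message types and the names `TranscriptMessage`, `applyMessage`, and `displayText` are all hypothetical.

```typescript
// Hypothetical wire format; the issue does not define one.
type TranscriptMessage =
  | { type: "partial"; text: string } // provisional text for the chunk in progress
  | { type: "final"; text: string }; // settled text for a completed chunk

interface TranscriptState {
  committed: string; // text from finalized chunks
  pending: string; // latest provisional text, replaced by each partial
}

// Pure state update: partials overwrite the pending text,
// finals are appended to the committed text and clear the pending text.
function applyMessage(
  state: TranscriptState,
  msg: TranscriptMessage
): TranscriptState {
  if (msg.type === "partial") {
    return { ...state, pending: msg.text };
  }
  const committed = [state.committed, msg.text.trim()]
    .filter(Boolean)
    .join(" ");
  return { committed, pending: "" };
}

// Text to show on screen: committed text followed by any provisional text.
function displayText(state: TranscriptState): string {
  return [state.committed, state.pending.trim()].filter(Boolean).join(" ");
}
```

Under this assumption, the committed text at the end of the recording is the final transcription that would be handed to the existing LLM extraction pipeline.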
Acceptance Criteria
- User can record audio directly in the browser without any file upload step
- Live transcription text appears on screen while recording
- Final transcription is passed into the existing LLM extraction pipeline seamlessly
- Works in Chrome and Firefox
- Graceful fallback to file upload if `MediaRecorder` is unavailable
- Works within the existing Docker container setup
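
The fallback criterion could be met with a feature check along the following lines. This is a sketch only: `chooseCaptureMode`, `CaptureMode`, and `BrowserLike` are hypothetical names, and the environment is passed in as a parameter so the logic can be exercised outside a browser.

```typescript
type CaptureMode = "live-recording" | "file-upload";

// Only the parts of the browser global that the check needs.
interface BrowserLike {
  MediaRecorder?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Live recording needs both the MediaRecorder constructor and microphone
// access via getUserMedia; otherwise fall back to the file upload flow.
function chooseCaptureMode(env: BrowserLike): CaptureMode {
  const hasRecorder = typeof env.MediaRecorder === "function";
  const hasMicAccess =
    typeof env.navigator?.mediaDevices?.getUserMedia === "function";
  return hasRecorder && hasMicAccess ? "live-recording" : "file-upload";
}
```

In the browser this would be called as `chooseCaptureMode(window)`. Note that `navigator.mediaDevices` is only exposed in secure contexts (HTTPS or `localhost`), so a deployment served over plain HTTP on a local network would take the file upload path even in Chrome or Firefox.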