Secure full-stack speech-to-text app built with Next.js, React, TypeScript, and server-side OpenAI API calls.
- Live Mic mode creates a server-side OpenAI Realtime client secret, connects the browser to the Realtime API with WebRTC, and shows partial/final transcript updates.
- File Upload mode validates common audio/video files and transcribes them through a server-only API route.
- Transcript workspace supports editing, copy, clear, TXT/JSON export, and SRT/VTT export when timestamp segments exist.
- No persistent storage is used by default. Uploads are handled in memory and are not written to disk.
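As an illustration of the SRT/VTT export mentioned above, here is a minimal sketch of how timestamped segments could be turned into subtitle text. The `Segment` shape and the function names are assumptions made for this example; the actual helpers in src/lib/transcriptExport.ts may look different.

```typescript
// Hypothetical segment shape: start/end in seconds plus the spoken text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format seconds as HH:MM:SS,mmm (SRT uses ",") or HH:MM:SS.mmm (VTT uses ".").
function formatTimestamp(seconds: number, separator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${separator}${pad(ms, 3)}`;
}

// SRT: numbered cues, each followed by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}\n${seg.text.trim()}\n`,
    )
    .join("\n");
}

// VTT: a "WEBVTT" header, a blank line, then unnumbered cues.
function toVtt(segments: Segment[]): string {
  const cues = segments
    .map(
      (seg) =>
        `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}\n${seg.text.trim()}\n`,
    )
    .join("\n");
  return `WEBVTT\n\n${cues}`;
}
```

Both formats need start and end times for every cue, which is why these exports are only offered when timestamp segments exist.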
- app/api/transcribe/route.ts: multipart upload API, size checks, rate limiting, server-side transcription.
- app/api/realtime/session/route.ts: creates short-lived realtime client secrets; the raw OpenAI API key never reaches the browser.
- src/lib/constants.ts: centralized model names, MIME types, extensions, and limits.
- src/lib/config.ts: server-only environment loading and startup/runtime validation helpers.
- src/server/fileTranscription.ts: OpenAI Audio Transcriptions API wrapper.
- src/server/realtime.ts: Realtime transcription session-secret wrapper.
- src/lib/transcriptExport.ts: TXT, JSON, SRT, and VTT export helpers.
- components/stt-app.tsx: Live Mic, File Upload, and transcript workspace UI.
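The rate limiting mentioned for the upload route could take many forms. As one possibility, here is a minimal in-memory fixed-window limiter. The class name, the per-client key, and the limits are hypothetical and not taken from the project. An in-memory map like this only works within a single server process.

```typescript
// Hypothetical fixed-window limiter: at most `limit` requests per `windowMs` per key.
class FixedWindowRateLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` (e.g. a client IP) may proceed.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    // Start a fresh window if none exists or the previous one has expired.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) {
      return false;
    }
    entry.count += 1;
    return true;
  }
}

// Usage: allow 2 requests per second for each client key.
const limiter = new FixedWindowRateLimiter(2, 1000);
const firstAllowed = limiter.allow("203.0.113.7", 0);
```

A multi-instance deployment would need a shared store rather than a per-process map.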
The Codex OpenAI Platform API-key setup connector is installed, but it required reauthentication during this setup session before it could create a key. Create a project-scoped OpenAI Platform key named production-stt-app manually or through your approved organization secret workflow, then install it as a server-only secret:
- Create the key in the OpenAI Platform API key settings for the intended project.
- Store it as OPENAI_API_KEY in your deployment secret store.
- For local development only, place it in .env.local.
- Do not commit .env.local; it is ignored by .gitignore.
- Reauthenticate the OpenAI Platform connector in Codex if you want Codex to create the key through the approved encrypted setup flow later.
The app never prints, logs, or exposes the raw key. npm run validate:env checks only that OPENAI_API_KEY is present and key-shaped. npm run verify:openai checks whether the configured project can see the configured STT models.
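To make "present and key-shaped" concrete, here is a rough sketch of a check that never echoes the key itself. The function name, the result type, and the exact pattern (an `sk-` prefix followed by at least 20 URL-safe characters) are assumptions for illustration; the project's validate:env script may apply different rules.

```typescript
// Result carries only a reason code, never the key value, so it is safe to log.
type EnvCheck = { ok: true } | { ok: false; reason: "missing" | "malformed" };

// Hypothetical helper: checks presence and a loose shape, not validity with OpenAI.
function checkApiKeyShape(value: string | undefined): EnvCheck {
  const key = value?.trim() ?? "";
  if (key.length === 0) {
    return { ok: false, reason: "missing" };
  }
  // Assumption: keys start with "sk-" followed by 20+ letters, digits, "_" or "-".
  if (!/^sk-[A-Za-z0-9_-]{20,}$/.test(key)) {
    return { ok: false, reason: "malformed" };
  }
  return { ok: true };
}
```

A shape check only catches typos and empty values. Whether the key actually works for the configured models is a separate question, and that is what verify:openai addresses.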
npm install
cp .env.example .env.local
# edit .env.local and set OPENAI_API_KEY through a secure local editor
npm run dev
Open http://localhost:3000.
npm run dev
npm run build
npm start
npm run validate:env
npm run verify:openai
npm run smoke:upload
npm run lint
npm run format
npm run typecheck
npm run test
npm run test:e2e
npm run quality
Defaults are centralized in src/lib/constants.ts and can be overridden through environment variables:
- Live STT: gpt-realtime-whisper
- File STT default: gpt-4o-mini-transcribe
- File STT higher accuracy: gpt-4o-transcribe
- Upload limit: 25 MB by default
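The override mechanism could look roughly like the sketch below. The environment variable names (`STT_LIVE_MODEL`, `STT_FILE_MODEL`, `STT_MAX_UPLOAD_MB`) and the function name are hypothetical; check src/lib/constants.ts and .env.example for the names the project actually reads.

```typescript
// Defaults as listed above. 25 MB is interpreted here as 25 * 1024 * 1024 bytes (an assumption).
const DEFAULTS = {
  liveModel: "gpt-realtime-whisper",
  fileModel: "gpt-4o-mini-transcribe",
  maxUploadBytes: 25 * 1024 * 1024,
} as const;

interface SttConfig {
  liveModel: string;
  fileModel: string;
  maxUploadBytes: number;
}

// Hypothetical resolver: any unset, blank, or invalid variable falls back to its default.
function resolveConfig(env: Record<string, string | undefined>): SttConfig {
  const mb = Number(env["STT_MAX_UPLOAD_MB"]);
  return {
    liveModel: env["STT_LIVE_MODEL"]?.trim() || DEFAULTS.liveModel,
    fileModel: env["STT_FILE_MODEL"]?.trim() || DEFAULTS.fileModel,
    maxUploadBytes:
      Number.isFinite(mb) && mb > 0 ? Math.floor(mb * 1024 * 1024) : DEFAULTS.maxUploadBytes,
  };
}

// Usage: switch file mode to the higher-accuracy model without touching code.
const config = resolveConfig({ STT_FILE_MODEL: "gpt-4o-transcribe" });
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the resolver easy to unit test.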
- Audio is sent to OpenAI for transcription.
- Audio and transcripts are not stored by this app by default.
- OpenAI calls run server-side. The browser sends WebRTC SDP offers to the server, and the server exchanges them with OpenAI.
- Logs exclude raw audio, full transcripts, tokens, keys, and secrets.
- The API includes request size checks, MIME/extension validation, rate limiting, timeouts, structured errors, and a healthcheck.
- Docker builds exclude local .env* files through .dockerignore; pass secrets only at runtime.
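The request size checks and MIME/extension validation listed above might be combined along these lines. This is a sketch under assumptions: the allowlist, the error codes, and the function name are illustrative, and the project's real lists live in src/lib/constants.ts.

```typescript
// Hypothetical allowlist: each extension maps to the MIME types accepted for it.
const ALLOWED_TYPES: Record<string, readonly string[]> = {
  ".mp3": ["audio/mpeg"],
  ".wav": ["audio/wav", "audio/x-wav"],
  ".m4a": ["audio/mp4", "audio/x-m4a"],
  ".mp4": ["video/mp4"],
  ".webm": ["audio/webm", "video/webm"],
};

interface UploadMeta {
  name: string;
  type: string; // MIME type reported by the client
  size: number; // bytes
}

type UploadCheck =
  | { ok: true }
  | { ok: false; code: "EMPTY_FILE" | "FILE_TOO_LARGE" | "UNSUPPORTED_TYPE" };

function validateUpload(file: UploadMeta, maxBytes: number): UploadCheck {
  if (file.size <= 0) return { ok: false, code: "EMPTY_FILE" };
  if (file.size > maxBytes) return { ok: false, code: "FILE_TOO_LARGE" };

  // Extension and MIME type must agree; parameters such as ";codecs=opus" are ignored.
  const dot = file.name.lastIndexOf(".");
  const ext = dot >= 0 ? file.name.slice(dot).toLowerCase() : "";
  const mime = (file.type.split(";")[0] ?? "").trim().toLowerCase();
  const accepted = ALLOWED_TYPES[ext];
  if (!accepted || !accepted.includes(mime)) {
    return { ok: false, code: "UNSUPPORTED_TYPE" };
  }
  return { ok: true };
}
```

The client-reported name and MIME type are only hints, so a check like this filters out obvious mistakes but does not prove what the file contains.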
- Build with npm ci && npm run build.
- Set OPENAI_API_KEY in the host secret manager.
- Run npm run verify:openai && npm start; npm start validates OPENAI_API_KEY before serving.
- Ensure the host supports outbound HTTPS to OpenAI.
- Add OPENAI_API_KEY as a server-side environment variable.
- Keep the key out of NEXT_PUBLIC_* variables.
- Deploy normally with the Next.js adapter.
- Confirm /api/health returns openaiConfigured: true.
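For context on the health check, a payload like the following reports whether a key is configured without revealing anything about it. Only the `openaiConfigured` field comes from this README; the `status` field and the function name are assumptions for this sketch.

```typescript
interface HealthPayload {
  status: "ok"; // hypothetical field, not confirmed by the project
  openaiConfigured: boolean;
}

// Reports only a boolean; the key value itself is never included in the response.
function buildHealthPayload(env: Record<string, string | undefined>): HealthPayload {
  const key = env["OPENAI_API_KEY"];
  return {
    status: "ok",
    openaiConfigured: typeof key === "string" && key.trim().length > 0,
  };
}
```

In a Next.js route handler, a function like this could be wrapped as `Response.json(buildHealthPayload(process.env))`.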
Realtime WebRTC requires browser microphone permission and network access to https://api.openai.com.
- Missing key: API routes return a structured MISSING_API_KEY error and production startup validation fails safely.
- Key present but transcription fails: run npm run verify:openai. A key can be present while the project still lacks active billing, model allowlist access, or permission to use the configured STT models.
- Upload route smoke test: with the dev or production server running, run npm run smoke:upload. It sends a tiny generated WAV through /api/transcribe and reports sanitized status/error metadata.
- Unsupported file: check file type, extension, and size.
- Realtime fails before connecting: check that the OpenAI project has active billing and access to gpt-realtime-whisper. Local recording fallback is used only for browser/network microphone failures, not server-side OpenAI access failures.
- OpenAI HTTP 500 during realtime setup: confirm the project has active billing and access to the configured realtime transcription model. A configured key is not enough if the project is not active for paid API usage.
- Long files: shorten or compress the file, then retry.
- Upstream failures: retry later or switch file mode to gpt-4o-transcribe for accuracy-sensitive jobs.
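The structured errors referred to in this list could be modeled roughly as follows. Only the `MISSING_API_KEY` code appears in this README; the other codes, the HTTP statuses, the messages, and the `retryable` flag are hypothetical choices made for the sketch.

```typescript
// MISSING_API_KEY is documented above; the other two codes are illustrative.
type ApiErrorCode = "MISSING_API_KEY" | "UNSUPPORTED_FILE" | "UPSTREAM_FAILURE";

interface ApiErrorBody {
  error: { code: ApiErrorCode; message: string; retryable: boolean };
}

// Fixed, sanitized messages: no keys, transcripts, or upstream response bodies.
const ERROR_TABLE: Record<ApiErrorCode, { status: number; message: string; retryable: boolean }> = {
  MISSING_API_KEY: {
    status: 500,
    message: "Server is not configured for transcription.",
    retryable: false,
  },
  UNSUPPORTED_FILE: {
    status: 415,
    message: "Unsupported file type, extension, or size.",
    retryable: false,
  },
  UPSTREAM_FAILURE: {
    status: 502,
    message: "Transcription service failed. Try again later.",
    retryable: true,
  },
};

function toErrorResponse(code: ApiErrorCode): { status: number; body: ApiErrorBody } {
  const { status, message, retryable } = ERROR_TABLE[code];
  return { status, body: { error: { code, message, retryable } } };
}
```

A stable `code` lets the UI map each failure to one of the troubleshooting hints above instead of parsing free-text messages.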