Voice import: spoken audio → recipe (Whisper → text extraction) #85

Merged: windoze95 merged 1 commit into main from feat/voice-import on Jun 30, 2026

@windoze95

Owner

What

Phase 2 of unified multi-source import: POST /v1/recipes/import/voice. Upload spoken audio (a recipe read or described aloud); it is transcribed via the existing Whisper provider, run through the same text-extraction path as text import, then saved. Voice transcription was previously wired only to cook mode; this PR reuses it for recipe capture.
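As an illustration of what a client upload might look like, here is a minimal sketch in Go. The `audio` and `format` field names and the route come from this PR; the base URL, the filename, the helper name `newVoiceImportRequest`, and the absence of any auth header are assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newVoiceImportRequest (hypothetical helper) builds a multipart POST for the
// voice-import endpoint. Only the route and the "audio"/"format" field names
// are taken from the PR; everything else is assumed.
func newVoiceImportRequest(baseURL, filename string, audio []byte, format string) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	// The audio file goes in the "audio" part; the filename carries the
	// extension the server can fall back on when no format is given.
	part, err := w.CreateFormFile("audio", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}

	// "format" is optional, so it is only sent when the caller sets it.
	if format != "" {
		if err := w.WriteField("format", format); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/recipes/import/voice", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := newVoiceImportRequest("http://localhost:8080", "pancakes.m4a", []byte("fake audio bytes"), "")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```

A real client would pass the request to `http.DefaultClient.Do` and attach whatever authentication the API requires.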

How

  • service: ImportFromVoice(audio, format) = TranscribeAudio → ExtractRecipeFromText → createImportedRecipe (mirrors ImportFromText). New settable SpeechProvider field on ImportService, set in the router from the Whisper provider already built for cook mode.
  • handler + router: multipart audio field (+ optional format), 25 MB cap (Whisper's file limit), ExtractionError codes → 422. The format defaults to the filename extension; the provider's audioFilePath already normalizes and validates it.
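The service pipeline described above can be sketched roughly as follows. This is not the PR's code: the `SpeechProvider` method signature, the `Recipe` type, the `extract`/`save` function fields (standing in for ExtractRecipeFromText and createImportedRecipe), the error values, and the fake provider are all assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SpeechProvider is named in the PR; this method signature is assumed.
type SpeechProvider interface {
	TranscribeAudio(audio []byte, format string) (string, error)
}

// Recipe is a hypothetical stand-in for the project's recipe model.
type Recipe struct{ Title string }

// ImportService carries a settable SpeechProvider field, as in the PR.
// extract and save are hypothetical stand-ins for ExtractRecipeFromText
// and createImportedRecipe.
type ImportService struct {
	SpeechProvider SpeechProvider
	extract        func(text string) (*Recipe, error)
	save           func(r *Recipe) error
}

var (
	ErrEmptyTranscript = errors.New("empty transcript")
	errProviderDown    = errors.New("provider unavailable") // demo-only error
)

// ImportFromVoice runs transcribe -> extract -> save, stopping at the first failure.
func (s *ImportService) ImportFromVoice(audio []byte, format string) (*Recipe, error) {
	if s.SpeechProvider == nil {
		return nil, errors.New("speech provider not configured")
	}
	text, err := s.SpeechProvider.TranscribeAudio(audio, format)
	if err != nil {
		return nil, fmt.Errorf("transcribe audio: %w", err)
	}
	if strings.TrimSpace(text) == "" {
		return nil, ErrEmptyTranscript
	}
	recipe, err := s.extract(text)
	if err != nil {
		return nil, fmt.Errorf("extract recipe: %w", err)
	}
	if err := s.save(recipe); err != nil {
		return nil, fmt.Errorf("save recipe: %w", err)
	}
	return recipe, nil
}

// fakeSpeech is a test double that returns a canned transcript or error.
type fakeSpeech struct {
	text string
	err  error
}

func (f fakeSpeech) TranscribeAudio([]byte, string) (string, error) { return f.text, f.err }

// newDemoService wires a toy extractor (title = text before the first colon)
// and a no-op saver around the given provider.
func newDemoService(sp SpeechProvider) *ImportService {
	return &ImportService{
		SpeechProvider: sp,
		extract: func(text string) (*Recipe, error) {
			title, _, _ := strings.Cut(text, ":")
			return &Recipe{Title: strings.TrimSpace(title)}, nil
		},
		save: func(*Recipe) error { return nil },
	}
}

func main() {
	svc := newDemoService(fakeSpeech{text: "Pancakes: mix flour, milk, and eggs."})
	recipe, err := svc.ImportFromVoice([]byte("fake audio"), "m4a")
	if err != nil {
		panic(err)
	}
	fmt.Println(recipe.Title)
}
```

Keeping the provider behind a small interface is what lets the router inject the Whisper provider already built for cook mode, and lets tests swap in a fake.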

Tests

ImportFromVoice service (success, empty transcript, transcription error) + multipart handler (success, missing audio). go test ./... -count=1 → green (11 packages, 0 failures).
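For illustration, the handler cases named above (success, missing audio) could be exercised along these lines with `net/http/httptest`. This sketch assumes a plain `net/http` handler rather than whatever router the project uses; `extractionError`, `importFunc`, the helpers, and the 201/400/413 status codes are hypothetical. Only the `audio`/`format` fields, the 25 MB cap, the extension fallback, and the 422 mapping come from the PR.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
	"path/filepath"
	"strings"
)

const maxAudioBytes = 25 << 20 // 25 MB cap from the PR

// extractionError is a hypothetical stand-in for the PR's ExtractionError.
type extractionError struct{ Code string }

func (e *extractionError) Error() string { return e.Code }

// importFunc abstracts the service call; the shape is assumed.
type importFunc func(audio []byte, format string) (title string, err error)

func voiceImportHandler(importVoice importFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Guard the whole body, leaving some room for multipart overhead.
		r.Body = http.MaxBytesReader(w, r.Body, maxAudioBytes+(1<<20))
		file, header, err := r.FormFile("audio")
		if err != nil {
			var tooBig *http.MaxBytesError
			if errors.As(err, &tooBig) {
				http.Error(w, "upload too large", http.StatusRequestEntityTooLarge)
				return
			}
			http.Error(w, "missing audio field", http.StatusBadRequest)
			return
		}
		defer file.Close()
		if header.Size > maxAudioBytes {
			http.Error(w, "audio exceeds 25 MB", http.StatusRequestEntityTooLarge)
			return
		}
		audio, err := io.ReadAll(file)
		if err != nil {
			http.Error(w, "read failed", http.StatusInternalServerError)
			return
		}
		// Optional format; fall back to the filename extension.
		format := r.FormValue("format")
		if format == "" {
			format = strings.TrimPrefix(strings.ToLower(filepath.Ext(header.Filename)), ".")
		}
		title, err := importVoice(audio, format)
		var xe *extractionError
		if errors.As(err, &xe) {
			http.Error(w, xe.Code, http.StatusUnprocessableEntity) // 422
			return
		}
		if err != nil {
			http.Error(w, "import failed", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusCreated)
		fmt.Fprint(w, title)
	}
}

// newUpload builds a multipart request with one file part under the given field name.
func newUpload(field, filename string, data []byte) *http.Request {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	part, err := mw.CreateFormFile(field, filename)
	if err != nil {
		panic(err)
	}
	part.Write(data)
	mw.Close()
	req := httptest.NewRequest(http.MethodPost, "/v1/recipes/import/voice", &body)
	req.Header.Set("Content-Type", mw.FormDataContentType())
	return req
}

// serve runs the handler against a recorder and returns status and body.
func serve(h http.HandlerFunc, req *http.Request) (int, string) {
	rec := httptest.NewRecorder()
	h(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	h := voiceImportHandler(func(_ []byte, format string) (string, error) {
		return "Pancakes (" + format + ")", nil
	})
	fmt.Println(serve(h, newUpload("audio", "clip.m4a", []byte("fake"))))
	fmt.Println(serve(h, newUpload("file", "clip.m4a", []byte("fake"))))
}
```

In a real test file the two calls in `main` would become table-driven cases in a `TestXxx` function.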

Follow-ups (not in this PR)

  • Phase 3: Flutter unified "Add recipes" picker + multi-preview screen, surfacing files + voice alongside the existing options (and an optional no-save preview endpoint).
  • Still open: the social/shareable recipe-lineage sharing direction.

🤖 Generated with Claude Code

https://claude.ai/code/session_01BU4UWZutHd1AnK3XAf7H19

New POST /v1/recipes/import/voice accepts an audio file, transcribes it
via the existing Whisper provider, and extracts a recipe from the
transcript -- reusing the text-extraction path. Phase 2 of unified
multi-source import.

- service: ImportFromVoice (transcribe -> ExtractRecipeFromText -> save);
  new settable SpeechProvider field on ImportService.
- handler/router: multipart endpoint (audio field, optional format),
  25MB Whisper limit; wired to the Whisper provider already constructed
  for cook mode.

Tests: ImportFromVoice (success/empty-transcript/transcription-error) +
multipart handler (success/missing-audio). Full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BU4UWZutHd1AnK3XAf7H19
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@windoze95 merged commit f5363e6 into main on Jun 30, 2026
1 check passed
@windoze95 deleted the feat/voice-import branch on June 30, 2026 at 01:56