Test Gemini with Voice Input + Analysis + Structured Output #11

@amjadAwad95

Goal: Evaluate Gemini’s voice/audio capabilities by sending an audio sample, analyzing the model output, and returning results in a structured format that matches our requirements.


Requirements

  • Provide an audio/voice input to Gemini (recorded file or microphone stream).
  • Convert audio to text (if needed) and/or use Gemini’s native audio input (depending on the API mode).
  • Ask Gemini to analyze the voice input and return:
    • transcription
    • key information extraction (what we need)
    • summary
    • optional metrics (confidence, timestamps, language, etc.)
  • Output must be structured (JSON) so it can be consumed by the backend.
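The requirements above could be exercised with a request along the following lines. This is a minimal sketch, not a confirmed implementation: it only builds the JSON body for a `generateContent`-style REST call and does not send it. The payload keys (`inline_data`, `generationConfig`, `responseMimeType`, `responseSchema`) reflect my understanding of the Gemini REST format and should be checked against the current API documentation. The schema field names (`transcription`, `key_information`, `summary`, `language`) and the prompt wording are assumptions, since the issue does not define them.

```python
import base64
import json

# Hypothetical output schema: the issue lists the kinds of fields wanted
# but does not name them, so these names are placeholders.
RESPONSE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "transcription": {"type": "STRING"},
        "key_information": {"type": "ARRAY", "items": {"type": "STRING"}},
        "summary": {"type": "STRING"},
        "language": {"type": "STRING"},
    },
    "required": ["transcription", "key_information", "summary"],
}

# Hypothetical prompt template.
PROMPT = (
    "Transcribe the attached audio, extract the key information, "
    "and write a short summary. Respond with JSON only."
)


def build_request(audio_bytes: bytes, mime_type: str, prompt: str = PROMPT) -> dict:
    """Build a request body that pairs a text prompt with inline audio."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    # Audio is embedded as base64; large files may need a
                    # separate upload step instead (not shown here).
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ],
        # Ask for JSON output constrained by the schema above.
        "generationConfig": {
            "responseMimeType": "application/json",
            "responseSchema": RESPONSE_SCHEMA,
        },
    }


if __name__ == "__main__":
    body = build_request(b"placeholder audio bytes", "audio/wav")
    print(json.dumps(body, indent=2)[:300])
```

Building the body separately from sending it keeps the prompt and schema easy to inspect and version alongside the test audio files.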

Inputs

  • Audio file (wav/mp3/m4a) OR live microphone recording
  • Test prompt template (what we ask Gemini to do)
  • Expected output schema (JSON fields)
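For the expected output schema, one possible check on the backend side is sketched below. It parses the model's reply and verifies that the required fields are present with the right types. The field names (`transcription`, `key_information`, `summary`, plus optional `confidence`, `language`, `timestamps`) are hypothetical placeholders derived from the list in the requirements, and `parse_analysis` is an illustrative helper, not part of any library.

```python
import json

# Hypothetical field names and types; adjust once the real schema is agreed.
REQUIRED_FIELDS = {
    "transcription": str,
    "key_information": list,
    "summary": str,
}
OPTIONAL_FIELDS = {
    "confidence": (int, float),
    "language": str,
    "timestamps": list,
}


def parse_analysis(raw: str) -> dict:
    """Parse a JSON reply and check it against the expected fields.

    Raises ValueError if the reply is not valid JSON, is not an object,
    lacks a required field, or has a field of the wrong type.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing required field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name} has the wrong type")

    for name, expected in OPTIONAL_FIELDS.items():
        if name in data and not isinstance(data[name], expected):
            raise ValueError(f"optional field {name} has the wrong type")

    return data


if __name__ == "__main__":
    sample = '{"transcription": "hello", "key_information": [], "summary": "greeting"}'
    print(parse_analysis(sample))
```

Rejecting malformed replies at this boundary means the backend only ever sees objects that match the agreed fields.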


Status: Ready
