Goal: Evaluate Gemini’s voice/audio capabilities by sending an audio sample, analyzing the model output, and returning results in a structured format that matches our requirements.
Requirements
- Provide an audio/voice input to Gemini (recorded file or microphone stream).
- Depending on the API mode, transcribe the audio to text first (if needed), pass it to Gemini’s native audio input, or both.
- Ask Gemini to analyze the voice input and return:
  - transcription
  - key information extraction (what we need)
  - summary
  - optional metrics (confidence, timestamps, language, etc.)
- Output must be structured (JSON) so it can be consumed by the backend.
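To make the structured-output requirement concrete, the following is a minimal sketch of how the backend might validate the JSON that comes back from the model. It makes no Gemini API calls. The field names (`transcription`, `key_information`, `summary`, `metrics`), the `VoiceAnalysisResult` class, and the `parse_model_output` helper are all hypothetical; they mirror the list above but are not a confirmed schema.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class VoiceAnalysisResult:
    """Hypothetical container for one analyzed audio sample."""
    transcription: str
    key_information: dict[str, Any]
    summary: str
    # Optional metrics such as confidence, timestamps, or language.
    metrics: dict[str, Any] = field(default_factory=dict)


# Assumed required fields and their expected JSON types.
REQUIRED_FIELDS = {
    "transcription": str,
    "key_information": dict,
    "summary": str,
}


def parse_model_output(raw: str) -> VoiceAnalysisResult:
    """Parse and validate the model's JSON reply; raise ValueError if malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")

    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing required field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name!r} must be of type {expected.__name__}")

    # Metrics are optional; default to an empty object when absent.
    metrics = data.get("metrics", {})
    if not isinstance(metrics, dict):
        raise ValueError("field 'metrics' must be a JSON object")

    return VoiceAnalysisResult(
        transcription=data["transcription"],
        key_information=data["key_information"],
        summary=data["summary"],
        metrics=metrics,
    )


if __name__ == "__main__":
    # Illustrative reply only; not real model output.
    reply = json.dumps({
        "transcription": "Please call me back tomorrow.",
        "key_information": {"intent": "callback"},
        "summary": "Caller requests a callback.",
        "metrics": {"language": "en"},
    })
    print(parse_model_output(reply))
```

Validating at this boundary means a malformed or incomplete reply fails loudly before it reaches the backend, rather than surfacing later as a missing key.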
Inputs