surefire01/voice-assistant-api

🗣️ Voice Assistant Starter Template

Python License

A simplified, general-purpose voice assistant backend you can use as a starting point to build your own custom voice-driven applications.

This project lets you ask questions by voice and receive AI-generated audio replies, demonstrating streaming audio processing, speech-to-text, LLM-powered responses, and text-to-speech output.

🎥 Demo

demo.mp4

✨ Features

  • Accepts voice input and generates AI audio replies
  • Streams audio for low latency
  • Modular FastAPI backend
  • Easy to extend for your own use case

🎯 Use Cases

  • Personal voice assistants
  • Customer support bots
  • Interactive voice apps
  • Smart home interfaces

🛠️ Quick Setup

⚠️ Requires Python 3.11; other versions are not supported.

1️⃣ Clone the Repository

git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name

2️⃣ Verify Python Version

Make sure you're using Python 3.11:

python --version
# Should output: Python 3.11.x

If not, install it from python.org.
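
You can also check from inside Python itself. A minimal sketch (the function name here is illustrative, not part of the project):

```python
import sys

def is_supported(info=sys.version_info) -> bool:
    """True when the interpreter is a 3.11.x release, which this project requires."""
    return (info[0], info[1]) == (3, 11)

print("Python", ".".join(map(str, sys.version_info[:3])),
      "OK" if is_supported() else "-> install Python 3.11")
```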


3️⃣ Create a Virtual Environment

python -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

4️⃣ Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

5️⃣ Install ffmpeg

This project needs ffmpeg for audio processing:

  • macOS

    brew install ffmpeg
  • Ubuntu/Linux

    sudo apt update
    sudo apt install ffmpeg
  • Windows

    Download from FFmpeg.org and add it to your PATH.


6️⃣ Create Environment File

Get your Mistral API key by signing up and following their API Quickstart Guide.

Create a .env file:

API_KEY=your_mistral_api_key_here
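
The backend can then read this key at startup. Projects like this commonly use the python-dotenv package for that (check requirements.txt for what this repo actually uses); a stdlib-only stand-in looks roughly like:

```python
from pathlib import Path

def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file into a dict,
    skipping blanks and comments. (A minimal stand-in for python-dotenv.)"""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

# The backend would then pick up the key with:
# api_key = load_env_file().get("API_KEY")
```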


7️⃣ Install eSpeak

This project uses eSpeak to enable additional Coqui TTS voice models.

💻 Windows

  1. Download and install eSpeak for Windows: 👉 https://espeak.sourceforge.net/
  2. After installation, add the espeak/command-line folder to your PATH environment variable so the espeak command is available in the terminal.

🐧 Linux (Ubuntu/Debian)

sudo apt update
sudo apt install espeak

🍎 macOS

brew install espeak

▶️ Running the Project

uvicorn src.main:app --reload --host 0.0.0.0 --port 8000

Access it in your browser or API client at:

http://localhost:8000

📣 Example Interaction

  1. Visit the /ui endpoint in your browser:
http://localhost:8000/ui
  2. Use the web interface to record your question.
  3. Hear an AI-generated audio reply instantly.

🏗️ Architecture

  • Gradio web UI streams audio to the FastAPI backend.
  • Voice Activity Detection (VAD) finds when speech starts and stops.
  • Detected speech is transcribed to text.
  • Text is sent to the LLM (Mistral) to generate a response.
  • The response is converted to speech with TTS.
  • Audio reply streams back to the user in real time.
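
The flow above can be sketched end to end with stub stages (function names here are illustrative, not the project's actual code; each stub stands in for a real component):

```python
def detect_speech(audio: bytes) -> bytes:
    """VAD stub: the real app trims silence and returns only speech frames."""
    return audio

def transcribe(speech: bytes) -> str:
    """STT stub: speech-to-text."""
    return "what is the weather"

def generate_reply(text: str) -> str:
    """LLM stub: the real app sends the transcript to Mistral."""
    return f"You asked: {text}"

def synthesize(text: str) -> bytes:
    """TTS stub: text-to-speech."""
    return text.encode("utf-8")

def pipeline(audio: bytes) -> bytes:
    """Mirror of the architecture: VAD -> STT -> LLM -> TTS."""
    speech = detect_speech(audio)
    transcript = transcribe(speech)
    reply = generate_reply(transcript)
    return synthesize(reply)
```

In the actual backend each stage is streaming, so audio chunks flow through continuously rather than as one buffer.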

🧩 Next Steps

  • Swap in your preferred LLM
  • Customize prompts or dialogue logic
  • Add authentication and logging
  • Containerize with Docker
  • Deploy on AWS Lambda, ECS, etc.

🙏 Credits

Built to help anyone bootstrap their own voice assistant with clean, minimal code.


📜 License

This project is licensed under the MIT License. See the LICENSE file for details.


📫 Questions?

Open an issue or reach out on LinkedIn.
