An advanced, real-time voice assistant integrated into Discord. This bot combines Speech-to-Text (STT), Large Language Models (LLM), and Text-to-Speech (TTS) to engage in natural, meaningful conversations with users in voice channels.
It features tool usage (Web Search, Time), silence detection, and dual-language support (English/Polish).
- ⚡ Real-time Transcription (STT):
  - Supports local inference using `Faster-Whisper` (CUDA recommended).
  - Supports cloud inference using Groq API (Whisper V3) for ultra-low latency.
- 🧠 Intelligent Responses (LLM):
  - Powered by Llama 3 70B via Groq API.
  - Context-aware conversations with memory management.
- 🗣️ Natural Voice (TTS):
  - High-quality voice synthesis using `edge-tts` (Microsoft Azure Neural voices).
- 🛠️ Autonomous Tools:
  - Web Search: Fetches real-time data (news, weather, facts) using Tavily API.
  - Time Check: Provides accurate local time and date.
- 🎙️ Smart Audio Handling:
  - VAD (Voice Activity Detection): Automatically detects silence to process speech.
  - Wake Words: Configurable trigger words (e.g., "Jarvis", "Garmin") to activate the bot (optional).
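The silence-detection idea behind the VAD feature can be illustrated with a simple energy threshold. This is a minimal sketch, not necessarily how the bot implements it: it assumes 16-bit signed mono PCM in native byte order, and the names `frame_rms`, `is_silent`, `utterance_finished`, as well as both threshold constants, are hypothetical.

```python
import array
import math

SILENCE_RMS = 500            # assumed threshold for 16-bit PCM; tune per microphone
SILENT_FRAMES_TO_STOP = 25   # assumed: about 0.5 s of silence with 20 ms frames


def frame_rms(pcm: bytes) -> float:
    """Root-mean-square amplitude of one frame of 16-bit signed PCM."""
    samples = array.array("h")
    # frombytes() needs a whole number of 2-byte samples, so drop a stray byte.
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(pcm: bytes, threshold: float = SILENCE_RMS) -> bool:
    """A frame counts as silent when its RMS is below the threshold."""
    return frame_rms(pcm) < threshold


def utterance_finished(frames, silent_frames_to_stop: int = SILENT_FRAMES_TO_STOP) -> bool:
    """True if speech was heard and the frames end with a long enough silent run."""
    heard_speech = False
    silent_run = 0
    for frame in frames:
        if is_silent(frame):
            silent_run += 1
        else:
            heard_speech = True
            silent_run = 0
    return heard_speech and silent_run >= silent_frames_to_stop
```

In a sketch like this, the buffered frames would be handed to the STT step once `utterance_finished` returns `True`. A fixed threshold is the simplest choice and may need adjusting for noisy channels.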
- Python 3.10+
- FFmpeg: Essential for audio processing. Must be installed and added to your system's PATH.
  - Windows: `winget install ffmpeg`
  - Linux: `sudo apt install ffmpeg`
- API Keys: You will need keys for Discord, Groq, and Tavily.
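To confirm that FFmpeg is reachable on your PATH before starting the bot, a quick check along these lines may help. It is only an illustrative sketch using the standard library; the helper name `find_executable` is hypothetical and not part of the project.

```python
import shutil
from typing import Optional


def find_executable(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path:
        print(f"FFmpeg found at {path}")
    else:
        print("FFmpeg not found on PATH; install it and reopen your terminal.")
```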
- Clone the repository:
  - `git clone https://github.com/Inexpli/Discord-Jarvis`
  - `cd Discord-Jarvis`
- Install dependencies: It is recommended to use a virtual environment.
  - `pip install -r requirements.txt`
- Configuration (.env): Create a `.env` file in the root directory and add your credentials:
  - `BOT_TOKEN=your_discord_bot_token`
  - `GUILD_ID=your_server_id`
  - `GROQ_API_KEY=your_groq_api_key`
  - `TAVILY_API_KEY=your_tavily_api_key`
  - `HF_TOKEN=optional_huggingface_token`
- Customize Settings: Open `config.py` to adjust:
  - `LANGUAGE`: Set to `"en"` for English or `"pl"` for Polish.
  - `RUN_LOCALLY`: Set to `True` to use local GPU resources, or `False` to use Groq API.
  - `TRIGGERS`: Add or remove wake words.
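A missing credential tends to surface only later as an opaque API error, so it can be useful to verify the environment up front. The sketch below assumes the four non-optional variable names from the `.env` step and treats `HF_TOKEN` as optional; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and whether the project's `config.py` performs a check like this is not stated here.

```python
import os
from typing import List, Mapping

# Variable names taken from the .env step; HF_TOKEN is optional and not checked.
REQUIRED_KEYS = ("BOT_TOKEN", "GUILD_ID", "GROQ_API_KEY", "TAVILY_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    # With python-dotenv installed, calling load_dotenv() first would copy the
    # .env entries into os.environ; that step is left out to keep this stdlib-only.
    missing = missing_keys()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```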
- Start the bot: `python main.py`
- Discord Commands:
  - `/join`: The bot joins your current voice channel and starts listening.
  - `/stop`: The bot leaves the channel.
- Interaction:
  - If `REQUIRE_TRIGGER = True`, start your sentence with "Jarvis" (or other configured triggers).
  - If `REQUIRE_TRIGGER = False`, the bot will respond to all speech detected.
  - The bot will listen, process your request, and reply via voice.
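The trigger behaviour described above could be sketched roughly as follows. This is an assumption about how such a check might work, not the project's actual code: `extract_request` is a hypothetical helper, and the `TRIGGERS` values simply reuse the example wake words.

```python
import re
from typing import Optional, Sequence

TRIGGERS = ("jarvis", "garmin")   # example wake words; the real list lives in config.py
REQUIRE_TRIGGER = True


def extract_request(
    transcript: str,
    triggers: Sequence[str] = TRIGGERS,
    require_trigger: bool = REQUIRE_TRIGGER,
) -> Optional[str]:
    """Return the text the bot should answer, or None if it should stay quiet."""
    text = transcript.strip()
    if not require_trigger:
        return text or None
    for trigger in triggers:
        # Match the wake word at the start of the sentence, ignoring case,
        # then skip any punctuation or spaces that follow it.
        match = re.match(rf"{re.escape(trigger)}\b[\s,.!?:;-]*", text, flags=re.IGNORECASE)
        if match:
            return text[match.end():].strip() or None
    return None
```

For example, `extract_request("Jarvis, what time is it?")` yields `"what time is it?"`, while the same sentence without the wake word yields `None` unless `require_trigger` is `False`.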
- `main.py`: Core logic, Discord event handling, audio processing pipeline, and LLM integration.
- `config.py`: Configuration parameters, prompt templates, and environment variable loading.
- `.env`: Storage for sensitive API keys (excluded from version control).
Ensure your requirements.txt includes the following libraries:
py-cord
faster-whisper
python-dotenv
groq
tavily-python
edge-tts
numpy
This project is licensed under the MIT License.
Single Instance: The bot currently utilizes global variables for conversation state. It is designed to work on one server/channel at a time. Running it on multiple servers simultaneously may cause conversation history overlap.
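One way the history overlap might be avoided is to key conversation state by server ID instead of keeping it in a single global. The sketch below is purely illustrative and does not reflect the current code: `remember`, `history_for`, and `MAX_TURNS` are hypothetical names, and the cap of 20 messages is an arbitrary assumption.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List

MAX_TURNS = 20  # assumed cap on stored messages per server

# One bounded history per guild ID, created on first use.
_histories: Dict[int, Deque[dict]] = defaultdict(lambda: deque(maxlen=MAX_TURNS))


def remember(guild_id: int, role: str, content: str) -> None:
    """Append a message to that guild's history; the oldest entry drops when full."""
    _histories[guild_id].append({"role": role, "content": content})


def history_for(guild_id: int) -> List[dict]:
    """Return a copy of the guild's history, oldest message first."""
    return list(_histories[guild_id])
```

Because each guild gets its own `deque`, messages from one server never appear in another server's context, and the `maxlen` bound keeps memory use predictable.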
Local Performance: If `RUN_LOCALLY = True`, an NVIDIA GPU with CUDA support is needed for Faster-Whisper to transcribe smoothly in real time.