A WhatsApp bot that runs entirely on your machine using local LLMs. It reads your message history, finds relevant context from past conversations, and generates natural responses that actually sound like you.
This bot connects to WhatsApp Web and automatically replies to messages using a language model running on your computer. The interesting part is the RAG (Retrieval-Augmented Generation) system - it doesn't just look at the last few messages. It searches through up to 500 previous messages to find relevant context, then combines that with your recent conversation to generate contextually aware responses.
So if someone asks "what was that restaurant you mentioned last month?" - the bot can actually find and reference that conversation.
- Runs on LM Studio with any model you want
- Default setup uses qwen2.5-7b-instruct
- Everything happens on your machine - no data leaves your computer
- No OpenAI bills, no API rate limits
- Fetches up to 500 messages of conversation history
- Uses keyword-based scoring to find relevant past messages
- Combines immediate context (last 20 messages) with retrieved relevant history
- The LLM sees both recent conversation flow and relevant info from weeks/months ago
- Uses the whatsapp-web.js library
- QR code authentication - scan once, then it reconnects automatically
- Session stored locally so you don't re-authenticate every time
- Works with individual chats (groups are ignored by default)
- Shows typing indicator before replying
- Debounces messages (waits 5 seconds to handle double texts)
- Detects and prevents duplicate responses
- Strips out hallucinations and random characters
- Adds configurable reply delays to feel more natural
- Prevents the bot from claiming ownership of links you didn't send
- Adds a ~ prefix to all responses so people know it's automated
1. Authentication: When you first run it, you scan a QR code with your phone. That's it. The session gets saved locally.
2. Message arrives: The bot detects an incoming message in a monitored chat.
3. RAG retrieval:
   - Grabs up to 500 messages of chat history
   - Splits them into "immediate context" (the last 20) and "deep history" (the older 480)
   - Searches the deep history for messages relevant to the current conversation
   - Scores candidates by keyword matching with word boundaries
4. Context assembly: Builds a prompt like this:

   ```
   [System prompt]
   --- RELEVANT PAST CONTEXT ---
   [Top 10 relevant messages from deep history]
   --- END RELEVANT PAST CONTEXT ---
   --- CURRENT CONVERSATION ---
   [Last 20 messages]
   ```

5. LLM generates response: Sends the assembled prompt to your local LM Studio instance.
6. Post-processing: Cleans the response, checks for duplicates, adds the prefix, and sends the reply to WhatsApp.
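The context-assembly step (step 4) can be sketched as a small pure function. This is illustrative only: the function name, message fields, and section markers mirror the prompt layout above, but the actual code in index.js may differ.

```javascript
// Hypothetical sketch of step 4: assemble the prompt from the system
// prompt, the top retrieved messages, and the recent conversation.
function buildPrompt(systemPrompt, relevantHistory, recentMessages) {
  const lines = [systemPrompt];
  if (relevantHistory.length > 0) {
    lines.push('--- RELEVANT PAST CONTEXT ---');
    for (const msg of relevantHistory) {
      lines.push(`${msg.author}: ${msg.body}`);
    }
    lines.push('--- END RELEVANT PAST CONTEXT ---');
  }
  lines.push('--- CURRENT CONVERSATION ---');
  for (const msg of recentMessages) {
    lines.push(`${msg.author}: ${msg.body}`);
  }
  return lines.join('\n');
}
```

Keeping the retrieved block clearly delimited helps the model distinguish old context from the live conversation.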
- Node.js (v16 or higher)
- LM Studio - Download from lmstudio.ai
- Load a model (I'm using qwen2.5-7b-instruct)
- Start the local server (usually runs on http://localhost:1234)
```bash
# Clone the repo
git clone https://github.com/Anmoldureha/whatsapp-bot.git
cd whatsapp-bot

# Install dependencies
npm install

# Create environment file
cp .env.example .env
```

Edit .env to customize:
```
# LM Studio settings
LM_STUDIO_URL=http://localhost:1234/v1
LM_STUDIO_MODEL=qwen2.5-7b-instruct

# Bot behavior
SYSTEM_PROMPT=You are a helpful assistant responding to WhatsApp messages.
AUTO_REPLY_ENABLED=true
CONTEXT_MESSAGE_LIMIT=20
REPLY_DELAY_MS=2000

# Chat filters (optional)
MONITORED_CHATS=   # Leave empty to monitor all chats
IGNORED_CHATS=     # Comma-separated chat IDs to ignore
```

Chat ID format: Use the format shown in logs when messages arrive (usually like 1234567890@c.us).
```bash
npm start
```

First run:
- QR code appears in terminal
- Open WhatsApp on your phone
- Go to Settings > Linked Devices > Link a Device
- Scan the QR code
- Bot connects and starts monitoring
Subsequent runs just reconnect using the saved session.
The keyword-based retrieval is simple but works well:
- Takes the last message from the user
- Tokenizes it (splits into words, filters out words under 5 characters)
- For each message in deep history:
- Check if any query keywords appear (using word boundaries)
- Add +1 to score for each match
- Subtract 0.5 if message is very short (likely "ok" or "lol")
- Sort by score, take top 10 messages
- Add these to the prompt as "relevant past context"
This means if someone says "remember that Python bug from last week?", the bot searches for messages containing "python" and "remember" and feeds those to the LLM.
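The scoring loop above can be expressed compactly. This is an illustrative version of the algorithm the list describes, not the exact code from index.js; `scoreHistory` and its parameters are hypothetical names.

```javascript
// Hypothetical sketch of the keyword-based retrieval step.
function scoreHistory(query, deepHistory, topK = 10) {
  // Tokenize the query: lowercase words of 5+ characters
  const keywords = query.toLowerCase().match(/\b\w{5,}\b/g) || [];
  const scored = deepHistory.map((msg) => {
    const body = msg.body.toLowerCase();
    let score = 0;
    for (const kw of keywords) {
      // Word-boundary match so "python" doesn't hit "pythonic" substrings mid-word
      if (new RegExp(`\\b${kw}\\b`).test(body)) score += 1;
    }
    if (msg.body.length < 10) score -= 0.5; // penalize "ok", "lol", etc.
    return { msg, score };
  });
  return scored
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.msg);
}
```

Because the tokens are plain word characters, interpolating them into a RegExp is safe here; a fancier tokenizer would need escaping.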
The bot has several guardrails to prevent awkward responses:
Duplicate detection: Checks last 5 bot messages for exact matches. Won't send the same thing twice.
Thanks loop prevention: If the bot already said "thanks" recently, it won't say it again (prevents infinite politeness loops).
Link ownership check: If the other person sent a link, the bot won't say "check it out" or "what's your take?" (which would imply the bot sent the link).
Hallucination cleaning: Strips out Chinese characters and other common model artifacts.
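Two of these guardrails (hallucination cleaning and duplicate detection) are easy to show together. A minimal sketch, assuming a `cleanResponse` helper; the regex ranges and the 5-message window are taken from the description above, but the real implementation may differ.

```javascript
// Hypothetical sketch of post-processing guardrails.
function cleanResponse(text, recentBotMessages) {
  // Strip CJK characters that small local models sometimes emit
  const cleaned = text.replace(/[\u4e00-\u9fff\u3040-\u30ff]/g, '').trim();
  // Duplicate detection: refuse exact repeats of the last 5 bot messages
  if (recentBotMessages.slice(-5).includes(cleaned)) return null;
  return cleaned;
}
```

Returning `null` signals the caller to drop the reply rather than send a repeat.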
By default, the bot monitors all individual chats and ignores groups.
To monitor specific chats only:
```
MONITORED_CHATS=1234567890@c.us,0987654321@c.us
```

To ignore specific chats:

```
IGNORED_CHATS=annoying-person@c.us
```

To disable auto-reply but still see messages:

```
AUTO_REPLY_ENABLED=false
```

```
├── index.js        # Main bot logic, RAG implementation, message handling
├── list_models.js  # Utility to list available Gemini models (legacy)
├── .env.example    # Example environment variables
├── .wwebjs_auth/   # WhatsApp session storage (auto-generated)
├── .wwebjs_cache/  # WhatsApp cache (auto-generated)
└── package.json    # Dependencies
```
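The filter rules above (groups ignored, ignore list wins, empty monitor list means "all chats") can be sketched as one predicate. Function name and env handling are hypothetical, not lifted from index.js.

```javascript
// Hypothetical sketch of the chat-filtering rules.
function shouldHandleChat(chatId, isGroup, env) {
  const parse = (v) => (v || '').split(',').map((s) => s.trim()).filter(Boolean);
  const monitored = parse(env.MONITORED_CHATS);
  const ignored = parse(env.IGNORED_CHATS);
  if (isGroup) return false;                  // groups are ignored by default
  if (ignored.includes(chatId)) return false; // explicit ignore wins
  if (monitored.length === 0) return true;    // empty list = monitor all chats
  return monitored.includes(chatId);
}
```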
Running locally means:
- No API costs (run it 24/7 for free)
- Your messages never leave your computer
- No rate limits
- Works offline once the model is downloaded
- You can swap models anytime (try different personalities)
The tradeoff is you need a decent computer. 7B models run fine on M1 Macs or modern gaming PCs. Larger models need more RAM.
QR code won't scan: Make sure you're using WhatsApp on your phone (not another device that's already linked).
Bot doesn't reply: Check that AUTO_REPLY_ENABLED=true in your .env file and that LM Studio server is running.
"Connection refused" error: LM Studio isn't running or is on a different port. Check the server is started and the URL in .env matches.
Duplicate messages: This can happen if you restart the bot mid-conversation. The debounce system should handle it after a few seconds.
Message debouncing: When a message arrives, the bot waits 5 seconds before processing. If more messages arrive during that time, it resets the timer. This handles people who send multiple short messages quickly.
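The reset-on-new-message behavior is a classic per-chat debounce. A minimal sketch, with hypothetical class and method names (the actual implementation in index.js may be structured differently):

```javascript
// Hypothetical per-chat message debouncer: buffers messages and only
// flushes after delayMs of silence; each new message resets the timer.
class MessageDebouncer {
  constructor(delayMs, onFlush) {
    this.delayMs = delayMs;
    this.onFlush = onFlush;   // called with (chatId, bufferedMessages)
    this.pending = new Map(); // chatId -> { timer, messages }
  }
  push(chatId, message) {
    let entry = this.pending.get(chatId);
    if (!entry) {
      entry = { timer: null, messages: [] };
      this.pending.set(chatId, entry);
    }
    entry.messages.push(message);
    clearTimeout(entry.timer); // a new message resets the countdown
    entry.timer = setTimeout(() => {
      this.pending.delete(chatId);
      this.onFlush(chatId, entry.messages);
    }, this.delayMs);
  }
}
```

Flushing the whole buffer at once lets the LLM answer a burst of short messages with a single coherent reply.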
Context limits: The last 20 messages are always included. Retrieved messages (up to 10) come from the older history. Total context sent to LLM is roughly 30-35 messages, depending on what's retrieved.
Model compatibility: Any model that works in LM Studio should work here. I've tested with Qwen, Llama, and Mistral models. Smaller models (7B) are faster but less coherent. Larger models (13B+) are better but slower.
- Add embeddings-based retrieval (proper semantic search instead of keyword matching)
- Support for image messages
- Voice message transcription
- Better duplicate detection (semantic similarity instead of exact match)
- Per-chat personality customization
- Message scheduling
MIT
Built with:
- whatsapp-web.js for WhatsApp integration
- LM Studio for local LLM hosting
- OpenAI SDK for API compatibility