A WhatsApp bot that runs entirely on your machine using local LLMs. It reads your message history, finds relevant context from past conversations, and generates natural responses that actually sound like you.
This bot connects to WhatsApp Web and automatically replies to messages using a language model running on your computer. The interesting part is the RAG (Retrieval-Augmented Generation) system - it doesn't just look at the last few messages. It searches through up to 500 previous messages to find relevant context, then combines that with your recent conversation to generate contextually aware responses.
So if someone asks "what was that restaurant you mentioned last month?" - the bot can actually find and reference that conversation.
- Runs on LM Studio with any model you want
- Default setup uses qwen2.5-7b-instruct
- Everything happens on your machine - no data leaves your computer
- No OpenAI bills, no API rate limits
- Fetches up to 500 messages of conversation history
- Uses keyword-based scoring to find relevant past messages
- Combines immediate context (last 20 messages) with retrieved relevant history
- The LLM sees both recent conversation flow and relevant info from weeks/months ago
- Uses the whatsapp-web.js library
- QR code authentication - scan once, then it reconnects automatically
- Session stored locally so you don't re-authenticate every time
- Works with individual chats (groups are ignored by default)
- Shows typing indicator before replying
- Debounces messages (waits 5 seconds to handle double texts)
- Detects and prevents duplicate responses
- Strips out hallucinations and random characters
- Adds configurable reply delays to feel more natural
- Prevents the bot from claiming ownership of links you didn't send
- Adds a ~ prefix to all responses so people know it's automated
1. Authentication: When you first run it, you scan a QR code with your phone. That's it. The session gets saved locally.
2. Message arrives: The bot detects an incoming message in a monitored chat.
3. RAG retrieval:
   - Grabs up to 500 messages of chat history
   - Splits them into "immediate context" (the last 20) and "deep history" (the older 480)
   - Searches the deep history for messages relevant to the current conversation
   - Scores candidates by keyword matching with word boundaries
4. Context assembly: Builds a prompt like this:

   ```
   [System prompt]
   --- RELEVANT PAST CONTEXT ---
   [Top 10 relevant messages from deep history]
   --- END RELEVANT PAST CONTEXT ---
   --- CURRENT CONVERSATION ---
   [Last 20 messages]
   ```

5. LLM generates response: Sends the assembled prompt to your local LM Studio instance.
6. Post-processing: Cleans the response, checks for duplicates, adds the prefix, and sends the reply to WhatsApp.
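The context-assembly step (step 4) can be sketched as a small pure function. This is illustrative only: the function name, message fields, and section markers mirror the prompt layout above, but the actual code in index.js may differ.

```javascript
// Hypothetical sketch of step 4: assemble the prompt from the system
// prompt, the top retrieved messages, and the recent conversation.
function buildPrompt(systemPrompt, relevantHistory, recentMessages) {
  const lines = [systemPrompt];
  if (relevantHistory.length > 0) {
    lines.push('--- RELEVANT PAST CONTEXT ---');
    for (const msg of relevantHistory) {
      lines.push(`${msg.author}: ${msg.body}`);
    }
    lines.push('--- END RELEVANT PAST CONTEXT ---');
  }
  lines.push('--- CURRENT CONVERSATION ---');
  for (const msg of recentMessages) {
    lines.push(`${msg.author}: ${msg.body}`);
  }
  return lines.join('\n');
}
```

Keeping the retrieved block clearly delimited helps the model distinguish old context from the live conversation.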
- Node.js (v16 or higher)
- LM Studio - Download from lmstudio.ai
- Load a model (I'm using qwen2.5-7b-instruct)
- Start the local server (usually runs on http://localhost:1234)
```bash
# Clone the repo
git clone https://github.com/Anmoldureha/whatsapp-bot.git
cd whatsapp-bot

# Install dependencies
npm install

# Create environment file
cp .env.example .env
```

Edit .env to customize:
```
# LM Studio settings
LM_STUDIO_URL=http://localhost:1234/v1
LM_STUDIO_MODEL=qwen2.5-7b-instruct

# Bot behavior
SYSTEM_PROMPT=You are a helpful assistant responding to WhatsApp messages.
AUTO_REPLY_ENABLED=true
CONTEXT_MESSAGE_LIMIT=20
REPLY_DELAY_MS=2000

# Chat filters (optional)
MONITORED_CHATS=   # Leave empty to monitor all chats
IGNORED_CHATS=     # Comma-separated chat IDs to ignore
```

Chat ID format: Use the format shown in logs when messages arrive (usually like 1234567890@c.us).
```bash
npm start
```

First run:
- QR code appears in terminal
- Open WhatsApp on your phone
- Go to Settings > Linked Devices > Link a Device
- Scan the QR code
- Bot connects and starts monitoring
Subsequent runs just reconnect using the saved session.
The keyword-based retrieval is simple but works well:
- Takes the last message from the user
- Tokenizes it (splits into words, filters out words under 5 characters)
- For each message in deep history:
- Check if any query keywords appear (using word boundaries)
- Add +1 to score for each match
- Subtract 0.5 if message is very short (likely "ok" or "lol")
- Sort by score, take top 10 messages
- Add these to the prompt as "relevant past context"
This means if someone says "remember that Python bug from last week?", the bot searches for messages containing "python" and "remember" and feeds those to the LLM.
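The scoring loop above can be expressed compactly. This is an illustrative version of the algorithm the list describes, not the exact code from index.js; `scoreHistory` and its parameters are hypothetical names.

```javascript
// Hypothetical sketch of the keyword-based retrieval step.
function scoreHistory(query, deepHistory, topK = 10) {
  // Tokenize the query: lowercase words of 5+ characters
  const keywords = query.toLowerCase().match(/\b\w{5,}\b/g) || [];
  const scored = deepHistory.map((msg) => {
    const body = msg.body.toLowerCase();
    let score = 0;
    for (const kw of keywords) {
      // Word-boundary match so "python" doesn't hit "pythonic" substrings mid-word
      if (new RegExp(`\\b${kw}\\b`).test(body)) score += 1;
    }
    if (msg.body.length < 10) score -= 0.5; // penalize "ok", "lol", etc.
    return { msg, score };
  });
  return scored
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.msg);
}
```

Because the tokens are plain word characters, interpolating them into a RegExp is safe here; a fancier tokenizer would need escaping.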
The bot has several guardrails to prevent awkward responses:
Duplicate detection: Checks last 5 bot messages for exact matches. Won't send the same thing twice.
Thanks loop prevention: If the bot already said "thanks" recently, it won't say it again (prevents infinite politeness loops).
Link ownership check: If the other person sent a link, the bot won't say "check it out" or "what's your take?" (which would imply the bot sent the link).
Hallucination cleaning: Strips out Chinese characters and other common model artifacts.
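Two of these guardrails (hallucination cleaning and duplicate detection) are easy to show together. A minimal sketch, assuming a `cleanResponse` helper; the regex ranges and the 5-message window are taken from the description above, but the real implementation may differ.

```javascript
// Hypothetical sketch of post-processing guardrails.
function cleanResponse(text, recentBotMessages) {
  // Strip CJK characters that small local models sometimes emit
  const cleaned = text.replace(/[\u4e00-\u9fff\u3040-\u30ff]/g, '').trim();
  // Duplicate detection: refuse exact repeats of the last 5 bot messages
  if (recentBotMessages.slice(-5).includes(cleaned)) return null;
  return cleaned;
}
```

Returning `null` signals the caller to drop the reply rather than send a repeat.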
By default, the bot monitors all individual chats and ignores groups.
To monitor specific chats only:
```
MONITORED_CHATS=1234567890@c.us,0987654321@c.us
```

To ignore specific chats:

```
IGNORED_CHATS=annoying-person@c.us
```

To disable auto-reply but still see messages:

```
AUTO_REPLY_ENABLED=false
```

```
├── index.js        # Main bot logic, RAG implementation, message handling
├── list_models.js  # Utility to list available Gemini models (legacy)
├── .env.example    # Example environment variables
├── .wwebjs_auth/   # WhatsApp session storage (auto-generated)
├── .wwebjs_cache/  # WhatsApp cache (auto-generated)
└── package.json    # Dependencies
```
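The filter rules above (groups ignored, ignore list wins, empty monitor list means "all chats") can be sketched as one predicate. Function name and env handling are hypothetical, not lifted from index.js.

```javascript
// Hypothetical sketch of the chat-filtering rules.
function shouldHandleChat(chatId, isGroup, env) {
  const parse = (v) => (v || '').split(',').map((s) => s.trim()).filter(Boolean);
  const monitored = parse(env.MONITORED_CHATS);
  const ignored = parse(env.IGNORED_CHATS);
  if (isGroup) return false;                  // groups are ignored by default
  if (ignored.includes(chatId)) return false; // explicit ignore wins
  if (monitored.length === 0) return true;    // empty list = monitor all chats
  return monitored.includes(chatId);
}
```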
Running locally means:
- No API costs (run it 24/7 for free)
- Your messages never leave your computer
- No rate limits
- Works offline once the model is downloaded
- You can swap models anytime (try different personalities)
The tradeoff is you need a decent computer. 7B models run fine on M1 Macs or modern gaming PCs. Larger models need more RAM.
QR code won't scan: Make sure you're using WhatsApp on your phone (not another device that's already linked).
Bot doesn't reply: Check that AUTO_REPLY_ENABLED=true in your .env file and that LM Studio server is running.
"Connection refused" error: LM Studio isn't running or is on a different port. Check the server is started and the URL in .env matches.
Duplicate messages: This can happen if you restart the bot mid-conversation. The debounce system should handle it after a few seconds.
Message debouncing: When a message arrives, the bot waits 5 seconds before processing. If more messages arrive during that time, it resets the timer. This handles people who send multiple short messages quickly.
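The reset-on-new-message behavior is a classic per-chat debounce. A minimal sketch, with hypothetical class and method names (the actual implementation in index.js may be structured differently):

```javascript
// Hypothetical per-chat message debouncer: buffers messages and only
// flushes after delayMs of silence; each new message resets the timer.
class MessageDebouncer {
  constructor(delayMs, onFlush) {
    this.delayMs = delayMs;
    this.onFlush = onFlush;   // called with (chatId, bufferedMessages)
    this.pending = new Map(); // chatId -> { timer, messages }
  }
  push(chatId, message) {
    let entry = this.pending.get(chatId);
    if (!entry) {
      entry = { timer: null, messages: [] };
      this.pending.set(chatId, entry);
    }
    entry.messages.push(message);
    clearTimeout(entry.timer); // a new message resets the countdown
    entry.timer = setTimeout(() => {
      this.pending.delete(chatId);
      this.onFlush(chatId, entry.messages);
    }, this.delayMs);
  }
}
```

Flushing the whole buffer at once lets the LLM answer a burst of short messages with a single coherent reply.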
Context limits: The last 20 messages are always included. Retrieved messages (up to 10) come from the older history. Total context sent to LLM is roughly 30-35 messages, depending on what's retrieved.
Model compatibility: Any model that works in LM Studio should work here. I've tested with Qwen, Llama, and Mistral models. Smaller models (7B) are faster but less coherent. Larger models (13B+) are better but slower.
- Add embeddings-based retrieval (proper semantic search instead of keyword matching)
- Support for image messages
- Voice message transcription
- Better duplicate detection (semantic similarity instead of exact match)
- Per-chat personality customization
- Message scheduling
MIT
Built with:
- whatsapp-web.js for WhatsApp integration
- LM Studio for local LLM hosting
- OpenAI SDK for API compatibility