An intelligent WhatsApp bot powered by local LLMs (LM Studio) and RAG context retrieval. It scans up to 500 messages of history to generate deeply context-aware responses offline, with no API costs. Built on whatsapp-web.js, it features typing indicators, debouncing, and smart filters for natural, seamless automation.

WhatsApp Auto-Responder with RAG

A WhatsApp bot that runs entirely on your machine using local LLMs. It reads your message history, finds relevant context from past conversations, and generates natural responses that actually sound like you.

What it does

This bot connects to WhatsApp Web and automatically replies to messages using a language model running on your computer. The interesting part is the RAG (Retrieval-Augmented Generation) system - it doesn't just look at the last few messages. It searches through up to 500 previous messages to find relevant context, then combines that with your recent conversation to generate contextually aware responses.

So if someone asks "what was that restaurant you mentioned last month?" - the bot can actually find and reference that conversation.

Features

Local LLM (no API costs)

  • Runs on LM Studio with any model you want
  • Default setup uses qwen2.5-7b-instruct
  • Everything happens on your machine - no data leaves your computer
  • No OpenAI bills, no API rate limits

RAG System

  • Fetches up to 500 messages of conversation history
  • Uses keyword-based scoring to find relevant past messages
  • Combines immediate context (last 20 messages) with retrieved relevant history
  • The LLM sees both recent conversation flow and relevant info from weeks/months ago

WhatsApp Integration

  • Uses whatsapp-web.js library
  • QR code authentication - scan once, then it reconnects automatically
  • Session stored locally so you don't re-authenticate every time
  • Works with individual chats (groups are ignored by default)
  • Shows typing indicator before replying

Smart Response Handling

  • Debounces messages (waits 5 seconds to handle double texts)
  • Detects and prevents duplicate responses
  • Strips out hallucinations and random characters
  • Adds configurable reply delays to feel more natural
  • Prevents the bot from claiming ownership of links you didn't send
  • Adds a ~ prefix to all responses so people know it's automated

How it works

  1. Authentication: When you first run it, you scan a QR code with your phone. That's it. Session gets saved locally.

  2. Message arrives: Bot detects incoming message in a monitored chat.

  3. RAG retrieval:

    • Grabs 500 messages from chat history
    • Splits into "immediate context" (last 20) and "deep history" (older 480)
    • Searches deep history for messages relevant to the current conversation
    • Scores based on keyword matching with word boundaries
  4. Context assembly: Builds a prompt like this:

    [System prompt]
    --- RELEVANT PAST CONTEXT ---
    [Top 10 relevant messages from deep history]
    --- END RELEVANT PAST CONTEXT ---
    --- CURRENT CONVERSATION ---
    [Last 20 messages]
    
  5. LLM generates response: Sends to your local LM Studio instance.

  6. Post-processing: Cleans the response, checks for duplicates, adds prefix, sends to WhatsApp.
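The retrieval and context-assembly steps above can be sketched roughly as follows. Function and constant names here are illustrative, not the repo's actual implementation:

```javascript
// Rough sketch of the history split and prompt assembly.
// IMMEDIATE_LIMIT and TOP_K mirror the documented defaults (20 and 10).
const IMMEDIATE_LIMIT = 20;
const TOP_K = 10;

function splitHistory(messages) {
  // Last 20 messages form the immediate context; everything older is deep history.
  const immediate = messages.slice(-IMMEDIATE_LIMIT);
  const deep = messages.slice(0, -IMMEDIATE_LIMIT);
  return { immediate, deep };
}

function buildPrompt(systemPrompt, relevant, immediate) {
  const lines = [systemPrompt];
  if (relevant.length > 0) {
    lines.push('--- RELEVANT PAST CONTEXT ---');
    lines.push(...relevant.slice(0, TOP_K));
    lines.push('--- END RELEVANT PAST CONTEXT ---');
  }
  lines.push('--- CURRENT CONVERSATION ---');
  lines.push(...immediate);
  return lines.join('\n');
}
```

The assembled string is what gets sent to the LM Studio chat endpoint in step 5.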

Setup

Prerequisites

  1. Node.js (v16 or higher)
  2. LM Studio - Download from lmstudio.ai
    • Load a model (I'm using qwen2.5-7b-instruct)
    • Start the local server (usually runs on http://localhost:1234)

Installation

# Clone the repo
git clone https://github.com/Anmoldureha/whatsapp-bot.git
cd whatsapp-bot

# Install dependencies
npm install

# Create environment file
cp .env.example .env

Configuration

Edit .env to customize:

# LM Studio settings
LM_STUDIO_URL=http://localhost:1234/v1
LM_STUDIO_MODEL=qwen2.5-7b-instruct

# Bot behavior
SYSTEM_PROMPT=You are a helpful assistant responding to WhatsApp messages.
AUTO_REPLY_ENABLED=true
CONTEXT_MESSAGE_LIMIT=20
REPLY_DELAY_MS=2000

# Chat filters (optional)
MONITORED_CHATS=         # Leave empty to monitor all chats
IGNORED_CHATS=           # Comma-separated chat IDs to ignore

Chat ID format: Use the format shown in logs when messages arrive (usually like 1234567890@c.us)

Running

npm start

First run:

  1. QR code appears in terminal
  2. Open WhatsApp on your phone
  3. Go to Settings > Linked Devices > Link a Device
  4. Scan the QR code
  5. Bot connects and starts monitoring

Subsequent runs just reconnect using the saved session.

How the RAG works

The keyword-based retrieval is simple but works well:

  1. Takes the last message from the user
  2. Tokenizes it (splits into words, filters out words under 5 characters)
  3. For each message in deep history:
    • Check if any query keywords appear (using word boundaries)
    • Add +1 to score for each match
    • Subtract 0.5 if message is very short (likely "ok" or "lol")
  4. Sort by score, take top 10 messages
  5. Add these to the prompt as "relevant past context"

This means if someone says "remember that Python bug from last week?", the bot searches for messages containing "python" and "remember" and feeds those to the LLM.
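A minimal sketch of that scoring loop, assuming a simple token filter and word-boundary regex (the exact short-message threshold is an assumption; the real code may differ in details):

```javascript
// Sketch of the keyword-based retrieval described above (illustrative names).
function tokenize(text) {
  // Split on non-word characters, keep words of 5+ characters, lowercase.
  return text.toLowerCase().split(/\W+/).filter((w) => w.length >= 5);
}

function scoreMessage(keywords, message) {
  let score = 0;
  const text = message.toLowerCase();
  for (const kw of keywords) {
    // Word boundaries so "python" matches as a whole word only.
    if (new RegExp(`\\b${kw}\\b`).test(text)) score += 1;
  }
  if (message.length < 10) score -= 0.5; // penalize "ok", "lol", etc. (assumed threshold)
  return score;
}

function retrieveRelevant(query, deepHistory, topK = 10) {
  const keywords = tokenize(query);
  return deepHistory
    .map((m) => ({ m, score: scoreMessage(keywords, m) }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((x) => x.m);
}
```

For the "Python bug" example, only "remember" and "python" survive the 5-character filter, so messages mentioning either word score highest.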

Response safety features

The bot has several guardrails to prevent awkward responses:

Duplicate detection: Checks last 5 bot messages for exact matches. Won't send the same thing twice.

Thanks loop prevention: If the bot already said "thanks" recently, it won't say it again (prevents infinite politeness loops).

Link ownership check: If the other person sent a link, the bot won't say "check it out" or "what's your take?" (which would imply the bot sent the link).

Hallucination cleaning: Strips out Chinese characters and other common model artifacts.
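The duplicate check and hallucination cleanup can be sketched like this (the Unicode ranges and the five-message window are illustrative, not the repo's exact values):

```javascript
// Illustrative post-processing guards; exact regex ranges may differ in the repo.
function cleanResponse(text) {
  return text
    .replace(/[\u4e00-\u9fff\u3400-\u4dbf]/g, '') // strip CJK ideographs (a common small-model artifact)
    .replace(/\s{2,}/g, ' ') // collapse the whitespace left behind
    .trim();
}

function isDuplicate(candidate, recentBotMessages) {
  // Exact-match check against the last 5 bot messages.
  return recentBotMessages.slice(-5).includes(candidate);
}
```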

Monitoring and filtering

By default, the bot monitors all individual chats and ignores groups.

To monitor specific chats only:

MONITORED_CHATS=1234567890@c.us,0987654321@c.us

To ignore specific chats:

IGNORED_CHATS=annoying-person@c.us

To disable auto-reply but still see messages:

AUTO_REPLY_ENABLED=false
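Taken together, these settings amount to a simple predicate. A sketch, assuming the env values arrive as raw strings (names are illustrative):

```javascript
// Illustrative chat-filter predicate based on the .env settings above.
function shouldAutoReply(chatId, isGroup, env) {
  if (env.AUTO_REPLY_ENABLED !== 'true') return false;
  if (isGroup) return false; // groups are ignored by default

  const parseList = (s) => (s || '').split(',').map((x) => x.trim()).filter(Boolean);
  const monitored = parseList(env.MONITORED_CHATS);
  const ignored = parseList(env.IGNORED_CHATS);

  if (ignored.includes(chatId)) return false;
  // An empty MONITORED_CHATS means "monitor all chats".
  return monitored.length === 0 || monitored.includes(chatId);
}
```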

Project structure

├── index.js              # Main bot logic, RAG implementation, message handling
├── list_models.js        # Utility to list available Gemini models (legacy)
├── .env.example          # Example environment variables
├── .wwebjs_auth/         # WhatsApp session storage (auto-generated)
├── .wwebjs_cache/        # WhatsApp cache (auto-generated)
└── package.json          # Dependencies

Why local LLMs?

Running locally means:

  • No API costs (run it 24/7 for free)
  • Your messages never leave your computer
  • No rate limits
  • Works offline once the model is downloaded
  • You can swap models anytime (try different personalities)

The tradeoff is you need a decent computer. 7B models run fine on M1 Macs or modern gaming PCs. Larger models need more RAM.

Common issues

QR code won't scan: Make sure you're using WhatsApp on your phone (not another device that's already linked).

Bot doesn't reply: Check that AUTO_REPLY_ENABLED=true in your .env file and that LM Studio server is running.

"Connection refused" error: LM Studio isn't running or is on a different port. Check the server is started and the URL in .env matches.

Duplicate messages: This can happen if you restart the bot mid-conversation. The debounce system should handle it after a few seconds.

Technical notes

Message debouncing: When a message arrives, the bot waits 5 seconds before processing. If more messages arrive during that time, it resets the timer. This handles people who send multiple short messages quickly.
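That reset-the-timer behavior is classic per-chat debouncing. A minimal sketch (the delay is parameterized here purely for illustration; the bot uses a fixed 5 seconds):

```javascript
// Per-chat message debouncing: each new message resets the chat's timer,
// so a burst of texts is processed once, after the sender goes quiet.
const pending = new Map(); // chatId -> { timer, messages }

function onIncomingMessage(chatId, message, process, delayMs = 5000) {
  const entry = pending.get(chatId) || { timer: null, messages: [] };
  entry.messages.push(message);
  if (entry.timer) clearTimeout(entry.timer); // reset the debounce window
  entry.timer = setTimeout(() => {
    pending.delete(chatId);
    process(chatId, entry.messages); // handle the whole burst at once
  }, delayMs);
  pending.set(chatId, entry);
}
```

The key detail is that the collected burst is handed to the processor as one unit, so the LLM sees "hey" and "you there?" as a single turn rather than replying twice.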

Context limits: The last 20 messages are always included. Retrieved messages (up to 10) come from the older history, so the total context sent to the LLM is roughly 20-30 messages, depending on what's retrieved.

Model compatibility: Any model that works in LM Studio should work here. I've tested with Qwen, Llama, and Mistral models. Smaller models (7B) are faster but less coherent. Larger models (13B+) are better but slower.

Future ideas

  • Add embeddings-based retrieval (proper semantic search instead of keyword matching)
  • Support for image messages
  • Voice message transcription
  • Better duplicate detection (semantic similarity instead of exact match)
  • Per-chat personality customization
  • Message scheduling

License

MIT

Acknowledgments

Built with whatsapp-web.js and LM Studio.
