Ask questions about your codebase in plain English. Get answers with exact file and line references.
Two ways to use:
- 🖥️ Web UI – Dark-themed dashboard in your browser
- ⌨️ CLI – Command-line scripts for automation
```
┌─────────────────────────────────────────────────────┐
│ 1. Point it at your code folder                     │
│ 2. It reads and indexes all files                   │
│ 3. Ask questions like "How does login work?"        │
│ 4. Get answers with exact file paths + line numbers │
└─────────────────────────────────────────────────────┘
```
Example Output:
```
┌─────────────────────────────────────────────────────────────────┐
│ ANSWER                                                          │
├─────────────────────────────────────────────────────────────────┤
│ User authentication is handled in src/auth.py.                  │
│ The authenticate_user() function (lines 24-56) validates        │
│ credentials using bcrypt...                                     │
├─────────────────────────────────────────────────────────────────┤
│ SOURCES                                                         │
├──────────────────────────┬─────────────┬────────────────────────┤
│ File                     │ Lines       │ Symbol                 │
├──────────────────────────┼─────────────┼────────────────────────┤
│ src/auth.py              │ 24-56       │ authenticate_user      │
│ src/auth.py              │ 58-72       │ create_token           │
│ src/models/user.py       │ 1-35        │ User                   │
│ src/middleware.py        │ 10-28       │ require_auth           │
└──────────────────────────┴─────────────┴────────────────────────┘
```
Before starting, make sure you have:
| Requirement | How to Check | How to Install |
|---|---|---|
| Docker | `docker --version` | Install Docker |
| Docker Compose | `docker-compose --version` | Included with Docker Desktop |
| Python 3.10+ | `python3 --version` | Install Python |
| API Key | You have an OpenAI or Anthropic account | OpenAI or Anthropic |
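If you'd rather check the tool prerequisites in one place, here is a small preflight sketch (standard library only; the tool names are the ones from the table above, and this script is not part of the project):

```python
import shutil
import sys

def preflight():
    """Report which prerequisite tools are visible on PATH, and whether Python is 3.10+."""
    report = {}
    for tool in ("docker", "docker-compose"):
        report[tool] = shutil.which(tool) is not None
    report["python3.10+"] = sys.version_info >= (3, 10)
    return report

if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'OK  ' if ok else 'MISS'} {name}")
```

Anything reported as missing can be installed per the table above.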
```
git clone https://github.com/Drepheus/RAG-codebase.git
cd RAG-codebase
```

You have two options for providing your API key: a `.env` file (persistent) or environment variables (temporary). Choose one.

To use a `.env` file:

```
cp .env.example .env
```

Open `.env` in a text editor (Notepad, VS Code, nano, etc.) and set your key:
For OpenAI:
```
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
For Anthropic (Claude):
```
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
To use environment variables instead, set them in your shell.

Linux / macOS (Terminal):

```
export LLM_PROVIDER="openai"
export OPENAI_API_KEY="sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

Windows (PowerShell):

```
$env:LLM_PROVIDER = "openai"
$env:OPENAI_API_KEY = "sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

Windows (Command Prompt):

```
set LLM_PROVIDER=openai
set OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Note: Environment variables are temporary (lost when you close the terminal). The `.env` file is permanent.
Install the Python dependencies:

```
pip install -r requirements.txt
```

Start the database:

```
docker-compose up -d
```

⏳ Wait 60 seconds for the AI model to download and load (first time only).

Check if it's ready:

```
curl http://localhost:8080/v1/.well-known/ready
```

Expected response:

```
{"status":"READY"}
```

Make the scripts executable on Linux / macOS:

```
chmod +x scripts/ingest.sh scripts/query.sh scripts/ui.sh
```

Windows: No action needed. Use `python -m src.ingest`, `python -m src.query`, and `python -m uvicorn src.app:app` instead.
Choose your interface:
| Mode | Command | Best For |
|---|---|---|
| Web UI | `./scripts/ui.sh` | Interactive exploration, visual feedback |
| CLI | `./scripts/query.sh "question"` | Automation, scripting, quick queries |
Start the web dashboard:
```
./scripts/ui.sh
```

Windows (PowerShell):

```
python -m uvicorn src.app:app --host 0.0.0.0 --port 8000
```

Then open http://localhost:8000 in your browser.
Features:
- Dark theme interface
- Live status (connection, indexed repos, chunk count)
- Query with example prompts
- Ingest codebases via form
- Results with clickable sources
```
./scripts/ingest.sh /full/path/to/your/code
```

Copy-paste examples:

```
# Index a project in your home folder
./scripts/ingest.sh /home/username/projects/my-app

# Index with a custom name
./scripts/ingest.sh /home/username/projects/my-app my-app

# Index the current folder
./scripts/ingest.sh $(pwd)
```

Windows (PowerShell):

```
python -m src.ingest C:\Users\YourName\projects\my-app
```

Sample output:
```
Ingesting repository: my-app
Path: /home/user/projects/my-app
Connecting to Weaviate...
✓ CodeChunk collection already exists
✓ Deleted 0 existing chunks for repo: my-app
✓ src/main.py (3 chunks)
✓ src/auth.py (2 chunks)
✓ src/api/routes.py (5 chunks)
✓ Ingestion complete!
Files processed: 3
Chunks created: 10
```
```
./scripts/query.sh "your question here"
```

Copy-paste examples:

```
./scripts/query.sh "How does user authentication work?"
./scripts/query.sh "What does the User class do?"
./scripts/query.sh "Explain the database connection logic"
./scripts/query.sh "How are API routes organized?"
./scripts/query.sh "Where is error handling implemented?"
```

Windows (PowerShell):

```
python -m src.query "How does user authentication work?"
```

Sample output:
```
Query: How does user authentication work?
Searching for relevant code...
Found 8 relevant chunks
Calling OPENAI...
============================================================
ANSWER
============================================================
User authentication is implemented in `src/auth.py`. The main function
`authenticate_user()` (lines 24-56) accepts a username and password,
hashes the password using bcrypt, and compares it against the stored
hash in the database...
============================================================
SOURCES
============================================================
• src/auth.py (lines 24-56) [authenticate_user]
• src/auth.py (lines 58-72) [create_token]
• src/models/user.py (lines 1-35) [User]
• src/middleware.py (lines 10-28) [require_auth]
```
| Task | Command |
|---|---|
| Start database | `docker-compose up -d` |
| Stop database | `docker-compose down` |
| View database logs | `docker-compose logs -f` |
| Start Web UI | `./scripts/ui.sh`, then open http://localhost:8000 |
| Ingest code (CLI) | `./scripts/ingest.sh /path/to/code` |
| Ask a question (CLI) | `./scripts/query.sh "your question"` |
Both scripts are safe to run multiple times:
| Script | What Happens on Re-Run |
|---|---|
| `ingest.sh` | Deletes old chunks for that repo, then re-indexes. Other repos are untouched. |
| `query.sh` | Just asks a new question. No side effects. |
You cannot break anything by running these commands multiple times.
You can switch LLM providers at any time by editing your .env file:
To use OpenAI:
```
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
To use Anthropic (Claude):
```
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
No need to restart anything. The change takes effect on the next query.
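That works because the provider can be looked up fresh when a query runs. A minimal sketch of such a lookup (illustrative only; the function names here are not the project's actual config loader):

```python
def read_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    return values

def pick_provider(env):
    """Return (provider, api_key) based on LLM_PROVIDER, defaulting to openai."""
    provider = env.get("LLM_PROVIDER", "openai")
    key_name = "ANTHROPIC_API_KEY" if provider == "anthropic" else "OPENAI_API_KEY"
    return provider, env.get(key_name)
```

Because the file is re-read per query, editing `.env` between two queries is enough to switch providers.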
Available Models (set in .env):
| Provider | Default Model | Alternatives |
|---|---|---|
| OpenAI | `gpt-4o-mini` | `gpt-4o`, `gpt-4-turbo`, `gpt-3.5-turbo` |
| Anthropic | `claude-3-haiku-20240307` | `claude-3-5-sonnet-20241022`, `claude-3-opus-20240229` |
To change the model, add to `.env`:

```
OPENAI_MODEL=gpt-4o
```

or

```
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```
Problem: The database isn't running.
Solution:
```
docker-compose up -d
```

Wait 60 seconds, then try again.
Problem: Your API key is missing or not loaded.
Solution:
1. Check that the `.env` file exists:

   ```
   ls -la .env
   ```

2. Check that it contains your key:

   ```
   cat .env
   ```

3. Verify the key format:
   - OpenAI keys start with `sk-proj-` or `sk-`
   - Anthropic keys start with `sk-ant-`

4. If using environment variables instead of `.env`, make sure you exported them in the same terminal session:

   ```
   echo $OPENAI_API_KEY
   ```
Problem: The codebase hasn't been indexed yet.
Solution:
```
./scripts/ingest.sh /path/to/your/code
```

Problem: The embedding model needs ~2GB RAM.
Solution:
Check available memory:
```
free -h
```

If low on memory, close other applications or increase VM memory.
Problem: .sh scripts are for Linux/macOS.
Solution: Use Python directly:
```
python -m src.ingest C:\path\to\code
python -m src.query "your question"
```

Just run ingest again. It's safe and replaces old data automatically:
```
./scripts/ingest.sh /path/to/code
```

```
# Check if containers are running
docker ps

# Check Weaviate logs
docker-compose logs weaviate

# Check embedding model logs
docker-compose logs t2v-transformers
```

To reset the database completely:

```
docker-compose down -v
docker-compose up -d
```

Wait 60 seconds for restart.
You can index multiple codebases. Each one is stored separately:
```
# Index first project
./scripts/ingest.sh /path/to/project-a project-a

# Index second project
./scripts/ingest.sh /path/to/project-b project-b
```

Queries search across all indexed repositories.
To re-index just one project, run ingest again with the same name. Only that project's chunks are replaced.
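The replace-on-re-ingest behavior can be sketched with an in-memory stand-in for the chunk store (hypothetical structure; the real ingest writes to Weaviate):

```python
store = []  # each chunk: {"repo": ..., "file": ..., "text": ...}

def ingest(repo, chunks):
    """Replace all chunks belonging to `repo`, leaving other repos untouched."""
    global store
    store = [c for c in store if c["repo"] != repo]    # delete that repo's old chunks
    store.extend({"repo": repo, **c} for c in chunks)  # store the fresh ones

ingest("project-a", [{"file": "a.py", "text": "..."}])
ingest("project-b", [{"file": "b.py", "text": "..."}])
ingest("project-a", [{"file": "a.py", "text": "updated"}])  # only project-a is replaced
```

After the third call, `project-b`'s chunks are still present and `project-a` holds only the updated data, which is why re-running ingest is always safe.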
```
.
├── docker-compose.yml   # Database configuration
├── requirements.txt     # Python dependencies
├── .env.example         # Template for your API key
├── .env                 # Your actual API key (not in git)
├── scripts/
│   ├── ingest.sh        # Ingest command wrapper
│   └── query.sh         # Query command wrapper
└── src/
    ├── config.py        # Settings
    ├── schema.py        # Database schema
    ├── chunker.py       # Code splitting logic
    ├── ingest.py        # Ingestion logic
    └── query.py         # Query logic
```
- Ingestion: Reads all code files, splits them into chunks (by function/class when possible), and stores them in Weaviate with metadata (file path, line numbers, language).
- Query: Converts your question into a vector, finds the 8 most similar code chunks, sends them to the LLM with your question, and returns the answer with sources.
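The retrieval step above amounts to ranking chunks by vector similarity. A toy sketch using cosine similarity over made-up two-dimensional embeddings (the real system delegates this to Weaviate's vector search):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(question_vec, chunks, k=8):
    """Return the k chunks whose vectors are most similar to the question vector."""
    return sorted(chunks, key=lambda c: cosine(question_vec, c["vec"]), reverse=True)[:k]

# Stand-in chunks with fake 2-D embeddings (illustrative file names)
chunks = [
    {"file": "src/auth.py", "vec": [0.9, 0.1]},
    {"file": "src/db.py", "vec": [0.1, 0.9]},
    {"file": "src/middleware.py", "vec": [0.8, 0.3]},
]
best = top_k([1.0, 0.0], chunks, k=2)
```

With `TOP_K_RESULTS = 8` (see the configuration table below), the 8 best-ranked chunks plus your question form the prompt sent to the LLM.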
Python, JavaScript, TypeScript, Java, Go, Rust, C, C++, C#, Ruby, PHP, Swift, Kotlin, Scala, Vue, Svelte, SQL, Shell, YAML, JSON, TOML, Markdown, HTML, CSS, SCSS
Edit `src/config.py` to customize:

| Setting | Default | Description |
|---|---|---|
| `MAX_LINES` | 160 | Maximum lines per chunk |
| `OVERLAP_LINES` | 30 | Overlap between chunks |
| `TOP_K_RESULTS` | 8 | Number of chunks to retrieve |
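To see how `MAX_LINES` and `OVERLAP_LINES` interact, here is a minimal sketch of line-based chunking with overlap (illustrative only; the project's actual chunker in `src/chunker.py` also splits by function and class):

```python
MAX_LINES = 160
OVERLAP_LINES = 30

def line_chunks(lines, max_lines=MAX_LINES, overlap=OVERLAP_LINES):
    """Split lines into windows of max_lines, each overlapping the previous by overlap.

    Returns 1-based (start, end) line ranges, mirroring the metadata stored per chunk.
    """
    step = max_lines - overlap  # advance 130 lines per window with the defaults
    chunks = []
    for start in range(0, len(lines), step):
        chunk = lines[start:start + max_lines]
        chunks.append((start + 1, start + len(chunk)))
        if start + max_lines >= len(lines):  # last window reached the end of the file
            break
    return chunks
```

With the defaults, a 300-line file yields chunks covering lines 1-160, 131-290, and 261-300, so each boundary keeps 30 lines of shared context.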