Pearl is a RAG (Retrieval-Augmented Generation) system built with Elixir and Phoenix. It generates comprehensive wikis from code repositories, allowing you to ask questions about any codebase using natural language.
This project was inspired by DeepWiki from Devin and created as a learning exercise to explore Elixir, Phoenix LiveView, and RAG architectures—starting with naive RAG and progressing through techniques from recent research papers.
Named after Pearl I. Young (1895–1968), the first female technical employee of NACA (which became NASA) and the second female physicist in the U.S. federal government. After earning degrees in physics, chemistry, and mathematics from the University of North Dakota in 1919, she joined NACA's Langley Laboratory in 1922 as a physicist calibrating flight instrumentation. In 1929, she became Langley's Chief Technical Editor and established the NACA technical reports system, authoring the Style Manual for Engineering Authors that shaped how government aerospace engineers communicated for decades. NASA's History Office called her "the architect of the NACA technical reports system." In 2015, she was inducted into NASA Langley's Hall of Honor.
- Clone any Git repository — Point Pearl at a GitHub URL and it fetches the code
- Generate a wiki — An LLM analyzes the codebase and creates structured documentation
- Ask questions — Use the built-in chat to ask questions about the code; Pearl finds relevant code snippets and explains them
Before setting up Pearl, you'll need to install:
Elixir is the programming language Pearl is written in. The easiest way to install it:
```shell
brew install elixir
```

Or follow the official Elixir installation guide.
Verify the installation:
```shell
elixir --version
# Should show Elixir 1.15 or higher
```

Pearl uses PostgreSQL to store repository data and vector embeddings for search. The easiest way is Docker (recommended):
```shell
docker compose up -d
```

This starts PostgreSQL 18 with pgvector pre-installed. Data persists across restarts via a named volume.
Port conflict? If port 5432 is already in use:
```shell
export PEARL_DB_PORT=5433
docker compose up -d
```

Alternative: Native install
```shell
brew install postgresql@16 pgvector
brew services start postgresql@16
```

See the PostgreSQL download page and pgvector installation instructions.
Pearl needs an LLM to generate wikis and answer questions. Choose one:
OpenRouter (cloud):

1. Create an account at openrouter.ai
2. Generate an API key
3. Set the environment variable:

   ```shell
   export OPENROUTER_API_KEY=sk-your-key-here
   ```
Ollama (local):

1. Install from ollama.ai
2. Pull a model:

   ```shell
   ollama pull llama3.2:3b
   ```
1. Clone this repository:

   ```shell
   git clone https://github.com/existential-birds/pearl.git
   cd pearl/pearl
   ```

2. Start PostgreSQL (if using Docker):

   ```shell
   docker compose up -d
   ```

3. Configure your LLM provider by setting environment variables (either export directly in your terminal or add to a `.env` file to source later):

   ```shell
   # For OpenRouter (recommended)
   export LLM_PROVIDER=openrouter
   export LLM_MODEL=openai/gpt-5.2
   export EMBEDDING_MODEL=openai/text-embedding-3-small
   export OPENROUTER_API_KEY=sk-your-key-here

   # For Ollama (local)
   # export LLM_PROVIDER=ollama
   # export OLLAMA_HOST=http://localhost:11434
   # export OLLAMA_DEFAULT_MODEL=llama3.2:3b
   ```

4. Run setup:

   ```shell
   mix setup
   ```

5. Start the server:

   ```shell
   mix phx.server
   ```

6. Open Pearl in your browser at http://localhost:4000
- On the home page, paste a GitHub repository URL and click "Clone"
- Once cloned, click "Generate Wiki" to create documentation
- Browse the generated wiki pages
- Use the chat panel to ask questions about the codebase
Pearl combines several components:
- Phoenix LiveView — Real-time web interface without writing custom JavaScript
- RAG Pipeline — Chunks code files, generates embeddings, and searches for relevant context
- LLM Integration — Supports both cloud (OpenRouter) and local (Ollama) providers
- pgvector — Stores and searches vector embeddings for similarity matching
For detailed architecture documentation, see CLAUDE.md.
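The "similarity matching" pgvector performs is cosine similarity between embedding vectors. As a rough illustration (a Python sketch, not Pearl's Elixir code — and real embeddings have 1536 dimensions, not 3):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: 1.0 = same direction, 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"
query   = [1.0, 0.0, 1.0]
chunk_a = [0.9, 0.1, 0.8]   # points roughly the same way -> high score
chunk_b = [-1.0, 0.5, 0.0]  # points elsewhere -> low score

print(cosine_similarity(query, chunk_a) > cosine_similarity(query, chunk_b))  # True
```

pgvector evaluates this server-side via its cosine-distance operator, with an HNSW index to avoid scanning every stored vector.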
One goal of Pearl is to explore different RAG (Retrieval-Augmented Generation) architectures. We start with the simplest approach and progressively implement more sophisticated techniques from research papers.
Pearl currently implements Naive RAG, the baseline architecture:
| Component | Implementation |
|---|---|
| Chunking | Fixed 500-token chunks with semantic break detection (paragraph boundaries preferred) |
| Embedding | OpenAI text-embedding-3-small (1536 dimensions) via OpenRouter, or nomic-embed-text via Ollama |
| Vector Store | PostgreSQL with pgvector extension, HNSW indexing |
| Retrieval | Top-5 chunks by cosine similarity |
| Generation | Retrieved chunks concatenated into system prompt with chat history |
This approach is simple and works well for small-to-medium codebases, but has known limitations: no chunk overlap means context can be lost at boundaries, fixed-size chunking ignores code semantics, and top-k retrieval may miss relevant but dissimilar chunks.
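The pipeline in the table above can be sketched in a few lines. This is an illustrative Python sketch, not Pearl's Elixir implementation: `embed` is a toy stand-in for the embedding model, chunk size is measured in characters rather than tokens, and all function names are hypothetical.

```python
from math import sqrt

def chunk(text, max_len=500):
    """Fixed-size chunks, preferring to break at paragraph boundaries."""
    chunks = []
    while len(text) > max_len:
        window = text[:max_len]
        cut = window.rfind("\n\n")   # prefer a paragraph boundary in the window
        if cut <= 0:
            cut = max_len            # otherwise, hard cut (a known limitation)
        chunks.append(text[:cut].strip())
        text = text[cut:].lstrip()
    if text.strip():
        chunks.append(text.strip())
    return chunks

def embed(text):
    """Toy embedding: letter-frequency vector (the real system calls an embedding model)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a)) or 1.0
    nb = sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def retrieve(query, chunks, k=5):
    """Top-k chunks by cosine similarity (done server-side by pgvector in practice)."""
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    """Retrieved chunks concatenated into the system prompt, per the table above."""
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"
```

Note how the hard cut in `chunk` is exactly where the boundary-loss limitation shows up: a sentence split mid-way leaves neither chunk with the full context.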
Future implementations will explore strategies from ottomator-agents, combining 3-5 techniques for optimal results:
- Re-ranking — Two-stage retrieval with cross-encoder scoring (MS MARCO)
- Contextual Retrieval — LLM adds context to chunks before embedding (Anthropic)
- Context-aware Chunking — Split at semantic boundaries via Docling
- Late Chunking — Embed full document, then chunk (arXiv:2409.04701)
- Query Expansion / Multi-Query — Generate query variations for broader coverage
- Hierarchical RAG — Search child chunks, return parent context
- Knowledge Graphs — Vector search + graph traversal (Graphiti)
- Agentic RAG — Agent chooses retrieval method per query (arXiv:2501.09136)
- Self-Reflective RAG — LLM grades and refines retrieval (arXiv:2310.11511)
- Fine-tuned Embeddings — Domain-specific embedding models for 5-10% accuracy gain
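To make one of these concrete, here is a sketch of Query Expansion / Multi-Query with reciprocal rank fusion to merge the per-variant result lists. It is illustrative only: `search` is a stubbed lookup standing in for the pgvector top-k query, and in a real system the `variants` would be generated by an LLM rewriting the user's question.

```python
def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    across the ranked lists it appears in; higher total wins."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def multi_query_retrieve(query, variants, search, top_k=5):
    """Run the base query plus its variants, then fuse the rankings."""
    rankings = [search(q) for q in [query] + variants]
    return rrf_merge(rankings)[:top_k]

# Stubbed "vector search": a fixed ranked list per query string.
index = {
    "how is auth handled":  ["auth.ex", "router.ex", "user.ex"],
    "login implementation": ["session.ex", "auth.ex"],
    "session management":   ["session.ex", "plug.ex", "auth.ex"],
}
search = lambda q: index.get(q, [])

results = multi_query_retrieve(
    "how is auth handled",
    ["login implementation", "session management"],
    search,
)
print(results[0])  # "auth.ex" ranks well in every list, so fusion puts it first
```

The point of the fusion step is that a chunk relevant to several phrasings of the question outranks one that only matches the literal wording — addressing the "relevant but dissimilar" gap noted above.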
```shell
# Run tests
mix test

# Format code
mix format

# Run pre-commit checks
mix precommit
```

Apache 2.0