ragrep

Hybrid FAISS + BM25 RAG pipeline. Ingest any document source, build a searchable index, query with natural language.

make scrape SOURCE=slack    # pull data from a source
make ingest                  # build FAISS + BM25 index (incremental)
make query Q="how does X work"

Or use the Python SDK directly:

from ragrep import Index, Document, EmbedModel

index = Index(
    "data/index",
    embedding_models=[EmbedModel(provider="voyage", model_name="voyage-code-3")],
)
index.ingest([Document(id="1", source="docs", content="...", title="...")])
results = index.query("how does auth work")

What it is

ragrep is a self-hosted RAG pipeline with a focus on retrieval quality. It combines dense vector search (FAISS), sparse keyword search (BM25), and reranking into a single pipeline with a clean Python SDK.

Key design points:

Hybrid retrieval — FAISS cosine + BM25, fused with Reciprocal Rank Fusion (k=60)
Multi-model embedding — run multiple embedding models in parallel, fuse with RRF for better recall
Incremental ingestion — content-hash cache means re-indexing only processes new or changed documents
Adaptive rate limiting — learns the API's actual throughput boundary at runtime; checkpoint resumability for long embedding runs
Multi-source scrapers — Slack, Confluence/Jira, Google Drive, Git commits, Bitbucket PRs, local files
CLI + HTTP server — ragrep "query" as a local CLI, or serve over HTTP for multi-client use

Installation

Requires Python 3.12+ and uv.

git clone https://github.com/fntune/ragrep
cd ragrep
uv sync --extra full   # includes FAISS, Voyage AI, sentence-transformers

Copy .env.example to .env and fill in your API keys.

Usage

CLI

ragrep "how does the auth flow work"
ragrep "deployment process" -n 10          # top 10 results
ragrep "slack token" -s slack              # filter by source
ragrep "pricing logic" --after 3m          # last 3 months
ragrep "query" --json                      # compact JSON output
ragrep "query" --scores                    # include retrieval scores
ragrep "query" --server http://localhost:8321

Makefile

make install                     # set up venv
make scrape                      # all sources
make scrape SOURCE=slack         # single source
make ingest                      # incremental build
make ingest FORCE=1              # re-embed everything
make query Q="how does X work"
make query                       # interactive REPL
make stats                       # index statistics
make eval                        # evaluation harness
make serve                       # HTTP server on :8321

Python SDK

from ragrep import Index, Document, EmbedModel

index = Index(
    index_dir="data/index",
    embedding_models=[
        EmbedModel(provider="voyage", model_name="voyage-code-3"),
    ],
    reranker_provider="voyage",
    reranker_model="rerank-2.5",
    dedup_threshold=0.5,
)

stats = index.ingest([
    Document(id="doc1", source="docs", content="...", title="Getting Started"),
    Document(id="doc2", source="code", content="...", title="auth.py"),
])

results = index.query("how does authentication work", top_k=5)
for r in results:
    print(r.score, r.title, r.snippet)

Multi-model embedding for better recall:

index = Index(
    "data/index",
    embedding_models=[
        EmbedModel(provider="voyage", model_name="voyage-code-3"),
        EmbedModel(provider="sentence-transformers", model_name="Qwen3-Embedding-0.6B"),
    ],
)

Data sources

Source	What it ingests	Required credentials
`slack`	Messages, threads, pins, bookmarks, file contents	`SLACK_TOKEN`
`atlassian`	Confluence pages, Jira issues + comments	`CONFLUENCE_URL`, `JIRA_URL`, `ATLASSIAN_USERNAME`, `ATLASSIAN_API_TOKEN`
`gdrive`	Docs, Sheets, Slides, PDFs	`gcloud auth application-default login`
`git`	Commit messages from local repos	— (local filesystem)
`bitbucket`	PR descriptions + comments	`BITBUCKET_OAUTH_SECRET`
`files`	PDF, DOCX, PPTX, XLSX (extracted via Gemini)	`GOOGLE_API_KEY`

Configure in config.toml:

[scrape.slack]
date_cutoff = "2024-01-01"

[scrape.git]
repos = ["../myrepo", "../otherrepo"]
since = "2024-01-01"

[scrape.bitbucket]
workspace = "your-workspace"
states = ["MERGED", "OPEN"]
since = "2024-01-01"

Embedding providers

Provider	Model	Notes
`voyage` (default)	`voyage-code-3`	API, 1024d, code + text in same space
`sentence-transformers`	`Qwen3-Embedding-0.6B`	local, GPU/MPS recommended

Rerankers: voyage/rerank-2.5 (API) or sentence-transformers/cross-encoder/ms-marco-MiniLM-L-6-v2 (local).

HTTP server

make serve    # port 8321

curl "http://localhost:8321/search?q=auth+flow&mode=hybrid&n=5"
curl "http://localhost:8321/search?q=deploy&mode=grep"

# CLI against server
export RAGREP_SERVER=http://localhost:8321
ragrep "query"

Set RAGREP_GCS_BUCKET to auto-download the index from GCS at server startup.

Remote ingestion (Modal)

pip install modal && modal token new
modal run --detach modal_ingest.py
modal volume get ragrep-index index data/index

Voyage embedder has adaptive rate limiting and checkpoint resumability. Set NTFY_TOPIC for hourly progress notifications via ntfy.sh.

Architecture

scrape → normalize → chunk → embed (incremental) → store (FAISS + BM25)
                                                         |
                                           query → retrieve → rerank → generate

Retrieval pipeline:

Embed query with each configured model
FAISS cosine search + BM25 per model
RRF fusion (k=60) across all result lists
Voyage/cross-encoder reranking
Jaccard dedup on content overlap

Incremental ingestion: embed_cache.pkl maps sha256(content) → vector. Only new/changed chunks hit the embedding API. First run bootstraps from existing FAISS via reconstruct_n.

src/ragrep/
  __init__.py      public API: Index, Document, EmbedModel
  index.py         SDK: ingest() and query()
  models.py        Document, Chunk, SearchResult, QueryResult
  cli.py           CLI
  server.py        FastAPI HTTP server
  ingest/
    pipeline.py    orchestrator
    scrape_*.py    source scrapers
    embed.py       providers + adaptive rate limiting + checkpointing
    store.py       FAISS + BM25 persistence
  query/
    retrieve.py    hybrid retrieval + RRF
    rerank.py      reranking providers
    generate.py    Ollama generation
  eval/
    harness.py     evaluation with per-stage metrics (dense/BM25/RRF/rerank)

Development

make check    # ruff + mypy
make test     # pytest

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
src/ragrep		src/ragrep
tests		tests
.env.template		.env.template
.gcloudignore		.gcloudignore
.gitignore		.gitignore
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
config.toml		config.toml
modal_ingest.py		modal_ingest.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ragrep

What it is

Installation

Usage

CLI

Makefile

Python SDK

Data sources

Embedding providers

HTTP server

Remote ingestion (Modal)

Architecture

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ragrep

What it is

Installation

Usage

CLI

Makefile

Python SDK

Data sources

Embedding providers

HTTP server

Remote ingestion (Modal)

Architecture

Development

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages