A production-grade, full-stack RAG platform for developers. Index documents, PDFs, and web pages — then query them in any of 94+ languages with streaming LLM responses, multilingual semantic search, cross-encoder reranking, HyDE retrieval, and a hybrid knowledge base + live web pipeline. Ships with PostgreSQL, Redis, Alembic migrations, JWT auth, per-user multi-tenancy, feedback loop, async job queue (ARQ), Sentry observability, S3 storage, a custom Go scraper microservice, GraphQL + REST APIs, and a Next.js frontend — all containerised and CI-tested.
```mermaid
flowchart LR
    subgraph Ingest["Ingestion Pipeline"]
        direction TB
        A[PDF / DOCX / TXT] --> FP[FileProcessor\nmagic-byte validation]
        B[URL] --> SC[Go Scraper\nconcurrent, 10s timeout]
        SC --> SD[Semantic Dedup\ncosine ≥ 0.92 → skip]
        FP --> CK
        SD --> CK[SemanticChunker\nsentence-boundary split\n1600 chars, 200 overlap]
        CK --> EM[multilingual-e5-base\npassage: prefix · 768-dim]
        EM --> VS[(Vector Store\nChromaDB · pgvector)]
        EM --> DB[(PostgreSQL\nsources · chunks · jobs)]
    end
    subgraph Query["Query Pipeline"]
        direction TB
        Q[User Query] --> LD[langdetect\nISO 639-1]
        LD --> HY{HyDE?}
        HY -- yes --> LLM0[LLM generates\nhypothetical answer]
        LLM0 --> AV[Average embeddings]
        HY -- no --> QE[query: prefix embed]
        AV --> ANN[ANN lookup\ncosine · top-k]
        QE --> ANN
        ANN --> RR[Cross-Encoder Reranker\nmmarco-mMiniLMv2]
        RR --> GEN[LLM Generation\nconfigurable provider]
        GEN --> STR[SSE Stream → browser]
    end
    subgraph Infra["Infrastructure"]
        direction TB
        RC[(Redis\ncache · JWT blocklist\nchat history)]
        ARQ[ARQ Worker\nasync indexing jobs]
        SN[Sentry\nFastAPI + Next.js]
        S3[S3\nupload mirror]
    end
    VS --> ANN
    DB --> ANN
    RC --> Query
    ARQ --> Ingest
```
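The semantic dedup step in the ingestion pipeline can be pictured with a small sketch. It assumes chunk embeddings are already available as plain lists of floats and compares each new chunk against the chunks kept so far. The function names and the brute-force comparison are illustrative assumptions, not the project's actual code; only the 0.92 threshold comes from the diagram.

```python
import math

DEDUP_THRESHOLD = 0.92  # from the diagram: cosine >= 0.92 means "skip"


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_near_duplicates(
    embeddings: list[list[float]], threshold: float = DEDUP_THRESHOLD
) -> list[int]:
    """Return the indices of chunks to keep.

    A chunk is skipped when it is at least `threshold` similar to any chunk
    that was already kept (hypothetical helper, O(n^2) for clarity).
    """
    kept: list[int] = []
    for i, candidate in enumerate(embeddings):
        if all(cosine_similarity(candidate, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

A real deployment would more likely ask the vector store for the nearest neighbour of each new chunk instead of comparing every pair in Python.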
Measured on a 20-query DevFlow documentation test set (k = 5, ChromaDB, multilingual-e5-base).
| Metric | Without reranking | With reranking |
|---|---|---|
| Precision@5 | 58.0% | 74.0% |
| MRR | 0.61 | 0.79 |
| Hit Rate | 80.0% | 90.0% |
| Avg latency | 420 ms | 680 ms |
HyDE adds ~250 ms (one LLM call) but improves Precision@5 by ~4 pp on ambiguous queries.
- Multilingual semantic search — query and index documents in 94+ languages; responses always match the query language
- Semantic chunking — sentence-boundary-aware splitter (1 600-char target, 200-char overlap carry-back) preserves sentence integrity across chunk boundaries
- HyDE retrieval — averages embeddings of the user query and an LLM-generated hypothetical answer for improved recall on ambiguous queries; hypothetical answer generated in the detected query language
- Cross-encoder reranking — multilingual cross-encoder rescores retrieved chunks before generation; yields +16 pp Precision@5 vs. dense-only retrieval
- Per-stage latency — embed / retrieve / rerank / generate timings returned in every search response; visualised as a segmented bar in the frontend
- Hybrid search — queries the knowledge base first; falls back to live web results when local coverage is low
- Language-aware caching — cache key includes the detected language code so queries in different languages never collide
- Streaming chat — SSE stream with per-session conversation memory (last 20 messages, 24h TTL), per-request model selection, and collection scoping
- Pluggable LLM backend — multiple providers supported; swap the model per request without redeploying
- File upload: PDF, DOCX, TXT up to 10 MB; magic-byte validation ensures file content matches the declared extension; processed via async ARQ job with poll-for-status; falls back to `BackgroundTasks` when Redis is unavailable
- URL indexing: scrape and index any URL; semantic deduplication (cosine similarity ≥ 0.92) skips near-duplicate chunks; async ARQ job
- Manual documents — add title + content directly via API or the Sources page
- Web result save — promote a hybrid search result directly into the knowledge base with one call
- Collections — organise sources into named workspaces; sources can belong to multiple collections; collection-scoped chat and search
- Bulk delete — remove multiple sources in a single API call
- Source inspection — view individual indexed chunks per source, including content and metadata
- `user_id` stored with every source, collection, search history entry, and vector chunk metadata
- All list endpoints filter by the authenticated user; anonymous (ownerless) data is shared read-only
- Delete endpoints enforce ownership — returns 403 if the caller does not own the record
- Alembic migration `002_multi_tenancy` adds columns non-destructively to existing deployments
- Feedback loop — thumbs-up / thumbs-down on every chat response; stored with session ID, query, answer preview, and user ID
- Satisfaction rate: aggregate positive / total ratio exposed via `/api/feedback/stats`
- Analytics dashboard: auto-refreshes every 30 s; shows cache hit rate, vector chunk count, satisfaction rate, feedback totals, query language distribution (pie chart), searches by day with cache-hit overlay, model usage breakdown, source type distribution
- Eval harness — runs Precision@k, MRR, and Hit Rate against ground-truth source IDs; pre-seeded with 10 example queries; returns per-query breakdown including retrieved IDs, reciprocal rank, hit flag, and latency
- PostgreSQL primary database (SQLAlchemy + Alembic migrations); SQLite for local dev with zero configuration
- ChromaDB persistent vector store (`devflow_docs_v2` collection, cosine similarity, 768-dim); switchable to pgvector via `VECTOR_STORE=pgvector`
- pgvector adapter: drop-in replacement for ChromaDB with identical public interface; uses IVFFlat cosine index and `$and`-style metadata filtering; requires PostgreSQL 16+ with the pgvector extension
- ARQ async job queue: Redis-backed worker processes upload and URL index jobs asynchronously; job results retained 24 h; automatic `BackgroundTasks` fallback when Redis is unavailable
- Redis: response caching (configurable TTL, default 1 h), JWT revocation blocklist, and chat session history
- Per-user rate limiting — extracts user ID from JWT Bearer token, falls back to IP address
- Request tracing: UUID assigned per request; `X-Request-ID` header on all responses; timing included in structured logs
- Structured logging: Loguru with JSON rotation (50 MB cap, 14-day retention); request ID in every log line
- Sentry observability: FastAPI + SQLAlchemy integrations on the backend; `@sentry/nextjs` on the frontend; 10% trace sampling rate
- Health endpoint: liveness check for DB, Redis, and ChromaDB; returns 503 if the database is unreachable
- GZip compression on all responses
- S3 upload mirroring: uploaded files copied to S3 when `AWS_S3_BUCKET` is set; no-op otherwise
- GraphQL API: full GraphQL schema with GraphiQL explorer, available alongside REST
- Search history pruning — database automatically capped at 5 000 most recent entries
- Go web scraper — purpose-built concurrent scraping microservice; handles up to 20 URLs per request with one goroutine per URL, 10 s timeout, and 2 MB response cap; Python fallback activates if the service is unreachable
- JWT access tokens with 24 h expiry, signed with a configurable secret
- bcrypt password hashing
- Token revocation via Redis blocklist; TTL matches token expiry so revoked tokens cannot be replayed
- GitHub Actions runs the full Python test suite, Go build + vet, and Next.js type check + production build on every push and PR to `main`
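The semantic chunking bullet in the feature list above (sentence-boundary split, 1 600-char target, 200-char overlap carry-back) can be illustrated with a minimal sketch. This is an assumption about how such a splitter might work, not the project's `SemanticChunker`: the regex only recognises `.`, `!` and `?` as sentence ends, which a multilingual splitter would need to extend, and the names `split_sentences` and `chunk_text` are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_text(text: str, target: int = 1600, overlap: int = 200) -> list[str]:
    """Pack whole sentences into chunks of roughly `target` characters.

    When a chunk is closed, trailing sentences totalling at most `overlap`
    characters are carried back into the start of the next chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    new_count = 0  # sentences in `current` that no earlier chunk contains
    for sentence in split_sentences(text):
        if new_count and len(" ".join(current + [sentence])) > target:
            chunks.append(" ".join(current))
            carried: list[str] = []
            for previous in reversed(current):
                if len(" ".join([previous] + carried)) > overlap:
                    break
                carried.insert(0, previous)
            current = carried
            new_count = 0
        current.append(sentence)
        new_count += 1
    if new_count:
        chunks.append(" ".join(current))
    return chunks
```

Because only whole sentences are moved, a single sentence longer than the target still becomes its own oversized chunk; the target is treated as a soft limit in this sketch.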
| Layer | Technology |
|---|---|
| Frontend | Next.js 14, TypeScript, Redux Toolkit, RTK Query, Framer Motion, Recharts |
| Backend | FastAPI, Python 3.11, Pydantic v2, LangChain |
| Chunking | Sentence-boundary semantic splitter (1 600-char target, 200-char overlap carry-back) |
| Embeddings | intfloat/multilingual-e5-base — 94 languages, 768-dim, asymmetric query:/passage: prefixing |
| Reranker | cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 — 100 languages |
| LLM | Pluggable — Google, Anthropic, OpenAI, Groq providers; configured via API keys |
| Vector DB | ChromaDB (persistent, cosine, default) · pgvector (VECTOR_STORE=pgvector) |
| Database | PostgreSQL 16 (prod) / SQLite (dev) — SQLAlchemy + Alembic |
| Job Queue | ARQ (Redis-backed async worker) with BackgroundTasks fallback |
| Caching | Redis 7 |
| Auth | JWT (python-jose) + bcrypt (passlib) |
| Web Search | Brave Search API + concurrent Go scraper |
| Observability | Sentry (backend + frontend), Loguru structured logs |
| Infra | Docker Compose, GitHub Actions CI, Render |
DevFlow supports 94+ languages end-to-end:
- Indexing: language detected at ingest time (`langdetect`, ISO 639-1) and stored as chunk metadata
- Embedding: `multilingual-e5-base` with asymmetric prefixing (`passage:` for indexed documents, `query:` for queries), as specified in the model paper for optimal retrieval quality
- HyDE: hypothetical answer generated in the detected query language so embeddings stay in-language
- Reranking: `mmarco-mMiniLMv2` cross-encoder is MMARCO-trained across 100 languages
- Generation: LLM prompt explicitly instructs the model to respond in the same language as the question
- Caching: language code is part of the cache key; Spanish and English queries never collide
- Re-embedding: existing v1 embeddings (all-MiniLM, 384-dim) can be upgraded via `POST /api/admin/reindex-all`
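To make the asymmetric prefixing and the HyDE averaging concrete, here is a small sketch. The embedding model is passed in as a plain callable so the example runs without any ML dependency. Which prefix the hypothetical answer receives is not stated in this README, so the `passage:` choice here is an assumption, as are all function names.

```python
from typing import Callable

# Hypothetical signature: text in, embedding vector out.
Embedder = Callable[[str], list[float]]


def passage_text(chunk: str) -> str:
    """Prefix used when embedding indexed documents."""
    return f"passage: {chunk}"


def query_text(query: str) -> str:
    """Prefix used when embedding user queries."""
    return f"query: {query}"


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Component-wise average of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def hyde_query_embedding(query: str, hypothetical_answer: str, embed: Embedder) -> list[float]:
    """Average the query embedding with the embedding of an LLM-written answer.

    Assumption: the hypothetical answer is embedded like a passage.
    """
    return mean_vector([
        embed(query_text(query)),
        embed(passage_text(hypothetical_answer)),
    ])
```

The averaged vector is then used for the nearest-neighbour lookup in place of the plain query embedding.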
Frontend routes:

| Route | Description |
|---|---|
| `/search` | Semantic and hybrid search: source cards, web results, per-stage latency bar |
| `/chat` | Streaming chat with session memory, model selection, collection scoping, and per-message feedback |
| `/sources` | Manage indexed sources: file upload, URL indexing, manual add, bulk delete, chunk preview |
| `/collections` | Create and manage collections; assign sources |
| `/history` | Paginated search history |
| `/analytics` | KPI dashboard: cache hit rate, satisfaction rate, language distribution, queries by day, model usage |
| `/eval` | Retrieval eval harness: Precision@k, MRR, Hit Rate against ground-truth source IDs |
| `/costs` | Interactive cost calculator: compare self-hosted vs. managed plans with team and volume sliders |
| `/login` | Login |
| `/register` | Register |
- Python 3.11
- Node.js 20
- Redis (local or managed)
- PostgreSQL 16, or omit `DATABASE_URL` to use SQLite for local development
```bash
cd backend
cp .env.example .env          # fill in required variables (see Environment Variables below)
pip install -r requirements.txt
alembic upgrade head          # apply database migrations
uvicorn main:app --reload
```

```bash
cd frontend
# Set NEXT_PUBLIC_API_URL in .env.local to point at your backend
npm install
npm run dev
```

The ARQ worker processes file upload and URL indexing jobs asynchronously via Redis. If the worker is omitted, those endpoints fall back to `BackgroundTasks` automatically, with no configuration change needed.

```bash
cd backend
python -m arq worker.WorkerSettings
```

The Go scraper is a concurrent web scraping microservice used by the hybrid search web-fallback pipeline. The backend falls back to a Python scraper if this service is unreachable.

```bash
cd go-scraper
go build -o scraper .
SCRAPER_PORT=8001 ./scraper
```

Docker Compose starts PostgreSQL 16, Redis 7, the Go scraper, the backend, and the frontend with health-checked dependency ordering.

```bash
cp backend/.env.example backend/.env
# Fill in API keys
docker-compose up --build
```

Environment variables:

| Variable | Description |
|---|---|
| `GEMINI_API_KEY` | Google AI API key, used by the default LLM provider |
| `JWT_SECRET_KEY` | Strong random secret for signing JWT tokens |
| `REDIS_URL` | Redis connection string |
| `ALLOWED_ORIGINS` | Comma-separated list of allowed frontend origins (CORS) |
| `NEXT_PUBLIC_API_URL` | Backend base URL consumed by the Next.js frontend |
Additional variables and their defaults:

| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `sqlite:///./devflow.db` | PostgreSQL connection URL for production |
| `ANTHROPIC_API_KEY` | none | Enables Anthropic LLM provider |
| `OPENAI_API_KEY` | none | Enables OpenAI LLM provider |
| `GROQ_API_KEY` | none | Enables Groq LLM provider |
| `BRAVE_API_KEY` | none | Brave Search API key for web fallback in hybrid search |
| `GO_SCRAPER_URL` | none | URL of the Go concurrent scraper service |
| `SENTRY_DSN` | none | Backend Sentry DSN for error tracking and tracing |
| `NEXT_PUBLIC_SENTRY_DSN` | none | Frontend Sentry DSN |
| `AWS_S3_BUCKET` | none | S3 bucket name for upload mirroring |
| `AWS_S3_REGION` | `us-east-1` | S3 region |
| `AWS_ACCESS_KEY_ID` | none | AWS credentials for S3 access |
| `AWS_SECRET_ACCESS_KEY` | none | AWS credentials for S3 access |
| `VECTOR_STORE` | `chroma` | Vector backend: `chroma` or `pgvector` |
| `CACHE_TTL` | `3600` | Redis cache TTL in seconds |
| `ENVIRONMENT` | `production` | Environment tag sent to Sentry |
| `CHROMA_PATH` | `./chroma_db` | ChromaDB persistence directory |
| `POSTGRES_PASSWORD` | `devflow_secret` | PostgreSQL password used by Docker Compose |
Authentication endpoints:

| Method | Path | Rate limit | Description |
|---|---|---|---|
| POST | `/api/auth/register` | 5/min | Create account, returns JWT |
| POST | `/api/auth/login` | 5/min | Authenticate, returns JWT |
| POST | `/api/auth/logout` | none | Revoke token (adds to Redis blocklist) |
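The logout endpoint revokes a token by adding it to a Redis blocklist whose TTL matches the token's expiry. One way to derive that TTL is sketched here; the key layout, the hashing of the raw token, and both function names are assumptions for illustration, not the project's code.

```python
import hashlib
import time
from typing import Optional


def blocklist_key(token: str) -> str:
    """Hypothetical key layout: store a hash so the raw JWT never sits in Redis."""
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return f"devflow:jwt:blocklist:{digest}"


def blocklist_ttl_seconds(exp_claim: int, now: Optional[float] = None) -> int:
    """Seconds the entry must live: until the token's `exp` timestamp passes.

    Returns 0 for a token that has already expired, in which case there is
    nothing left to block.
    """
    current = time.time() if now is None else now
    return max(0, int(exp_claim - current))
```

With redis-py, the entry could then be written with something like `client.setex(blocklist_key(token), ttl, "1")` whenever `ttl` is greater than zero, so the blocklist never outgrows the set of still-valid tokens.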
Search endpoints:

| Method | Path | Rate limit | Description |
|---|---|---|---|
| POST | `/api/search` | 30/min | Semantic search with optional HyDE and cross-encoder reranking |
| POST | `/api/search/hybrid` | 30/min | Semantic search with live web fallback |
Both endpoints cache results in Redis, keyed by query + model + detected language.
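A cache key built from query, model, and detected language might look like the following sketch. The `devflow:search:` prefix, the whitespace and case normalisation, and the function name are assumptions; the README only states which three inputs the key depends on.

```python
import hashlib


def search_cache_key(query: str, model: str, language: str) -> str:
    """Build a deterministic Redis key from query + model + detected language."""
    # Assumption: collapse whitespace and case so trivially different
    # spellings of the same query share one cache entry.
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha256(
        f"{language}|{model}|{normalized}".encode("utf-8")
    ).hexdigest()
    return f"devflow:search:{language}:{digest}"
```

Hashing keeps keys short and free of special characters, while including the language code means the same text detected as two different languages can never share a cached answer.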
Request body fields:

- `query`: string, 1–1000 chars
- `n_results`: integer, 1–20, default `5`
- `model`: one of `gemini-flash`, `gemini-pro`, `claude-haiku`, `gpt-4o-mini`, `groq-llama`, `groq-mixtral`
- `rerank`: boolean, default `true`
- `use_hyde`: boolean, default `false`
- `use_web`: boolean (hybrid only), default `true`

Response fields:

- `answer`: LLM-generated answer string
- `sources`: array of source objects (title, url, content, metadata)
- `model`: model key used
- `latency`: object with `embed_ms`, `retrieve_ms`, `rerank_ms`, `llm_ms`
- `cached`: boolean
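The `latency` object in the response could be collected with a small timing helper such as the one sketched here. `StageTimer` is a hypothetical name, and the rounding to two decimals is an arbitrary choice; only the `*_ms` field names come from the response description.

```python
import time
from contextlib import contextmanager
from typing import Iterator


class StageTimer:
    """Collects wall-clock timings per pipeline stage, in milliseconds."""

    def __init__(self) -> None:
        self.latency: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and store it under '<name>_ms'."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.latency[f"{name}_ms"] = round(elapsed_ms, 2)
```

Each stage of a request would be wrapped as `with timer.stage("embed"): ...`, and `timer.latency` returned alongside the answer. The `finally` clause records the timing even when a stage raises.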
Chat endpoints:

| Method | Path | Description |
|---|---|---|
| POST | `/api/chat/stream` | Streaming SSE chat; emits `data: {"chunk": "..."}` events, terminated with `data: [DONE]` |
| GET | `/api/chat/history/{session_id}` | Retrieve conversation history for a session |
| DELETE | `/api/chat/history/{session_id}` | Clear session history from Redis |
| GET | `/api/chat/new-session` | Generate a new session ID |
Session history is stored in Redis under `devflow:chat:{session_id}` with a 24 h TTL; only the last 20 messages are kept.
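The trimming and expiry behaviour can be modelled in memory, as in this sketch. It is a stand-in for the Redis-backed store and not the project's code: in Redis the same effect is commonly achieved with `RPUSH`, `LTRIM key -20 -1` and `EXPIRE`. Whether the real TTL is refreshed on every write is not stated in this README; the sketch assumes it is.

```python
import time
from collections import deque
from typing import Optional

MAX_MESSAGES = 20
SESSION_TTL_SECONDS = 24 * 60 * 60


class InMemoryChatHistory:
    """In-memory model of a capped, expiring per-session message list."""

    def __init__(self) -> None:
        # key -> (expiry timestamp, messages)
        self._sessions: dict[str, tuple[float, deque]] = {}

    def _live_messages(self, key: str, now: float) -> deque:
        entry = self._sessions.get(key)
        if entry is None or entry[0] <= now:
            return deque(maxlen=MAX_MESSAGES)  # missing or expired session
        return entry[1]

    def append(self, session_id: str, role: str, content: str,
               now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        key = f"devflow:chat:{session_id}"
        messages = self._live_messages(key, now)
        messages.append({"role": role, "content": content})  # deque drops the oldest
        # Assumption: every write pushes the expiry out by another 24 h.
        self._sessions[key] = (now + SESSION_TTL_SECONDS, messages)

    def history(self, session_id: str, now: Optional[float] = None) -> list[dict]:
        now = time.time() if now is None else now
        return list(self._live_messages(f"devflow:chat:{session_id}", now))
```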
Stream request body fields:

- `message`: string, 1–2000 chars
- `session_id`: string
- `model`: same values as search
- `use_web`: boolean, default `false`
- `use_hyde`: boolean, default `false`
- `collection_id`: integer (optional); scopes retrieval to a single collection
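A client consuming the stream needs to pick out the `data:` lines, stop at `[DONE]`, and join the `chunk` values. The parser sketched here works on any iterable of already-decoded lines, so it is independent of the HTTP library used; the function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_chat_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text chunks from an SSE stream until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)["chunk"]
```

Joining the yielded chunks with `"".join(...)` reconstructs the full answer, while iterating lazily lets a UI render text as it arrives.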
Ingestion endpoints:

| Method | Path | Rate limit | Description |
|---|---|---|---|
| POST | `/api/upload` | 10/min | Upload PDF / DOCX / TXT (max 10 MB); async, returns job ID |
| GET | `/api/upload/status/{job_id}` | none | Poll background indexing job status |
| POST | `/api/index/url` | 10/min | Index a URL; async, returns job ID |
| POST | `/api/index/manual` | none | Add document content directly |
| POST | `/api/save-web-result` | none | Save a web search result into the knowledge base |
Job status values: pending → processing → completed | failed. Job results retained for 24 h.
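Before an upload job is queued, the file is checked against its declared extension and the 10 MB limit. A minimal sketch of such a check follows. The signatures used are the well-known ones (`%PDF-` for PDF, the ZIP header `PK\x03\x04` for DOCX); treating "10 MB" as 10 × 1024 × 1024 bytes and requiring TXT files to decode as UTF-8 are assumptions, as is the function name.

```python
import os

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: 10 MB means 10 MiB

MAGIC_PREFIXES = {
    ".pdf": b"%PDF-",
    ".docx": b"PK\x03\x04",  # DOCX files are ZIP containers
}


def upload_matches_extension(filename: str, data: bytes) -> bool:
    """Return True when size and leading bytes agree with the file extension."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in {".pdf", ".docx", ".txt"}:
        return False
    if len(data) > MAX_UPLOAD_BYTES:
        return False
    if extension == ".txt":
        # Plain text has no signature; assume valid UTF-8 is required.
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            return False
        return True
    return data.startswith(MAGIC_PREFIXES[extension])
```

A ZIP header alone does not prove a file is a DOCX rather than some other archive, so a stricter validator would also inspect the archive contents.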
Source endpoints:

| Method | Path | Description |
|---|---|---|
| GET | `/api/sources` | List sources, paginated; optional `collection_id` filter |
| DELETE | `/api/sources/{id}` | Delete source and remove its vectors from the store |
| POST | `/api/sources/bulk-delete` | Delete multiple sources; body: `{"ids": [1, 2, 3]}` |
| GET | `/api/sources/{id}/chunks` | Inspect indexed chunks for a source |
Collection endpoints:

| Method | Path | Description |
|---|---|---|
| GET | `/api/collections` | List all collections with source counts |
| POST | `/api/collections` | Create collection; body: `{"name": "...", "description": "..."}` |
| DELETE | `/api/collections/{id}` | Delete collection (sources are not deleted) |
| GET | `/api/collections/{id}/sources` | List sources in a collection |
| POST | `/api/collections/{id}/sources/{source_id}` | Add source to collection |
| DELETE | `/api/collections/{id}/sources/{source_id}` | Remove source from collection |
History, analytics, and stats endpoints:

| Method | Path | Description |
|---|---|---|
| GET | `/api/history` | Paginated search history (`?limit=50`, max 200) |
| GET | `/api/analytics` | Top queries, searches by day (7d), cache hit rate, model usage, source types, language distribution, feedback stats |
| GET | `/api/stats` | Source count, document count, total searches, vector chunk count |
Feedback endpoints:

| Method | Path | Description |
|---|---|---|
| POST | `/api/feedback` | Submit rating; body: `{"session_id": "...", "query": "...", "rating": 1}`, where `rating` is `1` or `-1` |
| GET | `/api/feedback/stats` | Aggregate stats: `total`, `thumbs_up`, `thumbs_down`, `satisfaction_rate` |
Evaluation endpoint:

| Method | Path | Description |
|---|---|---|
| POST | `/api/eval/precision` | Run retrieval eval; body: `{"queries": [{"query": "...", "expected_source_ids": [1,2]}], "k": 5}` |
Returns precision_at_k, mrr, hit_rate, and a per-query breakdown with retrieved IDs, reciprocal rank, hit flag, and latency.
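For reference, the three metrics can be computed as in the following sketch, which uses the standard definitions: Precision@k divides the number of relevant results in the top k by k, MRR averages the reciprocal rank of the first relevant result, and Hit Rate is the share of queries with at least one relevant result in the top k. The input shape and function names are assumptions; the harness itself may differ in details such as how it treats queries with fewer than k results.

```python
def reciprocal_rank(retrieved: list[int], relevant: set[int], k: int) -> float:
    """1 / rank of the first relevant ID within the top k, or 0.0 if none."""
    for rank, source_id in enumerate(retrieved[:k], start=1):
        if source_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(cases: list[tuple[list[int], set[int]]], k: int = 5) -> dict[str, object]:
    """Aggregate Precision@k, MRR and Hit Rate over (retrieved, expected) pairs."""
    if not cases:
        raise ValueError("at least one query is required")
    per_query = []
    for retrieved, relevant in cases:
        hits = sum(1 for source_id in retrieved[:k] if source_id in relevant)
        per_query.append({
            "precision": hits / k,
            "reciprocal_rank": reciprocal_rank(retrieved, relevant, k),
            "hit": hits > 0,
        })
    count = len(per_query)
    return {
        "precision_at_k": sum(q["precision"] for q in per_query) / count,
        "mrr": sum(q["reciprocal_rank"] for q in per_query) / count,
        "hit_rate": sum(1 for q in per_query if q["hit"]) / count,
        "per_query": per_query,
    }
```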
Admin endpoint:

| Method | Path | Description |
|---|---|---|
| POST | `/api/admin/reindex-all` | Re-embed all v1 (384-dim) content with multilingual-e5-base; async, returns job ID |
System endpoints:

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness check for DB, Redis, and ChromaDB; returns 503 if DB is down |
| GET | `/` | Status and stats summary |
| ANY | `/graphql` | GraphQL endpoint with GraphiQL explorer |
The GraphQL API is available at `/graphql` with the GraphiQL in-browser IDE.

- Queries: `sources`, `stats`, `collections`, `history`, `analytics`, `jobStatus`
- Mutations: `deleteSource`, `createCollection`, `deleteCollection`, `addSourceToCollection`, `search`
ML models are lazy-loaded singletons — not re-instantiated per request.
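A common way to get lazy-loaded singletons in Python is to cache a zero-argument factory function, as sketched here. The placeholder class stands in for an expensive model load, and the names are hypothetical; the project may implement its singletons differently.

```python
from functools import lru_cache


class FakeEmbeddingModel:
    """Placeholder for an expensive-to-load model (counts constructions)."""

    instances = 0

    def __init__(self, name: str) -> None:
        FakeEmbeddingModel.instances += 1
        self.name = name


@lru_cache(maxsize=None)
def get_embedding_model() -> FakeEmbeddingModel:
    """Build the model on first use and return the same object afterwards."""
    return FakeEmbeddingModel("intfloat/multilingual-e5-base")
```

Nothing is loaded at import time, so startup stays fast and tests that never embed anything pay no cost. Under heavy concurrency the factory can run more than once before the first result is cached, so a lock or an explicit warm-up call at startup is worth considering.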
Database tables: `sources`, `documents`, `search_history`, `collections`, `source_collections`, `index_jobs`, `users`, `answer_feedback`.

`metadata.create_all()` runs on app startup as a safety net for environments without Alembic.
```bash
cd backend
alembic upgrade head                              # apply all pending migrations
alembic revision --autogenerate -m "description"  # generate a new migration
```

Connection pool settings:

- PostgreSQL: `QueuePool` with `pool_size=10`, `max_overflow=20`, `pool_timeout=30` (seconds), `pool_pre_ping=True`
- SQLite: `StaticPool` with `check_same_thread=False`, `pool_pre_ping=True`
```bash
cd backend
pytest tests/ -v
```

Coverage includes: health check, auth (register, login, logout, duplicate rejection, wrong password), source CRUD, pagination, stats, upload validation (extension, magic bytes, file size), background job lifecycle, collection CRUD, search history, analytics, and manual document indexing.
Each test run uses an isolated in-memory SQLite database. Sentry is disabled. Rate limiting is active but scoped per test client instance.
Three services defined in render.yaml:
| Service | Runtime | Start command |
|---|---|---|
| `devflow-backend` | Python 3.11 | `alembic upgrade head && uvicorn main:app ...` |
| `devflow-go-scraper` | Go | `./scraper` |
| `devflow-frontend` | Node | `npm start` |
Set all environment variables in the Render dashboard. No secrets are stored in the repository.
```bash
docker-compose up --build
```

Services: `postgres` (16-alpine), `redis` (7-alpine), `go-scraper`, `backend`, `frontend`. The backend waits for all three dependency healthchecks to pass before starting.
