DevFlow

A production-grade, full-stack retrieval-augmented generation (RAG) platform for developers. Index documents, PDFs, and web pages, then query them in any of 94+ languages with streaming LLM responses, multilingual semantic search, cross-encoder reranking, HyDE retrieval, and a hybrid pipeline that combines the knowledge base with live web results. Ships with PostgreSQL, Redis, Alembic migrations, JWT auth, per-user multi-tenancy, a feedback loop, an async job queue (ARQ), Sentry observability, S3 upload mirroring, a custom Go scraper microservice, GraphQL and REST APIs, and a Next.js frontend, all containerised and CI-tested.

Demo

Architecture

flowchart LR
    subgraph Ingest["Ingestion Pipeline"]
        direction TB
        A[PDF / DOCX / TXT] --> FP[FileProcessor\nmagic-byte validation]
        B[URL] --> SC[Go Scraper\nconcurrent, 10s timeout]
        SC --> SD[Semantic Dedup\ncosine ≥ 0.92 → skip]
        FP --> CK
        SD --> CK[SemanticChunker\nsentence-boundary split\n1600 chars, 200 overlap]
        CK --> EM[multilingual-e5-base\npassage: prefix · 768-dim]
        EM --> VS[(Vector Store\nChromaDB · pgvector)]
        EM --> DB[(PostgreSQL\nsources · chunks · jobs)]
    end

    subgraph Query["Query Pipeline"]
        direction TB
        Q[User Query] --> LD[langdetect\nISO 639-1]
        LD --> HY{HyDE?}
        HY -- yes --> LLM0[LLM generates\nhypothetical answer]
        LLM0 --> AV[Average embeddings]
        HY -- no --> QE[query: prefix embed]
        AV --> ANN[ANN lookup\ncosine · top-k]
        QE --> ANN
        ANN --> RR[Cross-Encoder Reranker\nmmarco-mMiniLMv2]
        RR --> GEN[LLM Generation\nconfigurable provider]
        GEN --> STR[SSE Stream → browser]
    end

    subgraph Infra["Infrastructure"]
        direction TB
        RC[(Redis\ncache · JWT blocklist\nchat history)]
        ARQ[ARQ Worker\nasync indexing jobs]
        SN[Sentry\nFastAPI + Next.js]
        S3[S3\nupload mirror]
    end

    VS --> ANN
    DB --> ANN
    RC --> Query
    ARQ --> Ingest
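
The Semantic Dedup step in the diagram (cosine ≥ 0.92 → skip) can be illustrated with a minimal sketch. The function names are hypothetical, and comparing each candidate against both stored vectors and earlier candidates in the same batch is an assumption; only the 0.92 threshold comes from this README.

```python
import math

DEDUP_THRESHOLD = 0.92  # from the diagram: cosine >= 0.92 means "skip"


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedup_chunks(candidates, existing, threshold=DEDUP_THRESHOLD):
    """Return the indices of candidate embeddings that are not near-duplicates.

    A candidate is skipped when its similarity to any already-kept vector
    (existing vectors plus earlier candidates) reaches the threshold.
    """
    kept_vectors = list(existing)
    kept_indices = []
    for i, vec in enumerate(candidates):
        if any(cosine(vec, other) >= threshold for other in kept_vectors):
            continue  # near-duplicate: do not index this chunk
        kept_vectors.append(vec)
        kept_indices.append(i)
    return kept_indices
```

For example, `dedup_chunks([[1, 0], [0.99, 0.05], [0, 1]], existing=[])` keeps the first and third vectors, because the second is almost parallel to the first.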

Retrieval Benchmarks

Measured on a 20-query DevFlow documentation test set (k = 5, ChromaDB, multilingual-e5-base).

Metric	Without reranking	With reranking
Precision@5	58.0%	74.0%
MRR	0.61	0.79
Hit Rate	80.0%	90.0%
Avg latency	420 ms	680 ms

HyDE adds ~250 ms (one LLM call) but improves Precision@5 by ~4 pp on ambiguous queries.
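
For readers unfamiliar with the three metrics in the table, the sketch below shows how they are commonly computed from retrieved and expected source IDs. It is an illustration under standard definitions, not the project's eval code; the function names and the input shape (a list of `(retrieved_ids, expected_ids)` pairs) are assumptions.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are relevant."""
    return sum(1 for item in retrieved[:k] if item in relevant) / k


def reciprocal_rank(retrieved, relevant, k):
    """1 / rank of the first relevant ID in the top k, or 0.0 if none."""
    for rank, item in enumerate(retrieved[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k=5):
    """Average the metrics over (retrieved_ids, expected_ids) pairs."""
    if not runs:
        raise ValueError("need at least one query")
    precisions, ranks = [], []
    for retrieved, expected in runs:
        relevant = set(expected)
        precisions.append(precision_at_k(retrieved, relevant, k))
        ranks.append(reciprocal_rank(retrieved, relevant, k))
    n = len(runs)
    return {
        "precision_at_k": sum(precisions) / n,
        "mrr": sum(ranks) / n,
        # A query is a "hit" when any relevant ID appears in the top k.
        "hit_rate": sum(1 for r in ranks if r > 0) / n,
    }
```

Repeated IDs in `retrieved` (for example several chunks from one source) are counted as-is here; a real harness may choose to collapse them first.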

Features

RAG & Search

Multilingual semantic search — query and index documents in 94+ languages; responses always match the query language
Semantic chunking — sentence-boundary-aware splitter (1 600-char target, 200-char overlap carry-back) preserves sentence integrity across chunk boundaries
HyDE retrieval — averages embeddings of the user query and an LLM-generated hypothetical answer for improved recall on ambiguous queries; hypothetical answer generated in the detected query language
Cross-encoder reranking — multilingual cross-encoder rescores retrieved chunks before generation; yields +16 pp Precision@5 vs. dense-only retrieval
Per-stage latency — embed / retrieve / rerank / generate timings returned in every search response; visualised as a segmented bar in the frontend
Hybrid search — queries the knowledge base first; falls back to live web results when local coverage is low
Language-aware caching — cache key includes the detected language code so queries in different languages never collide
Streaming chat — SSE stream with per-session conversation memory (last 20 messages, 24h TTL), per-request model selection, and collection scoping
Pluggable LLM backend — multiple providers supported; swap the model per request without redeploying
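
The semantic chunking bullet above can be made concrete with a small sketch. This is one plausible reading of "sentence-boundary-aware splitter with overlap carry-back", not the project's SemanticChunker: the regex sentence splitter is deliberately naive, and the function names are hypothetical. Only the 1 600-char target and 200-char overlap come from this README.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_text(text, target=1600, overlap=200):
    """Group whole sentences into chunks of roughly `target` characters.

    When a chunk is closed, trailing whole sentences totalling at most
    `overlap` characters are carried back into the start of the next chunk.
    """
    chunks, current, length = [], [], 0
    for sentence in split_sentences(text):
        added = len(sentence) + (1 if current else 0)  # +1 for the joining space
        if current and length + added > target:
            chunks.append(" ".join(current))
            carried, carried_len = [], 0
            for prev in reversed(current):
                extra = len(prev) + (1 if carried else 0)
                if carried_len + extra > overlap:
                    break
                carried.insert(0, prev)
                carried_len += extra
            current, length = carried, carried_len
            added = len(sentence) + (1 if current else 0)
        current.append(sentence)
        length += added
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because sentences are never cut, a chunk can exceed the target slightly, and a single sentence longer than the target becomes its own oversized chunk.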

Ingestion

File upload — PDF, DOCX, TXT up to 10 MB; magic-byte validation ensures file content matches the declared extension; processed via async ARQ job with poll-for-status; falls back to BackgroundTasks when Redis is unavailable
URL indexing — scrape and index any URL; semantic deduplication (cosine similarity ≥ 0.92) skips near-duplicate chunks; async ARQ job
Manual documents — add title + content directly via API or the Sources page
Web result save — promote a hybrid search result directly into the knowledge base with one call
Collections — organise sources into named workspaces; sources can belong to multiple collections; collection-scoped chat and search
Bulk delete — remove multiple sources in a single API call
Source inspection — view individual indexed chunks per source, including content and metadata
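
As an illustration of the magic-byte validation mentioned for file uploads, the sketch below checks that a file's leading bytes agree with its extension. The function name, the return shape, the UTF-8 check for TXT files, and the reading of 10 MB as 10 × 1024 × 1024 bytes are all assumptions. The PDF signature (`%PDF-`) and the ZIP signature used by DOCX (`PK\x03\x04`) are standard.

```python
import os

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed interpretation of "10 MB"

# Leading bytes expected for each binary format.
MAGIC_PREFIXES = {
    ".pdf": b"%PDF-",
    ".docx": b"PK\x03\x04",  # DOCX files are ZIP containers
}
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def validate_upload(filename, data):
    """Return (ok, reason) for an uploaded file name and its raw bytes."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, "unsupported extension"
    if len(data) > MAX_UPLOAD_BYTES:
        return False, "file too large"
    if ext == ".txt":
        # Plain text has no signature; assume it must be NUL-free UTF-8.
        if b"\x00" in data:
            return False, "content does not match extension"
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            return False, "content does not match extension"
        return True, "ok"
    if not data.startswith(MAGIC_PREFIXES[ext]):
        return False, "content does not match extension"
    return True, "ok"
```

A ZIP archive renamed to `report.pdf` is rejected because its first bytes are `PK\x03\x04` rather than `%PDF-`.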

Multi-tenancy

user_id stored with every source, collection, search history entry, and vector chunk metadata
All list endpoints filter by the authenticated user; anonymous (ownerless) data is shared read-only
Delete endpoints enforce ownership and return 403 if the caller does not own the record
Alembic migration 002_multi_tenancy adds columns non-destructively to existing deployments

Feedback & Eval

Feedback loop — thumbs-up / thumbs-down on every chat response; stored with session ID, query, answer preview, and user ID
Satisfaction rate — aggregate positive / total ratio exposed via /api/feedback/stats
Analytics dashboard — auto-refreshes every 30 s; shows cache hit rate, vector chunk count, satisfaction rate, feedback totals, query language distribution (pie chart), searches by day with cache-hit overlay, model usage breakdown, source type distribution
Eval harness — runs Precision@k, MRR, and Hit Rate against ground-truth source IDs; pre-seeded with 10 example queries; returns per-query breakdown including retrieved IDs, reciprocal rank, hit flag, and latency

Infrastructure

PostgreSQL primary database (SQLAlchemy + Alembic migrations); SQLite for local dev with zero configuration
ChromaDB persistent vector store (devflow_docs_v2 collection, cosine similarity, 768-dim); switchable to pgvector via VECTOR_STORE=pgvector
pgvector adapter — drop-in replacement for ChromaDB with identical public interface; uses IVFFlat cosine index and $and-style metadata filtering; requires PostgreSQL 16+ with the pgvector extension
ARQ async job queue — Redis-backed worker processes upload and URL index jobs asynchronously; job results retained 24 h; automatic BackgroundTasks fallback when Redis is unavailable
Redis — response caching (configurable TTL, default 1 h), JWT revocation blocklist, and chat session history
Per-user rate limiting — extracts user ID from JWT Bearer token, falls back to IP address
Request tracing — UUID assigned per request; X-Request-ID header on all responses; timing included in structured logs
Structured logging — Loguru with JSON rotation (50 MB cap, 14-day retention); request ID in every log line
Sentry observability — FastAPI + SQLAlchemy integrations on the backend; @sentry/nextjs on the frontend; 10% trace sampling rate
Health endpoint — liveness check for DB, Redis, and ChromaDB; returns 503 if the database is unreachable
GZip compression on all responses
S3 upload mirroring — uploaded files copied to S3 when AWS_S3_BUCKET is set; no-op otherwise
GraphQL API — full GraphQL schema with GraphiQL explorer, available alongside REST
Search history pruning — database automatically capped at 5 000 most recent entries
Go web scraper — purpose-built concurrent scraping microservice; handles up to 20 URLs per request with one goroutine per URL, 10 s timeout, and 2 MB response cap; Python fallback activates if the service is unreachable

Auth

JWT access tokens with 24 h expiry, signed with a configurable secret
bcrypt password hashing
Token revocation via Redis blocklist; each entry's TTL matches the token's expiry, so a revoked token stays blocked until it would have expired anyway
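
The blocklist TTL rule can be sketched as follows. This is an in-memory stand-in for illustration only: the class, the function name, and the use of a `jti` token identifier are assumptions, and in a Redis-backed version the same TTL would typically be passed as the key's expiry.

```python
import math
import time


def blocklist_ttl(exp, now=None):
    """Seconds a revoked token must stay blocked: its remaining lifetime.

    `exp` is the token's expiry as a Unix timestamp (the JWT `exp` claim).
    Rounds up so the entry never disappears before the token expires.
    """
    now = time.time() if now is None else now
    return max(0, math.ceil(exp - now))


class InMemoryBlocklist:
    """Hypothetical stand-in for the Redis blocklist, keyed by token ID."""

    def __init__(self):
        self._expires_at = {}

    def revoke(self, jti, exp, now):
        ttl = blocklist_ttl(exp, now)
        if ttl > 0:  # an already-expired token needs no entry
            self._expires_at[jti] = now + ttl

    def is_revoked(self, jti, now):
        expires_at = self._expires_at.get(jti)
        if expires_at is None:
            return False
        if now >= expires_at:
            del self._expires_at[jti]  # entry has outlived the token
            return False
        return True
```

Tying the entry's lifetime to the token's remaining lifetime keeps the blocklist from growing without bound, since expired tokens are rejected by signature validation anyway.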

CI

GitHub Actions runs the full Python test suite, Go build + vet, and Next.js type check + production build on every push and PR to main

Stack

Layer	Technology
Frontend	Next.js 14, TypeScript, Redux Toolkit, RTK Query, Framer Motion, Recharts
Backend	FastAPI, Python 3.11, Pydantic v2, LangChain
Chunking	Sentence-boundary semantic splitter (1 600-char target, 200-char overlap carry-back)
Embeddings	`intfloat/multilingual-e5-base` — 94 languages, 768-dim, asymmetric `query:`/`passage:` prefixing
Reranker	`cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` — 100 languages
LLM	Pluggable — Google, Anthropic, OpenAI, Groq providers; configured via API keys
Vector DB	ChromaDB (persistent, cosine, default) · pgvector (`VECTOR_STORE=pgvector`)
Database	PostgreSQL 16 (prod) / SQLite (dev) — SQLAlchemy + Alembic
Job Queue	ARQ (Redis-backed async worker) with BackgroundTasks fallback
Caching	Redis 7
Auth	JWT (`python-jose`) + bcrypt (`passlib`)
Web Search	Brave Search API + concurrent Go scraper
Observability	Sentry (backend + frontend), Loguru structured logs
Infra	Docker Compose, GitHub Actions CI, Render

Multilingual Support

DevFlow supports 94+ languages end-to-end:

Indexing — language detected at ingest time (langdetect, ISO 639-1) and stored as chunk metadata
Embedding — multilingual-e5-base with asymmetric prefixing: passage: for indexed documents, query: for queries, as specified in the model paper for optimal retrieval quality
HyDE — hypothetical answer generated in the detected query language so embeddings stay in-language
Reranking — mmarco-mMiniLMv2 is a multilingual cross-encoder fine-tuned on mMARCO (machine-translated MS MARCO); coverage is listed as 100 languages
Generation — LLM prompt explicitly instructs the model to respond in the same language as the question
Caching — language code is part of the cache key; Spanish and English queries never collide
Re-embedding — existing v1 embeddings (all-MiniLM, 384-dim) can be upgraded via POST /api/admin/reindex-all
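
The asymmetric prefixing and HyDE averaging described above can be sketched without loading the model. Here `embed` is a hypothetical callable that maps a string to a vector (in practice it would wrap multilingual-e5-base). Embedding the hypothetical answer with the `passage:` prefix is an assumption; the README only states that the two embeddings are averaged.

```python
def passage_input(text):
    """Text as fed to the encoder when indexing a document chunk."""
    return f"passage: {text}"


def query_input(text):
    """Text as fed to the encoder when embedding a search query."""
    return f"query: {text}"


def average_embeddings(vectors):
    """Element-wise mean of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("vectors must share one dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hyde_query_vector(query, hypothetical_answer, embed):
    """Average the query embedding with the hypothetical-answer embedding."""
    query_vec = embed(query_input(query))
    # Assumption: the generated answer is document-like, so it gets "passage:".
    answer_vec = embed(passage_input(hypothetical_answer))
    return average_embeddings([query_vec, answer_vec])
```

The sketch does not re-normalise the averaged vector; with a cosine index that does not change the ranking.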

Frontend Pages

Route	Description
`/search`	Semantic and hybrid search — source cards, web results, per-stage latency bar
`/chat`	Streaming chat with session memory, model selection, collection scoping, and per-message feedback
`/sources`	Manage indexed sources — file upload, URL indexing, manual add, bulk delete, chunk preview
`/collections`	Create and manage collections; assign sources
`/history`	Paginated search history
`/analytics`	KPI dashboard — cache hit rate, satisfaction rate, language distribution, queries by day, model usage
`/eval`	Retrieval eval harness — Precision@k, MRR, Hit Rate against ground-truth source IDs
`/costs`	Interactive cost calculator — compare self-hosted vs. managed plans with team and volume sliders
`/login`	Login
`/register`	Register

Setup

Prerequisites

Python 3.11
Node.js 20
Redis (local or managed)
PostgreSQL 16 — or omit DATABASE_URL to use SQLite for local development

Backend

cd backend
cp .env.example .env        # fill in required variables (see Environment Variables below)
pip install -r requirements.txt
alembic upgrade head        # apply database migrations
uvicorn main:app --reload

Frontend

cd frontend
# Set NEXT_PUBLIC_API_URL in .env.local to point at your backend
npm install
npm run dev

ARQ Worker (optional)

Processes file upload and URL indexing jobs asynchronously via Redis. If omitted, those endpoints fall back to BackgroundTasks automatically — no configuration change needed.

cd backend
python -m arq worker.WorkerSettings

Go Scraper (optional)

Concurrent web scraping microservice used by the hybrid search web-fallback pipeline. The backend falls back to a Python scraper if this service is unreachable.

cd go-scraper
go build -o scraper .
SCRAPER_PORT=8001 ./scraper

Docker (full stack)

Starts PostgreSQL 16, Redis 7, Go scraper, backend, and frontend with health-checked dependency ordering.

cp backend/.env.example backend/.env
# Fill in API keys
docker-compose up --build

Environment Variables

Required

Variable	Description
`GEMINI_API_KEY`	Google AI API key — used by the default LLM provider
`JWT_SECRET_KEY`	Strong random secret for signing JWT tokens
`REDIS_URL`	Redis connection string
`ALLOWED_ORIGINS`	Comma-separated list of allowed frontend origins (CORS)
`NEXT_PUBLIC_API_URL`	Backend base URL consumed by the Next.js frontend

Optional

Variable	Default	Description
`DATABASE_URL`	`sqlite:///./devflow.db`	PostgreSQL connection URL for production
`ANTHROPIC_API_KEY`	—	Enables Anthropic LLM provider
`OPENAI_API_KEY`	—	Enables OpenAI LLM provider
`GROQ_API_KEY`	—	Enables Groq LLM provider
`BRAVE_API_KEY`	—	Brave Search API key for web fallback in hybrid search
`GO_SCRAPER_URL`	—	URL of the Go concurrent scraper service
`SENTRY_DSN`	—	Backend Sentry DSN for error tracking and tracing
`NEXT_PUBLIC_SENTRY_DSN`	—	Frontend Sentry DSN
`AWS_S3_BUCKET`	—	S3 bucket name for upload mirroring
`AWS_S3_REGION`	`us-east-1`	S3 region
`AWS_ACCESS_KEY_ID`	—	AWS credentials for S3 access
`AWS_SECRET_ACCESS_KEY`	—	AWS credentials for S3 access
`VECTOR_STORE`	`chroma`	Vector backend — `chroma` or `pgvector`
`CACHE_TTL`	`3600`	Redis cache TTL in seconds
`ENVIRONMENT`	`production`	Environment tag sent to Sentry
`CHROMA_PATH`	`./chroma_db`	ChromaDB persistence directory
`POSTGRES_PASSWORD`	`devflow_secret`	PostgreSQL password used by Docker Compose

API Reference

Auth

Method	Path	Rate limit	Description
POST	`/api/auth/register`	5/min	Create account, returns JWT
POST	`/api/auth/login`	5/min	Authenticate, returns JWT
POST	`/api/auth/logout`	—	Revoke token (adds to Redis blocklist)

Search

Method	Path	Rate limit	Description
POST	`/api/search`	30/min	Semantic search with optional HyDE and cross-encoder reranking
POST	`/api/search/hybrid`	30/min	Semantic search with live web fallback

Both endpoints cache results in Redis, keyed by query + model + detected language.
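
A language-aware cache key along these lines might look like the sketch below. The `devflow:search:` prefix, the whitespace and case normalisation, and the hashing scheme are all assumptions made for illustration; the README only states that the key covers query, model, and detected language.

```python
import hashlib


def cache_key(query, model, lang):
    """Build a cache key from the query, model key, and ISO 639-1 language code."""
    # Assumed normalisation: collapse whitespace and ignore case.
    normalized = " ".join(query.split()).lower()
    # The unit separator keeps the fields from running into each other.
    material = f"{lang}\x1f{model}\x1f{normalized}"
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return f"devflow:search:{lang}:{digest}"
```

Hashing keeps keys at a fixed length regardless of query size, while the readable language segment makes it easy to inspect or purge entries per language.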

Request body fields:

query — string, 1–1000 chars
n_results — integer, 1–20, default 5
model — gemini-flash | gemini-pro | claude-haiku | gpt-4o-mini | groq-llama | groq-mixtral
rerank — boolean, default true
use_hyde — boolean, default false
use_web — boolean (hybrid only), default true

Response fields:

answer — LLM-generated answer string
sources — array of source objects (title, url, content, metadata)
model — model key used
latency — object with embed_ms, retrieve_ms, rerank_ms, llm_ms
cached — boolean

Chat

Method	Path	Description
POST	`/api/chat/stream`	Streaming SSE chat; emits `data: {"chunk": "..."}` events, terminated with `data: [DONE]`
GET	`/api/chat/history/{session_id}`	Retrieve conversation history for a session
DELETE	`/api/chat/history/{session_id}`	Clear session history from Redis
GET	`/api/chat/new-session`	Generate a new session ID

Session history is stored in Redis under devflow:chat:{session_id} with a 24 h TTL; only the last 20 messages are kept.

Stream request body fields:

message — string, 1–2000 chars
session_id — string
model — same values as search
use_web — boolean, default false
use_hyde — boolean, default false
collection_id — integer (optional); scopes retrieval to a single collection
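
On the client side, the stream format documented above (`data: {"chunk": "..."}` events terminated by `data: [DONE]`) can be consumed with a few lines. This is a minimal sketch that takes an iterable of already-decoded lines; the function name is hypothetical, and obtaining the lines from an HTTP response is left to whichever client library is in use.

```python
import json


def iter_chunks(lines):
    """Yield text chunks from SSE lines, stopping at the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and SSE comments carry no payload
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)["chunk"]
```

Joining the yielded chunks reconstructs the full answer, for example `"".join(iter_chunks(lines))`.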

Ingestion

Method	Path	Rate limit	Description
POST	`/api/upload`	10/min	Upload PDF / DOCX / TXT (max 10 MB); async, returns job ID
GET	`/api/upload/status/{job_id}`	—	Poll background indexing job status
POST	`/api/index/url`	10/min	Index a URL; async, returns job ID
POST	`/api/index/manual`	—	Add document content directly
POST	`/api/save-web-result`	—	Save a web search result into the knowledge base

Job status values: pending → processing → completed | failed. Job results retained for 24 h.
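
A client can wait for an indexing job by polling until one of the two terminal states is reached. In the sketch below, `fetch_status` is a hypothetical callable that wraps `GET /api/upload/status/{job_id}` and returns the status string; the response shape of that endpoint is not documented here, so extracting the status is left to the caller.

```python
import time

TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(fetch_status, job_id, interval=1.0, timeout=60.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job is completed or failed; return the final status.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    `sleep` and `clock` are injectable to make the loop testable.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout} s")
        sleep(interval)
```

Since job results are retained for 24 h, a caller that times out can resume polling later with the same job ID.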

Sources

Method	Path	Description
GET	`/api/sources`	List sources, paginated; optional `collection_id` filter
DELETE	`/api/sources/{id}`	Delete source and remove its vectors from the store
POST	`/api/sources/bulk-delete`	Delete multiple sources — body: `{"ids": [1, 2, 3]}`
GET	`/api/sources/{id}/chunks`	Inspect indexed chunks for a source

Collections

Method	Path	Description
GET	`/api/collections`	List all collections with source counts
POST	`/api/collections`	Create collection — body: `{"name": "...", "description": "..."}`
DELETE	`/api/collections/{id}`	Delete collection (sources are not deleted)
GET	`/api/collections/{id}/sources`	List sources in a collection
POST	`/api/collections/{id}/sources/{source_id}`	Add source to collection
DELETE	`/api/collections/{id}/sources/{source_id}`	Remove source from collection

History & Analytics

Method	Path	Description
GET	`/api/history`	Paginated search history — `?limit=50`, max 200
GET	`/api/analytics`	Top queries, searches by day (7d), cache hit rate, model usage, source types, language distribution, feedback stats
GET	`/api/stats`	Source count, document count, total searches, vector chunk count

Feedback

Method	Path	Description
POST	`/api/feedback`	Submit rating — body: `{"session_id": "...", "query": "...", "rating": 1}` (`rating` is `1` or `-1`)
GET	`/api/feedback/stats`	Aggregate stats: total, thumbs_up, thumbs_down, satisfaction_rate

Eval

Method	Path	Description
POST	`/api/eval/precision`	Run retrieval eval — body: `{"queries": [{"query": "...", "expected_source_ids": [1,2]}], "k": 5}`

Returns precision_at_k, mrr, hit_rate, and a per-query breakdown with retrieved IDs, reciprocal rank, hit flag, and latency.

Admin

Method	Path	Description
POST	`/api/admin/reindex-all`	Re-embed all v1 (384-dim) content with multilingual-e5-base; async, returns job ID

System

Method	Path	Description
GET	`/health`	Liveness check — DB, Redis, ChromaDB; returns 503 if DB is down
GET	`/`	Status and stats summary
ANY	`/graphql`	GraphQL endpoint with GraphiQL explorer

GraphQL

Available at /graphql with the GraphiQL in-browser IDE.

Queries: sources, stats, collections, history, analytics, jobStatus

Mutations: deleteSource, createCollection, deleteCollection, addSourceToCollection, search

ML models are lazy-loaded singletons — not re-instantiated per request.

Database

Tables

sources, documents, search_history, collections, source_collections, index_jobs, users, answer_feedback

metadata.create_all() runs on app startup as a safety net for environments without Alembic.

Migrations

cd backend
alembic upgrade head                                     # apply all pending migrations
alembic revision --autogenerate -m "description"         # generate a new migration

Connection pooling

PostgreSQL — QueuePool: pool_size=10, max_overflow=20, pool_timeout=30 s, pool_pre_ping=True
SQLite — StaticPool with check_same_thread=False, pool_pre_ping=True
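
The pooling settings above map onto SQLAlchemy `create_engine` keyword arguments roughly as sketched below. The function names are hypothetical and the project's actual engine setup may differ; QueuePool is SQLAlchemy's default pool for PostgreSQL, so it is not set explicitly.

```python
def pool_options(database_url):
    """Keyword arguments for sqlalchemy.create_engine, chosen by backend."""
    if database_url.startswith("sqlite"):
        return {
            "connect_args": {"check_same_thread": False},
            "pool_pre_ping": True,
        }
    return {
        "pool_size": 10,
        "max_overflow": 20,
        "pool_timeout": 30,  # seconds to wait for a free connection
        "pool_pre_ping": True,
    }


def make_engine(database_url="sqlite:///./devflow.db"):
    """Create an engine with the settings above (requires SQLAlchemy)."""
    # Imported lazily so pool_options can be used without SQLAlchemy installed.
    from sqlalchemy import create_engine
    from sqlalchemy.pool import StaticPool

    options = pool_options(database_url)
    if database_url.startswith("sqlite"):
        options["poolclass"] = StaticPool  # one shared connection
    return create_engine(database_url, **options)
```

With these values a PostgreSQL deployment can hold up to 30 connections per process (10 pooled plus 20 overflow), which is worth checking against the server's `max_connections` when running several workers.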

Testing

cd backend
pytest tests/ -v

Coverage includes: health check, auth (register, login, logout, duplicate rejection, wrong password), source CRUD, pagination, stats, upload validation (extension, magic bytes, file size), background job lifecycle, collection CRUD, search history, analytics, and manual document indexing.

Each test run uses an isolated in-memory SQLite database. Sentry is disabled. Rate limiting is active but scoped per test client instance.

Deployment

Render

Three services defined in render.yaml:

Service	Runtime	Start command
`devflow-backend`	Python 3.11	`alembic upgrade head && uvicorn main:app ...`
`devflow-go-scraper`	Go	`./scraper`
`devflow-frontend`	Node	`npm start`

Set all environment variables in the Render dashboard. No secrets are stored in the repository.

Docker Compose

docker-compose up --build

Services: postgres (16-alpine), redis (7-alpine), go-scraper, backend, frontend. The backend waits for all three dependency healthchecks to pass before starting.
