DevFlow

A production-grade, full-stack RAG platform for developers. Index documents, PDFs, and web pages, then query them in any of 94+ languages with streaming LLM responses, multilingual semantic search, cross-encoder reranking, HyDE retrieval, and a hybrid pipeline that combines the knowledge base with live web results. Ships with PostgreSQL, Redis, Alembic migrations, JWT auth, per-user multi-tenancy, a feedback loop, an async job queue (ARQ), Sentry observability, S3 storage, a custom Go scraper microservice, GraphQL + REST APIs, and a Next.js frontend, all containerised and CI-tested.


Demo

[Image: DevFlow demo]


Architecture

flowchart LR
    subgraph Ingest["Ingestion Pipeline"]
        direction TB
        A[PDF / DOCX / TXT] --> FP[FileProcessor\nmagic-byte validation]
        B[URL] --> SC[Go Scraper\nconcurrent, 10s timeout]
        SC --> SD[Semantic Dedup\ncosine ≥ 0.92 → skip]
        FP --> CK
        SD --> CK[SemanticChunker\nsentence-boundary split\n1600 chars, 200 overlap]
        CK --> EM[multilingual-e5-base\npassage: prefix · 768-dim]
        EM --> VS[(Vector Store\nChromaDB · pgvector)]
        EM --> DB[(PostgreSQL\nsources · chunks · jobs)]
    end

    subgraph Query["Query Pipeline"]
        direction TB
        Q[User Query] --> LD[langdetect\nISO 639-1]
        LD --> HY{HyDE?}
        HY -- yes --> LLM0[LLM generates\nhypothetical answer]
        LLM0 --> AV[Average embeddings]
        HY -- no --> QE[query: prefix embed]
        AV --> ANN[ANN lookup\ncosine · top-k]
        QE --> ANN
        ANN --> RR[Cross-Encoder Reranker\nmmarco-mMiniLMv2]
        RR --> GEN[LLM Generation\nconfigurable provider]
        GEN --> STR[SSE Stream → browser]
    end

    subgraph Infra["Infrastructure"]
        direction TB
        RC[(Redis\ncache · JWT blocklist\nchat history)]
        ARQ[ARQ Worker\nasync indexing jobs]
        SN[Sentry\nFastAPI + Next.js]
        S3[S3\nupload mirror]
    end

    VS --> ANN
    DB --> ANN
    RC --> Query
    ARQ --> Ingest

Retrieval Benchmarks

Measured on a 20-query DevFlow documentation test set (k = 5, ChromaDB, multilingual-e5-base).

| Metric | Without reranking | With reranking |
| --- | --- | --- |
| Precision@5 | 58.0% | 74.0% |
| MRR | 0.61 | 0.79 |
| Hit Rate | 80.0% | 90.0% |
| Avg latency | 420 ms | 680 ms |

HyDE adds ~250 ms (one LLM call) but improves Precision@5 by ~4 pp on ambiguous queries.


Features

RAG & Search

  • Multilingual semantic search — query and index documents in 94+ languages; responses always match the query language
  • Semantic chunking — sentence-boundary-aware splitter (1 600-char target, 200-char overlap carry-back) preserves sentence integrity across chunk boundaries
  • HyDE retrieval — averages embeddings of the user query and an LLM-generated hypothetical answer for improved recall on ambiguous queries; hypothetical answer generated in the detected query language
  • Cross-encoder reranking — multilingual cross-encoder rescores retrieved chunks before generation; yields +16 pp Precision@5 vs. dense-only retrieval
  • Per-stage latency — embed / retrieve / rerank / generate timings returned in every search response; visualised as a segmented bar in the frontend
  • Hybrid search — queries the knowledge base first; falls back to live web results when local coverage is low
  • Language-aware caching — cache key includes the detected language code so queries in different languages never collide
  • Streaming chat — SSE stream with per-session conversation memory (last 20 messages, 24h TTL), per-request model selection, and collection scoping
  • Pluggable LLM backend — multiple providers supported; swap the model per request without redeploying
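The semantic chunking bullet above can be illustrated with a minimal sketch. It is not DevFlow's `SemanticChunker`: the regex sentence splitter, the function name `chunk_text`, and the exact carry-back rule (whole trailing sentences totalling at most 200 characters) are assumptions made for illustration. Only the 1 600-character target and 200-character overlap come from the feature list.

```python
import re


def chunk_text(text: str, target: int = 1600, overlap: int = 200) -> list[str]:
    """Split text at sentence boundaries into chunks of roughly `target` chars.

    Hypothetical helper: the real splitter may detect sentences differently.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    size = 0  # running length of `current`, counting one joining space per sentence
    for sentence in sentences:
        if current and size + len(sentence) + 1 > target:
            chunks.append(" ".join(current))
            # Carry back whole trailing sentences, up to `overlap` characters,
            # so the next chunk starts with context from the previous one.
            carried: list[str] = []
            carried_size = 0
            for prev in reversed(current):
                if carried_size + len(prev) + 1 > overlap:
                    break
                carried.insert(0, prev)
                carried_size += len(prev) + 1
            current, size = carried, carried_size
        current.append(sentence)
        size += len(sentence) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks are only cut between sentences, a single sentence longer than the target would still produce an oversized chunk; a production splitter would need a fallback for that case.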

Ingestion

  • File upload — PDF, DOCX, TXT up to 10 MB; magic-byte validation ensures file content matches the declared extension; processed via async ARQ job with poll-for-status; falls back to BackgroundTasks when Redis is unavailable
  • URL indexing — scrape and index any URL; semantic deduplication (cosine similarity ≥ 0.92) skips near-duplicate chunks; async ARQ job
  • Manual documents — add title + content directly via API or the Sources page
  • Web result save — promote a hybrid search result directly into the knowledge base with one call
  • Collections — organise sources into named workspaces; sources can belong to multiple collections; collection-scoped chat and search
  • Bulk delete — remove multiple sources in a single API call
  • Source inspection — view individual indexed chunks per source, including content and metadata
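The semantic deduplication step mentioned for URL indexing can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, vectors are plain Python lists, and the sketch also compares new chunks against each other, which the feature list does not confirm. The 0.92 cosine threshold is the documented value.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_near_duplicates(
    new_vectors: Sequence[Sequence[float]],
    existing_vectors: Sequence[Sequence[float]],
    threshold: float = 0.92,
) -> list[int]:
    """Return indices of `new_vectors` that are not near-duplicates.

    A chunk is skipped when its similarity to any already-kept vector is
    at or above `threshold`. Hypothetical helper, not the project's code.
    """
    seen = list(existing_vectors)
    kept: list[int] = []
    for index, vector in enumerate(new_vectors):
        if any(cosine(vector, other) >= threshold for other in seen):
            continue  # near-duplicate: skip indexing this chunk
        kept.append(index)
        seen.append(vector)
    return kept
```

A linear scan like this is quadratic in the number of chunks; a real pipeline would more plausibly ask the vector store for the nearest neighbour of each new chunk and compare only that score.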

Multi-tenancy

  • user_id stored with every source, collection, search history entry, and vector chunk metadata
  • All list endpoints filter by the authenticated user; anonymous (ownerless) data is shared read-only
  • Delete endpoints enforce ownership and return 403 if the caller does not own the record
  • Alembic migration 002_multi_tenancy adds columns non-destructively to existing deployments

Feedback & Eval

  • Feedback loop — thumbs-up / thumbs-down on every chat response; stored with session ID, query, answer preview, and user ID
  • Satisfaction rate — aggregate positive / total ratio exposed via /api/feedback/stats
  • Analytics dashboard — auto-refreshes every 30 s; shows cache hit rate, vector chunk count, satisfaction rate, feedback totals, query language distribution (pie chart), searches by day with cache-hit overlay, model usage breakdown, source type distribution
  • Eval harness — runs Precision@k, MRR, and Hit Rate against ground-truth source IDs; pre-seeded with 10 example queries; returns per-query breakdown including retrieved IDs, reciprocal rank, hit flag, and latency

Infrastructure

  • PostgreSQL primary database (SQLAlchemy + Alembic migrations); SQLite for local dev with zero configuration
  • ChromaDB persistent vector store (devflow_docs_v2 collection, cosine similarity, 768-dim); switchable to pgvector via VECTOR_STORE=pgvector
  • pgvector adapter — drop-in replacement for ChromaDB with identical public interface; uses IVFFlat cosine index and $and-style metadata filtering; requires PostgreSQL 16+ with the pgvector extension
  • ARQ async job queue — Redis-backed worker processes upload and URL index jobs asynchronously; job results retained 24 h; automatic BackgroundTasks fallback when Redis is unavailable
  • Redis — response caching (configurable TTL, default 1 h), JWT revocation blocklist, and chat session history
  • Per-user rate limiting — extracts user ID from JWT Bearer token, falls back to IP address
  • Request tracing — UUID assigned per request; X-Request-ID header on all responses; timing included in structured logs
  • Structured logging — Loguru with JSON rotation (50 MB cap, 14-day retention); request ID in every log line
  • Sentry observability — FastAPI + SQLAlchemy integrations on the backend; @sentry/nextjs on the frontend; 10% trace sampling rate
  • Health endpoint — liveness check for DB, Redis, and ChromaDB; returns 503 if the database is unreachable
  • GZip compression on all responses
  • S3 upload mirroring — uploaded files copied to S3 when AWS_S3_BUCKET is set; no-op otherwise
  • GraphQL API — full GraphQL schema with GraphiQL explorer, available alongside REST
  • Search history pruning — database automatically capped at 5 000 most recent entries
  • Go web scraper — purpose-built concurrent scraping microservice; handles up to 20 URLs per request with one goroutine per URL, 10 s timeout, and 2 MB response cap; Python fallback activates if the service is unreachable

Auth

  • JWT access tokens with 24 h expiry, signed with a configurable secret
  • bcrypt password hashing
  • Token revocation via Redis blocklist; TTL matches token expiry so revoked tokens cannot be replayed

CI

  • GitHub Actions runs the full Python test suite, Go build + vet, and Next.js type check + production build on every push and PR to main

Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 14, TypeScript, Redux Toolkit, RTK Query, Framer Motion, Recharts |
| Backend | FastAPI, Python 3.11, Pydantic v2, LangChain |
| Chunking | Sentence-boundary semantic splitter (1 600-char target, 200-char overlap carry-back) |
| Embeddings | intfloat/multilingual-e5-base — 94 languages, 768-dim, asymmetric query:/passage: prefixing |
| Reranker | cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 — 100 languages |
| LLM | Pluggable — Google, Anthropic, OpenAI, Groq providers; configured via API keys |
| Vector DB | ChromaDB (persistent, cosine, default) · pgvector (VECTOR_STORE=pgvector) |
| Database | PostgreSQL 16 (prod) / SQLite (dev) — SQLAlchemy + Alembic |
| Job Queue | ARQ (Redis-backed async worker) with BackgroundTasks fallback |
| Caching | Redis 7 |
| Auth | JWT (python-jose) + bcrypt (passlib) |
| Web Search | Brave Search API + concurrent Go scraper |
| Observability | Sentry (backend + frontend), Loguru structured logs |
| Infra | Docker Compose, GitHub Actions CI, Render |

Multilingual Support

DevFlow supports 94+ languages end-to-end:

  • Indexing — language detected at ingest time (langdetect, ISO 639-1) and stored as chunk metadata
  • Embedding — multilingual-e5-base with asymmetric prefixing: passage: for indexed documents, query: for queries, as specified in the model paper for optimal retrieval quality
  • HyDE — hypothetical answer generated in the detected query language so embeddings stay in-language
  • Reranking — mmarco-mMiniLMv2 cross-encoder is MMARCO-trained across 100 languages
  • Generation — LLM prompt explicitly instructs the model to respond in the same language as the question
  • Caching — language code is part of the cache key; Spanish and English queries never collide
  • Re-embedding — existing v1 embeddings (all-MiniLM, 384-dim) can be upgraded via POST /api/admin/reindex-all
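The asymmetric prefixing and HyDE averaging described above can be sketched without loading the model. In this illustration `embed` is any callable that maps text to a vector, and all function names are hypothetical. Which prefix the hypothetical answer receives is not stated in this README, so the `passage` default below is an assumption.

```python
from typing import Callable, Sequence

Vector = list[float]


def with_prefix(text: str, kind: str) -> str:
    """Prepend the E5-style role prefix: 'query: ' or 'passage: '."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return f"{kind}: {text}"


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("vectors must share one dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hyde_query_vector(
    query: str,
    hypothetical_answer: str,
    embed: Callable[[str], Sequence[float]],
    answer_kind: str = "passage",  # assumption: not confirmed by the README
) -> Vector:
    """Average the query embedding with the hypothetical-answer embedding."""
    query_vec = embed(with_prefix(query, "query"))
    answer_vec = embed(with_prefix(hypothetical_answer, answer_kind))
    return mean_vector([query_vec, answer_vec])
```

Since retrieval uses cosine similarity, the averaged vector does not need re-normalising before the ANN lookup; only its direction matters.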

Frontend Pages

| Route | Description |
| --- | --- |
| /search | Semantic and hybrid search — source cards, web results, per-stage latency bar |
| /chat | Streaming chat with session memory, model selection, collection scoping, and per-message feedback |
| /sources | Manage indexed sources — file upload, URL indexing, manual add, bulk delete, chunk preview |
| /collections | Create and manage collections; assign sources |
| /history | Paginated search history |
| /analytics | KPI dashboard — cache hit rate, satisfaction rate, language distribution, queries by day, model usage |
| /eval | Retrieval eval harness — Precision@k, MRR, Hit Rate against ground-truth source IDs |
| /costs | Interactive cost calculator — compare self-hosted vs. managed plans with team and volume sliders |
| /login | Login |
| /register | Register |

Setup

Prerequisites

  • Python 3.11
  • Node.js 20
  • Redis (local or managed)
  • PostgreSQL 16 — or omit DATABASE_URL to use SQLite for local development

Backend

cd backend
cp .env.example .env        # fill in required variables (see Environment Variables below)
pip install -r requirements.txt
alembic upgrade head        # apply database migrations
uvicorn main:app --reload

Frontend

cd frontend
# Set NEXT_PUBLIC_API_URL in .env.local to point at your backend
npm install
npm run dev

ARQ Worker (optional)

Processes file upload and URL indexing jobs asynchronously via Redis. If omitted, those endpoints fall back to BackgroundTasks automatically — no configuration change needed.

cd backend
python -m arq worker.WorkerSettings

Go Scraper (optional)

Concurrent web scraping microservice used by the hybrid search web-fallback pipeline. The backend falls back to a Python scraper if this service is unreachable.

cd go-scraper
go build -o scraper .
SCRAPER_PORT=8001 ./scraper

Docker (full stack)

Starts PostgreSQL 16, Redis 7, Go scraper, backend, and frontend with health-checked dependency ordering.

cp backend/.env.example backend/.env
# Fill in API keys
docker-compose up --build

Environment Variables

Required

| Variable | Description |
| --- | --- |
| GEMINI_API_KEY | Google AI API key — used by the default LLM provider |
| JWT_SECRET_KEY | Strong random secret for signing JWT tokens |
| REDIS_URL | Redis connection string |
| ALLOWED_ORIGINS | Comma-separated list of allowed frontend origins (CORS) |
| NEXT_PUBLIC_API_URL | Backend base URL consumed by the Next.js frontend |

Optional

| Variable | Default | Description |
| --- | --- | --- |
| DATABASE_URL | sqlite:///./devflow.db | PostgreSQL connection URL for production |
| ANTHROPIC_API_KEY | | Enables Anthropic LLM provider |
| OPENAI_API_KEY | | Enables OpenAI LLM provider |
| GROQ_API_KEY | | Enables Groq LLM provider |
| BRAVE_API_KEY | | Brave Search API key for web fallback in hybrid search |
| GO_SCRAPER_URL | | URL of the Go concurrent scraper service |
| SENTRY_DSN | | Backend Sentry DSN for error tracking and tracing |
| NEXT_PUBLIC_SENTRY_DSN | | Frontend Sentry DSN |
| AWS_S3_BUCKET | | S3 bucket name for upload mirroring |
| AWS_S3_REGION | us-east-1 | S3 region |
| AWS_ACCESS_KEY_ID | | AWS credentials for S3 access |
| AWS_SECRET_ACCESS_KEY | | AWS credentials for S3 access |
| VECTOR_STORE | chroma | Vector backend — chroma or pgvector |
| CACHE_TTL | 3600 | Redis cache TTL in seconds |
| ENVIRONMENT | production | Environment tag sent to Sentry |
| CHROMA_PATH | ./chroma_db | ChromaDB persistence directory |
| POSTGRES_PASSWORD | devflow_secret | PostgreSQL password used by Docker Compose |

API Reference

Auth

| Method | Path | Rate limit | Description |
| --- | --- | --- | --- |
| POST | /api/auth/register | 5/min | Create account, returns JWT |
| POST | /api/auth/login | 5/min | Authenticate, returns JWT |
| POST | /api/auth/logout | | Revoke token (adds to Redis blocklist) |

Search

| Method | Path | Rate limit | Description |
| --- | --- | --- | --- |
| POST | /api/search | 30/min | Semantic search with optional HyDE and cross-encoder reranking |
| POST | /api/search/hybrid | 30/min | Semantic search with live web fallback |

Both endpoints cache results in Redis, keyed by query + model + detected language.
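A language-aware cache key of the kind described can be sketched as below. The `devflow:search` prefix, the whitespace and case normalisation, and the SHA-256 digest are all assumptions for illustration; the README only states that the key combines query, model, and detected language.

```python
import hashlib


def search_cache_key(query: str, model: str, lang: str,
                     prefix: str = "devflow:search") -> str:
    """Build a Redis key from query + model + detected language.

    Hypothetical helper: prefix and normalisation are not taken from the project.
    """
    # Collapse whitespace and lowercase so trivially different inputs share a key.
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha256(
        f"{model}|{lang}|{normalized}".encode("utf-8")
    ).hexdigest()
    return f"{prefix}:{lang}:{digest}"
```

Putting the language code into the key means the same string detected as Spanish and as English maps to two separate entries, so a cached answer is never served in the wrong language.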

Request body fields:

  • query — string, 1–1000 chars
  • n_results — integer, 1–20, default 5
  • model — gemini-flash | gemini-pro | claude-haiku | gpt-4o-mini | groq-llama | groq-mixtral
  • rerank — boolean, default true
  • use_hyde — boolean, default false
  • use_web — boolean (hybrid only), default true

Response fields:

  • answer — LLM-generated answer string
  • sources — array of source objects (title, url, content, metadata)
  • model — model key used
  • latency — object with embed_ms, retrieve_ms, rerank_ms, llm_ms
  • cached — boolean

Chat

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/chat/stream | Streaming SSE chat; emits data: {"chunk": "..."} events, terminated with data: [DONE] |
| GET | /api/chat/history/{session_id} | Retrieve conversation history for a session |
| DELETE | /api/chat/history/{session_id} | Clear session history from Redis |
| GET | /api/chat/new-session | Generate a new session ID |

Session history is stored in Redis under devflow:chat:{session_id} with a 24 h TTL; only the last 20 messages are kept.

Stream request body fields:

  • message — string, 1–2000 chars
  • session_id — string
  • model — same values as search
  • use_web — boolean, default false
  • use_hyde — boolean, default false
  • collection_id — integer (optional); scopes retrieval to a single collection
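On the client side, the stream format described for /api/chat/stream (`data: {"chunk": "..."}` events ending with `data: [DONE]`) can be consumed with a small parser. This is a sketch: `iter_chunks` is a hypothetical name, and it assumes each event arrives as one complete line, which a real HTTP client's line iterator would normally provide.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text chunks from an SSE line stream until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # server signalled the end of the answer
        event = json.loads(payload)
        if "chunk" in event:
            yield event["chunk"]
```

Joining the yielded chunks reproduces the full answer, for example `"".join(iter_chunks(response_lines))`, where `response_lines` stands for whatever line iterator the HTTP client exposes.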

Ingestion

| Method | Path | Rate limit | Description |
| --- | --- | --- | --- |
| POST | /api/upload | 10/min | Upload PDF / DOCX / TXT (max 10 MB); async, returns job ID |
| GET | /api/upload/status/{job_id} | | Poll background indexing job status |
| POST | /api/index/url | 10/min | Index a URL; async, returns job ID |
| POST | /api/index/manual | | Add document content directly |
| POST | /api/save-web-result | | Save a web search result into the knowledge base |

Job status values: pending | processing | completed | failed. Job results are retained for 24 h.
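The upload checks (allowed extension, size cap, magic bytes matching the extension) can be sketched as one validation function. This is not the project's `FileProcessor`. The PDF and ZIP signatures are the standard ones (DOCX files are ZIP containers), while the treatment of .txt files (no signature, so only NUL bytes are rejected) and the reading of 10 MB as 10 MiB are assumptions.

```python
import os

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" interpreted as 10 MiB

# Leading bytes expected for each binary format.
MAGIC_PREFIXES = {
    ".pdf": b"%PDF-",
    ".docx": b"PK\x03\x04",  # DOCX is a ZIP archive
}


def validate_upload(filename: str, data: bytes) -> str:
    """Return the normalised extension, or raise ValueError if the file is rejected."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in (".pdf", ".docx", ".txt"):
        raise ValueError(f"unsupported extension: {ext or '(none)'}")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 10 MB limit")
    if ext == ".txt":
        # Plain text has no signature; reject obviously binary content instead.
        if b"\x00" in data:
            raise ValueError("binary content in .txt upload")
        return ext
    if not data.startswith(MAGIC_PREFIXES[ext]):
        raise ValueError(f"content does not match {ext} signature")
    return ext
```

A ZIP signature alone does not prove a file is a DOCX rather than some other archive, so a stricter check would also look for the expected entries inside the container.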

Sources

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/sources | List sources, paginated; optional collection_id filter |
| DELETE | /api/sources/{id} | Delete source and remove its vectors from the store |
| POST | /api/sources/bulk-delete | Delete multiple sources — body: {"ids": [1, 2, 3]} |
| GET | /api/sources/{id}/chunks | Inspect indexed chunks for a source |

Collections

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/collections | List all collections with source counts |
| POST | /api/collections | Create collection — body: {"name": "...", "description": "..."} |
| DELETE | /api/collections/{id} | Delete collection (sources are not deleted) |
| GET | /api/collections/{id}/sources | List sources in a collection |
| POST | /api/collections/{id}/sources/{source_id} | Add source to collection |
| DELETE | /api/collections/{id}/sources/{source_id} | Remove source from collection |

History & Analytics

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/history | Paginated search history — ?limit=50, max 200 |
| GET | /api/analytics | Top queries, searches by day (7d), cache hit rate, model usage, source types, language distribution, feedback stats |
| GET | /api/stats | Source count, document count, total searches, vector chunk count |

Feedback

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/feedback | Submit rating — body: {"session_id": "...", "query": "...", "rating": 1\|-1} |
| GET | /api/feedback/stats | Aggregate stats: total, thumbs_up, thumbs_down, satisfaction_rate |

Eval

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/eval/precision | Run retrieval eval — body: {"queries": [{"query": "...", "expected_source_ids": [1,2]}], "k": 5} |

Returns precision_at_k, mrr, hit_rate, and a per-query breakdown with retrieved IDs, reciprocal rank, hit flag, and latency.
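For reference, the three metrics can be computed as in the sketch below. It is not the harness's code: the function name and input shape are hypothetical, Precision@k divides by k (not by the number of results actually returned), and reciprocal rank only considers the top k. Both conventions are assumptions here.

```python
from typing import Iterable, Sequence


def evaluate_retrieval(
    cases: Sequence[tuple[Sequence[int], Iterable[int]]],
    k: int = 5,
) -> dict:
    """Compute Precision@k, MRR and Hit Rate.

    Each case is (retrieved_source_ids_in_rank_order, expected_source_ids).
    """
    if not cases:
        raise ValueError("need at least one query")
    per_query = []
    for retrieved, expected in cases:
        top_k = list(retrieved)[:k]
        relevant = set(expected)
        hits = [source_id for source_id in top_k if source_id in relevant]
        reciprocal_rank = 0.0
        for rank, source_id in enumerate(top_k, start=1):
            if source_id in relevant:
                reciprocal_rank = 1.0 / rank  # rank of the first relevant result
                break
        per_query.append({
            "precision": len(hits) / k,
            "reciprocal_rank": reciprocal_rank,
            "hit": bool(hits),
        })
    n = len(per_query)
    return {
        "precision_at_k": sum(q["precision"] for q in per_query) / n,
        "mrr": sum(q["reciprocal_rank"] for q in per_query) / n,
        "hit_rate": sum(1 for q in per_query if q["hit"]) / n,
        "per_query": per_query,
    }
```

If several retrieved chunks come from the same source, the same ID can appear more than once in the top k and count as multiple hits; whether the real harness de-duplicates IDs first is not stated here.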

Admin

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/admin/reindex-all | Re-embed all v1 (384-dim) content with multilingual-e5-base; async, returns job ID |

System

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Liveness check — DB, Redis, ChromaDB; returns 503 if DB is down |
| GET | / | Status and stats summary |
| ANY | /graphql | GraphQL endpoint with GraphiQL explorer |

GraphQL

Available at /graphql with the GraphiQL in-browser IDE.

Queries: sources, stats, collections, history, analytics, jobStatus

Mutations: deleteSource, createCollection, deleteCollection, addSourceToCollection, search

ML models are lazy-loaded singletons — not re-instantiated per request.


Database

Tables

sources, documents, search_history, collections, source_collections, index_jobs, users, answer_feedback

metadata.create_all() runs on app startup as a safety net for environments without Alembic.

Migrations

cd backend
alembic upgrade head                                     # apply all pending migrations
alembic revision --autogenerate -m "description"         # generate a new migration

Connection pooling

  • PostgreSQL: QueuePool with pool_size=10, max_overflow=20, pool_timeout=30 s, pool_pre_ping=True
  • SQLite: StaticPool with check_same_thread=False, pool_pre_ping=True

Testing

cd backend
pytest tests/ -v

Coverage includes: health check, auth (register, login, logout, duplicate rejection, wrong password), source CRUD, pagination, stats, upload validation (extension, magic bytes, file size), background job lifecycle, collection CRUD, search history, analytics, and manual document indexing.

Each test run uses an isolated in-memory SQLite database. Sentry is disabled. Rate limiting is active but scoped per test client instance.


Deployment

Render

Three services defined in render.yaml:

| Service | Runtime | Start command |
| --- | --- | --- |
| devflow-backend | Python 3.11 | alembic upgrade head && uvicorn main:app ... |
| devflow-go-scraper | Go | ./scraper |
| devflow-frontend | Node | npm start |

Set all environment variables in the Render dashboard. No secrets are stored in the repository.

Docker Compose

docker-compose up --build

Services: postgres (16-alpine), redis (7-alpine), go-scraper, backend, frontend. The backend waits for all three dependency healthchecks to pass before starting.
