VIKA - Evaluated RAG Assistant for Scientific Documents

Live Demo

Part 1 - Concept and Presentation

What is VIKA?

VIKA is an AI assistant that answers questions from scientific PDF documents uploaded by the user. It is designed for students, teachers, researchers, and technical teams who need answers grounded in their own course notes, articles, reports, or lecture slides.

Instead of answering only from general model knowledge, VIKA first searches the uploaded documents, extracts the most relevant passages, and asks an LLM to answer using those passages. Answers include page-level citations using the original file name, for example [lecture_sat.pdf p.42].

Who is it for?

- Students who want explanations from their lecture notes.
- Teachers who want a document-grounded assistant for course material.
- Researchers who want to query papers or reports without manually searching through pages.
- Developers and ML practitioners who want a compact RAG system that can run on the Hugging Face Spaces free tier.

What problem does it solve?

Large PDFs are hard to search and summarize manually. Generic chatbots may answer confidently but without grounding in the user's documents. VIKA reduces that risk by combining document retrieval, citation-aware prompting, model routing, and an in-session evaluation dashboard.

User Experience

1. Upload one or more PDFs.
2. VIKA extracts text, classifies pages, chunks content, and builds search indexes.
3. Ask a question in the Gradio chat.
4. Choose either automatic LLM routing or a manual model.
5. Read the answer with page citations and the final model used.
6. Inspect retrieved chunks and session metrics in the UI.

Main Capabilities

- Runtime PDF upload only; no preloaded demo documents.
- PDF deduplication using SHA-256.
- Smart page classification: text, illustrative, mixed, scanned.
- Targeted bilingual OCR with Tesseract fra+eng.
- Language detection and embedding model routing.
- Page-aware chunking with section title extraction.
- Hybrid retrieval with dense FAISS + BM25 + Reciprocal Rank Fusion.
- CrossEncoder reranking.
- Citations using original file names, truncated when too long.
- Configurable retrieval mode: dense, bm25, hybrid.
- Automatic or manual LLM routing.
- OpenRouter support for openai/gpt-oss-120b.
- Quota-aware fallback when a model is temporarily unavailable.
- Session evaluation dashboard with retrieval, latency, routing, and model metrics.

Conceptual Architecture

flowchart LR
    A["User uploads PDFs"] --> B["Document intake and deduplication"]
    B --> C["Page classification"]
    C --> D["Text extraction or targeted OCR"]
    D --> E["Language detection"]
    E --> F["Page-aware chunks"]
    F --> G["Hybrid retrieval"]
    G --> H["Reranking"]
    H --> I["Prompt with citations"]
    I --> J["Automatic or manual LLM routing"]
    J --> K["Streaming answer in Gradio"]
    K --> L["Session evaluation dashboard"]

Why Automatic LLM Routing?

Not every question needs the most powerful model. A simple definition can often be answered by a smaller, faster model, while a proof, comparison, or multi-step reasoning task may need a more capable model.

VIKA supports:

- Automatic mode: evaluates prompt complexity and chooses a suitable available model.
- Manual mode: uses the model selected by the user.

The final response always shows the model actually used, including after fallback.

Evaluation Dashboard

VIKA keeps an in-memory evaluation log for the current session. It helps users understand what happened for each question:

- Which LLM was used.
- Which retrieval mode was used.
- How many chunks were retrieved and injected.
- How long retrieval and generation took.
- Whether top retrieved chunks looked relevant according to the reranker.
- How much BM25 contributed to the final context.

Limitations

- The Hugging Face Spaces free tier is CPU-only and has ephemeral storage.
- Uploaded documents and in-session metrics are reset when the Space restarts.
- OCR quality depends on PDF scan quality and installed Tesseract language packs.
- Complex multi-column layouts may still produce imperfect reading order.
- Hit@5, Recall@5, and MRR are proxy metrics based on CrossEncoder scores, not human labels.
- RouteLLM routers may need local weights or configuration; VIKA falls back to a local heuristic if a router is unavailable.

Part 2 - Technical Implementation

Runtime Constraints

VIKA is built for the Hugging Face Spaces free tier:

- CPU only.
- Ephemeral filesystem.
- No persistent vector database.
- Gradio is the only interface.
- No FastAPI layer.
- User documents are uploaded at runtime.

Repository Structure

| File | Purpose |
| --- | --- |
| `app.py` | Gradio UI, upload flow, chat flow, evaluation dashboard |
| `document_intake.py` | PDF validation, SHA-256 deduplication, manifest writing |
| `parser_utils.py` | PyMuPDF extraction, page-specific OCR, language detection |
| `page_classifier.py` | Page classification logic |
| `chunker.py` | Page-aware chunk generation |
| `embed_faiss.py` | Embedding model routing and FAISS indexing |
| `retriever.py` | Dense retrieval, BM25 retrieval, RRF fusion, retrieval metrics support |
| `reranker.py` | CrossEncoder reranking |
| `prompt_builder.py` | Prompt construction and citation label formatting |
| `llm_router.py` | Provider routing, RouteLLM/heuristic routing, fallback handling |
| `tests/` | Pytest coverage with synthetic PDFs and mocks |

Detailed Pipeline

flowchart TD
    A["PDF upload"] --> B["Validate PDF header / pypdf load"]
    B --> C["SHA-256 hash deduplication"]
    C --> D["Store under data/pdfs/hash.pdf"]
    D --> E["PyMuPDF page inspection"]
    E --> F{"Page type"}
    F -->|"text"| G["get_text('text')"]
    F -->|"illustrative"| G
    F -->|"scanned"| H["Full-page render at 300 DPI + Tesseract fra+eng"]
    F -->|"mixed"| I["Text blocks + OCR image zones"]
    G --> J["Document language detection"]
    H --> J
    I --> J
    J --> K["Page-aware chunks with metadata"]
    K --> L["Embedding model selection"]
    L --> M["FAISS IndexFlatIP"]
    K --> N["BM25 corpus"]
    M --> O["Dense retrieval"]
    N --> P["BM25 retrieval"]
    O --> Q["RRF fusion"]
    P --> Q
    Q --> R["CrossEncoder reranking"]
    R --> S["Prompt with original filename citations"]
    S --> T["LLM routing and fallback"]
    T --> U["Streaming answer"]
    R --> V["Evaluation dashboard"]
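
The deduplication step at the top of the pipeline can be illustrated with a minimal sketch. The function names and the in-memory `manifest` dictionary are assumptions made for illustration; the README only states that `document_intake.py` hashes uploads with SHA-256 and writes a manifest.

```python
import hashlib


def sha256_id(pdf_bytes: bytes) -> str:
    """Return the hex SHA-256 digest, used here as the document ID."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def register_pdf(pdf_bytes: bytes, filename: str,
                 manifest: dict[str, str]) -> tuple[str, bool]:
    """Add a PDF to the manifest unless identical bytes were already seen.

    `manifest` maps doc_id -> original file name (hypothetical structure).
    Returns (doc_id, is_new).
    """
    doc_id = sha256_id(pdf_bytes)
    if doc_id in manifest:
        return doc_id, False  # same content: keep the first file name
    manifest[doc_id] = filename
    return doc_id, True


manifest: dict[str, str] = {}
first = register_pdf(b"%PDF-1.7 sample", "lecture_sat.pdf", manifest)
again = register_pdf(b"%PDF-1.7 sample", "copy_of_lecture.pdf", manifest)
```

Hashing the file bytes rather than the file name means that a renamed copy of the same PDF is still recognized as a duplicate.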

Chunk Metadata

Every chunk contains:

{
  "id": 0,
  "text": "chunk text",
  "doc_id": "sha256_document_id",
  "page": 42,
  "char_start": 0,
  "char_end": 1000,
  "section_title": "optional section title",
  "page_type": "text",
  "lang": "en"
}

The page field is preserved through FAISS metadata, retrieval, prompt construction, and UI citations.
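
As a rough illustration of how such records could be produced, the sketch below splits one page into overlapping character windows. The window size, the overlap, the function name, and the choice to measure `char_start` and `char_end` relative to the page text are all assumptions; the actual logic lives in `chunker.py` and may differ.

```python
def chunk_page(text, doc_id, page, page_type="text", lang="en",
               size=1000, overlap=200, start_id=0):
    """Split one page into overlapping character windows with metadata."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    chunks = []
    step = size - overlap
    for offset in range(0, len(text), step):
        piece = text[offset:offset + size]
        if piece.strip():  # skip windows that contain only whitespace
            chunks.append({
                "id": start_id + len(chunks),
                "text": piece,
                "doc_id": doc_id,
                "page": page,  # carried unchanged so citations stay page-level
                "char_start": offset,
                "char_end": offset + len(piece),
                "section_title": None,  # title extraction is out of scope here
                "page_type": page_type,
                "lang": lang,
            })
        if offset + size >= len(text):
            break  # this window already reached the end of the page
    return chunks


chunks = chunk_page("a" * 2500, doc_id="sha256_document_id", page=42)
```

Because chunks never span two pages in this sketch, every chunk maps to exactly one page number, which is what makes citations such as `p.42` unambiguous.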

Page Classification

The page classifier uses PyMuPDF before extraction:

text_density = len(page.get_text("text").strip()) / page.rect.area
has_images = len(page.get_images(full=True)) > 0

Rules:

| Condition | Page type | Extraction strategy |
| --- | --- | --- |
| text density >= 0.01 and no images | `text` | PyMuPDF text only |
| text density >= 0.01 and has images | `illustrative` | PyMuPDF text only |
| text density < 0.001 and has images | `scanned` | full-page OCR |
| 0.001 <= text density < 0.01 and has images | `mixed` | text blocks + OCR image regions |
| text density < 0.001 and no images | `text` | treated as blank or near-blank text page |
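
The rules above can be expressed as a small pure function. This is a sketch rather than the code in `page_classifier.py`: the function name is hypothetical, and the table does not say what happens for a density between 0.001 and 0.01 on a page without images, so the sketch assumes such pages are also treated as `text`.

```python
def classify_page(text_density: float, has_images: bool) -> str:
    """Map text density and image presence to a page type."""
    if text_density >= 0.01:
        return "illustrative" if has_images else "text"
    if has_images:
        return "scanned" if text_density < 0.001 else "mixed"
    # No images and sparse text. The rules cover density < 0.001 here;
    # handling 0.001 <= density < 0.01 the same way is an assumption.
    return "text"


labels = [
    classify_page(0.02, False),    # dense text, no images
    classify_page(0.02, True),     # dense text with figures
    classify_page(0.0005, True),   # almost no text layer, images present
    classify_page(0.005, True),    # some text plus images
    classify_page(0.0005, False),  # blank or near-blank page
]
```

Keeping the decision separate from PyMuPDF makes the thresholds easy to unit test without building PDFs.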

OCR

OCR uses Tesseract through pytesseract:

pytesseract.image_to_string(image, lang="fra+eng")

System packages:

tesseract-ocr
tesseract-ocr-eng
tesseract-ocr-fra
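
For a scanned page, the full-page path could look roughly like the sketch below, which renders the page with PyMuPDF at 300 DPI and passes the image to Tesseract. The function names are hypothetical, and the real code in `parser_utils.py` may render or preprocess pages differently.

```python
import io


def dpi_to_zoom(dpi: int) -> float:
    """PDF user space is 72 points per inch, so zoom = dpi / 72."""
    return dpi / 72.0


def ocr_page(pdf_path: str, page_number: int,
             dpi: int = 300, lang: str = "fra+eng") -> str:
    """Render one page to an image and run bilingual OCR on it."""
    # Imported lazily so this module still loads where the OCR
    # dependencies (PyMuPDF, Pillow, pytesseract) are not installed.
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    with fitz.open(pdf_path) as doc:
        page = doc[page_number]  # zero-based page index
        zoom = dpi_to_zoom(dpi)
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        image = Image.open(io.BytesIO(pix.tobytes("png")))
    return pytesseract.image_to_string(image, lang=lang)
```

Calling `ocr_page("scan.pdf", 0)` would return the recognized text of the first page, provided the `fra` and `eng` language packs listed above are installed.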

Language and Embedding Routing

After extraction, langdetect detects the document language, which determines the embedding model:

- English: all-MiniLM-L6-v2
- French or any other non-English language: paraphrase-multilingual-MiniLM-L12-v2

Both embedding models are loaded at startup and reused.
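
The routing rule itself is small enough to sketch directly. The function name is hypothetical, and the sketch assumes the language is given as an ISO 639-1 code such as `en` or `fr`, which is the form langdetect returns.

```python
EMBED_MODEL_EN = "all-MiniLM-L6-v2"
EMBED_MODEL_MULTI = "paraphrase-multilingual-MiniLM-L12-v2"


def select_embedding_model(lang: str) -> str:
    """Pick the English model for English text, the multilingual one otherwise."""
    return EMBED_MODEL_EN if lang.lower().startswith("en") else EMBED_MODEL_MULTI


english_model = select_embedding_model("en")
french_model = select_embedding_model("fr")
german_model = select_embedding_model("de")
```

Queries need to be embedded with the same model as the document they are searched against, since vectors from the two models are not comparable.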

Retrieval

VIKA supports three retrieval modes:

- dense: FAISS semantic search only.
- bm25: lexical BM25 search only.
- hybrid: dense + BM25 fused with Reciprocal Rank Fusion.

RRF score:

score = 1 / (60 + dense_rank) + 1 / (60 + bm25_rank)

After retrieval, the CrossEncoder reranker sorts candidates by semantic relevance before prompt injection.
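
The fusion formula can be sketched as follows. Two details are assumptions, since the README does not specify them: ranks start at 1, and a chunk that appears in only one list receives only that list's term. The function name is also hypothetical.

```python
def rrf_fuse(dense_ids, bm25_ids, k=60, top_n=None):
    """Fuse two ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in (dense_ids, bm25_ids):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n]


# Chunk 1 is ranked 2nd by dense search and 1st by BM25, so it wins.
fused = rrf_fuse(dense_ids=[3, 1, 2], bm25_ids=[1, 4])
```

Because RRF uses only ranks, it needs no normalization between cosine similarities and BM25 scores, which live on different scales.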

Citations

The prompt and UI use original file names from data/manifest.csv instead of raw SHA-256 document IDs.

Example:

[lecture_sat_complexity.pdf p.42]

Long filenames are truncated while preserving the extension:

[this_is_a_very_long_scientific_docum....pdf p.42]
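
A label formatter with this behavior might look like the sketch below. The function name, the sample long file name, and the 43-character limit are assumptions; the limit was picked only because it reproduces the truncated example above.

```python
from pathlib import Path


def citation_label(filename: str, page: int, max_len: int = 43) -> str:
    """Format a citation, truncating long names but keeping the extension."""
    name = Path(filename).name
    if len(name) > max_len:
        suffix = Path(name).suffix  # for example ".pdf"
        keep = max(max_len - len(suffix) - 3, 1)  # 3 characters for "..."
        name = f"{name[:keep]}...{suffix}"
    return f"[{name} p.{page}]"


short = citation_label("lecture_sat_complexity.pdf", 42)
long_ = citation_label("this_is_a_very_long_scientific_document_name.pdf", 42)
```

Keeping the extension visible lets readers still recognize the file type after truncation.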

LLM Routing Architecture

flowchart LR
    A["User question"] --> B{"Routing mode"}
    B -->|"Manual"| C["Use selected model"]
    B -->|"Automatic"| D["Prompt complexity evaluation"]
    D --> E["RouteLLM router<br/>bert / sw_ranking / mf"]
    D --> F["Heuristic fallback if RouteLLM unavailable"]
    E --> G{"Simple or complex?"}
    F --> G
    G -->|"Simple"| H["Prefer simpler available models"]
    G -->|"Complex"| I["Prefer more capable available models"]
    C --> J["Provider call"]
    H --> J
    I --> J
    J --> K{"Quota / rate limit?"}
    K -->|"No"| L["Stream answer"]
    K -->|"Yes"| M["Mark model unavailable in session"]
    M --> N["Fallback in complexity-aware order"]
    N --> J

Supported LLM Providers

| Provider | Models in UI | Notes |
| --- | --- | --- |
| Gemini | Gemini 2.5 Flash, Flash-Lite, Pro | Google GenAI SDK |
| Mistral | Mistral Nemo, Ministral 3 8B, Mistral Small 4 | Mistral SDK |
| Groq | Llama 3.3 70B, Llama 3.1 8B | Groq SDK |
| OpenRouter | OpenRouter GPT-OSS 120B | OpenAI `gpt-oss-120b` through OpenRouter chat completions |

Model-Specific Prompt Wrapping

llm_router.py wraps the prompt in provider chat messages according to the model's style:

- compact: concise answer style for smaller/faster models.
- balanced: normal grounded RAG behavior.
- reasoning: asks reasoning models to keep reasoning private and return only the final cited answer.

The core RAG prompt still comes from prompt_builder.py.

Quota and Fallback Handling

When a model fails with a quota, credit, rate-limit, or temporary availability error:

1. VIKA marks that model unavailable for the current session.
2. It tells the user the model cannot be used right now.
3. It lists available alternatives.
4. It falls back according to prompt complexity:
   - simple prompt: simplest available to most capable
   - complex prompt: most capable available to simplest

The final answer includes:

Model used: <actual model name>
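
The fallback behavior described in this section can be sketched as a loop over an ordered candidate list. Everything named here is hypothetical: `QuotaError`, `call_model`, and the capability ordering of the example models are illustrative assumptions, not the contents of `llm_router.py`.

```python
class QuotaError(RuntimeError):
    """Stand-in for quota, credit, rate-limit, or availability errors."""


def fallback_order(models_by_capability, unavailable, is_complex):
    """Order available models; input is sorted simplest to most capable."""
    available = [m for m in models_by_capability if m not in unavailable]
    return list(reversed(available)) if is_complex else available


def answer_with_fallback(prompt, models, unavailable, is_complex, call_model):
    """Try models in complexity-aware order, skipping those that hit quota."""
    for model in fallback_order(models, unavailable, is_complex):
        try:
            text = call_model(model, prompt)
        except QuotaError:
            unavailable.add(model)  # remembered for the rest of the session
            continue
        return f"{text}\n\nModel used: {model}"
    raise RuntimeError("no model is currently available")


def fake_call(model, prompt):
    # Simulated provider: the largest model is out of quota.
    if model == "Llama 3.3 70B":
        raise QuotaError("rate limit")
    return "answer"


models = ["Llama 3.1 8B", "Mistral Nemo", "Llama 3.3 70B"]  # assumed order
unavailable = set()
result = answer_with_fallback("Prove X", models, unavailable, True, fake_call)
```

Appending the model name after the loop, rather than before the call, is what guarantees the label reflects the model that actually answered.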

Evaluation Metrics

Each query stores a session record:

{
  "query_index": 1,
  "query": "What is SAT?",
  "llm_model": "Llama 3.1 8B",
  "llm_routing_mode": "Automatic",
  "retrieval_latency_ms": 120.5,
  "generation_latency_ms": 900.2,
  "total_latency_ms": 1020.7,
  "chunks_retrieved": 20,
  "chunks_used": 5,
  "reranker_score_mean": 1.23,
  "reranker_score_min": 0.51,
  "cosine_sim_mean": 0.42,
  "bm25_contribution_pct": 60.0,
  "hit_at_5": 1.0,
  "recall_at_5": 0.8,
  "mrr": 1.0,
  "page_types_used": {"text": 5},
  "retrieval_mode": "hybrid"
}

Metric descriptions:

| Metric | Meaning |
| --- | --- |
| Retrieval latency | Time spent in retrieval, fusion, and reranking |
| Generation latency | Time spent streaming from the LLM provider |
| Total latency | End-to-end query time |
| p50 latency | Median total latency for the session |
| p95 latency | 95th percentile total latency for the session |
| Chunks retrieved | Candidate chunks before final prompt selection |
| Chunks used | Chunks inserted into the prompt |
| Reranker mean / min | CrossEncoder score summary for used chunks |
| Cosine similarity mean | Mean dense similarity between query and used chunks |
| BM25 contribution % | Share of used chunks that came from BM25 candidates |
| Hit@5 | 1 if at least one CrossEncoder-relevant chunk is in top 5 |
| Recall@5 | Share of CrossEncoder-relevant retrieved chunks appearing in top 5 |
| MRR | Reciprocal rank of the first CrossEncoder-relevant chunk in top 5 |

Because the app does not have human relevance labels at runtime, Hit@5, Recall@5, and MRR are proxy metrics based on CrossEncoder relevance (reranker_score >= 0).
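
Under the definitions above, the three proxy metrics could be computed as in this sketch. It assumes the input is the list of reranker scores for all retrieved chunks, in the ranking order being evaluated; the function name and the returned key for values of `k` other than 5 are hypothetical.

```python
def proxy_metrics(ranked_scores, k=5, threshold=0.0):
    """Compute Hit@k, Recall@k, and MRR from reranker scores in rank order."""
    relevant = [score >= threshold for score in ranked_scores]
    total_relevant = sum(relevant)
    top = relevant[:k]
    hits_in_top = sum(top)
    # 1-based rank of the first relevant chunk within the top k, if any.
    first_rank = next((i for i, rel in enumerate(top, start=1) if rel), None)
    return {
        f"hit_at_{k}": 1.0 if hits_in_top else 0.0,
        f"recall_at_{k}": hits_in_top / total_relevant if total_relevant else 0.0,
        "mrr": 1.0 / first_rank if first_rank else 0.0,
    }


# Seven retrieved chunks: four are relevant, three of them in the top 5.
metrics = proxy_metrics([-1.2, 0.8, 2.1, -0.3, 0.1, 0.4, -2.0])
```

Since relevance is derived from the same CrossEncoder that ranks the chunks, these numbers describe how confident the reranker is in the retrieved set, not whether the answer is correct.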

Environment Variables

| Variable | Purpose |
| --- | --- |
| `GEMINI_API_KEY` | Google Gemini API key |
| `MISTRAL_API_KEY` | Mistral API key |
| `GROQ_API_KEY` | Groq API key |
| `OPENROUTER_API_KEY` | OpenRouter API key |
| `OPENROUTER_HTTP_REFERER` | Optional OpenRouter attribution URL |
| `OPENROUTER_APP_TITLE` | Optional OpenRouter app title |
| `VIKA_ROUTELLM_ROUTER` | Default automatic routing evaluator: `bert`, `sw_ranking`, `mf`, or `heuristic` |
| `VIKA_ROUTELLM_THRESHOLD` | Complexity threshold for simple vs complex routing |
| `VIKA_EMBED_MODEL_EN` | English embedding model |
| `VIKA_EMBED_MODEL_MULTI` | Multilingual embedding model |
| `VIKA_EMBED_BATCH_SIZE` | Embedding batch size |
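
One way to read the `VIKA_*` settings is sketched below. The `VikaConfig` class, the `load_config` function, and the fallback values for the router (`heuristic`), the threshold (`0.5`), and the batch size (`32`) are assumptions made for illustration; only the variable names, the allowed router values, and the two embedding model names come from this README.

```python
import os
from dataclasses import dataclass

ALLOWED_ROUTERS = {"bert", "sw_ranking", "mf", "heuristic"}


@dataclass(frozen=True)
class VikaConfig:
    router: str
    threshold: float
    embed_model_en: str
    embed_model_multi: str
    embed_batch_size: int


def load_config(env=None) -> VikaConfig:
    """Build a config from environment variables (or any mapping)."""
    env = os.environ if env is None else env
    router = env.get("VIKA_ROUTELLM_ROUTER", "heuristic")  # assumed default
    if router not in ALLOWED_ROUTERS:
        raise ValueError(f"unknown router: {router!r}")
    return VikaConfig(
        router=router,
        threshold=float(env.get("VIKA_ROUTELLM_THRESHOLD", "0.5")),  # assumed
        embed_model_en=env.get("VIKA_EMBED_MODEL_EN", "all-MiniLM-L6-v2"),
        embed_model_multi=env.get(
            "VIKA_EMBED_MODEL_MULTI", "paraphrase-multilingual-MiniLM-L12-v2"
        ),
        embed_batch_size=int(env.get("VIKA_EMBED_BATCH_SIZE", "32")),  # assumed
    )


cfg = load_config({"VIKA_ROUTELLM_ROUTER": "mf", "VIKA_EMBED_BATCH_SIZE": "16"})
```

Accepting a mapping argument keeps the loader testable without touching the real process environment.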

Tech Stack

| Component | Choice |
| --- | --- |
| UI | Gradio |
| PDF parsing | PyMuPDF |
| OCR | Tesseract + pytesseract |
| Language detection | langdetect |
| Embeddings | sentence-transformers |
| Vector index | FAISS IndexFlatIP |
| Lexical retrieval | rank_bm25 |
| Reranking | CrossEncoder `cross-encoder/ms-marco-MiniLM-L-6-v2` |
| LLM routing | RouteLLM + heuristic fallback |
| LLM providers | Gemini, Mistral, Groq, OpenRouter |
| Testing | pytest |

Testing

Run:

pytest -q

The tests use synthetic PDFs and lightweight mocks. They do not require real API keys or live LLM calls.

Roadmap

- Persistent vector store such as Qdrant Cloud.
- Human feedback logging.
- LLM-as-judge faithfulness and answer relevance metrics.
- Query rewriting or HyDE.
- Better layout reconstruction for complex multi-column PDFs.
