A comprehensive RAG system with an independent presentation layer (Streamlit) and a separate REST API (FastAPI), both built on a shared internal Python services layer. The UI stays responsive by calling the database and ingestion logic directly, while the REST API serves programmatic clients and potential downstream systems.
- Python 3.12+ (using uv)
- Docker & Docker Compose
If uv is not installed yet:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Windows (PowerShell):

```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Verify installation:

```bash
uv --version
```

Run all commands from the repository root.
After cloning, run the setup script to install all dependencies (requires uv and npm):

```bash
./setup.sh
```
- Create your environment file:

  ```bash
  cp .env.example .env
  ```
- Update `.env` values:
  - Set all `SNOWFLAKE_*` fields for Snowflake.
  - Set `GOOGLE_API_KEY` for real LLM and embedding responses.
  - Keep `QDRANT_URL` as `http://localhost:6333` for local Docker.
  - If `SNOWFLAKE_*` values are omitted, the app uses local SQLite (`vectera_local.db`) as a fallback.
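A minimal `.env` along those lines might look like the following; the specific `SNOWFLAKE_*` key names are placeholders, so check `.env.example` for the real ones:

```ini
# Placeholder values; SNOWFLAKE_* key names are illustrative
SNOWFLAKE_ACCOUNT=your-account
SNOWFLAKE_USER=your-user
SNOWFLAKE_PASSWORD=your-password
GOOGLE_API_KEY=your-gemini-key
QDRANT_URL=http://localhost:6333
```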
- Install dependencies:

  ```bash
  uv sync
  ```
- Start the local vector database (Qdrant):

  ```bash
  docker-compose up -d qdrant
  ```
- (Optional) Seed a local admin account if auth is enabled:

  ```bash
  uv run python -m app.scripts.seed_admin
  ```
- Run setup checks for Snowflake, Qdrant, and LLM config:

  ```bash
  uv run python -m app.scripts.setup_check
  ```

  This validates:
  - Snowflake connectivity with `SELECT 1`
  - Qdrant connectivity and collection readiness
  - Google API key validity against the Gemini API
The primary UI entry point is Streamlit. It uses internal models and services (`app.services.*`) directly, without making internal HTTP requests.
```bash
uv run streamlit run app.py --server.port 8502
```

The UI will run on http://localhost:8502.
If you want to interact with Vectera programmatically or build an alternate frontend later, the FastAPI interface is available separately.
```bash
uv run uvicorn api:app --reload --port 8000
```

The API will run on http://localhost:8000, with Swagger docs at http://localhost:8000/docs.
See `architecture.md` for diagrams and a structural breakdown of the application architecture.
- The system uses the SQLAlchemy ORM with Snowflake in production, falling back to a local SQLite database (`vectera_local.db`).
- It tracks relational metadata: client workspaces, document families/versions, vector registry mappings (Qdrant node IDs to physical documents), and query logs. This enables robust document management without pushing document relationship logic into the vector database.
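As a rough illustration of the registry idea (not the project's actual models; table and column names here are hypothetical), a vector registry row might map a Qdrant point ID back to a document:

```python
# Hypothetical sketch of a vector registry table; an in-memory SQLite
# engine stands in for the Snowflake / SQLite fallback described above.
from sqlalchemy import Column, Integer, String, Boolean, create_engine
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class VectorRecord(Base):
    __tablename__ = "vector_registry"
    id = Column(Integer, primary_key=True)
    qdrant_point_id = Column(String, nullable=False, unique=True)  # node ID in Qdrant
    document_id = Column(Integer, nullable=False)                  # points at the document row
    page_num = Column(Integer)                                     # structural metadata
    is_current = Column(Boolean, default=True)                     # version flag

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(VectorRecord(qdrant_point_id="a1b2", document_id=7, page_num=3))
    session.commit()
    rec = session.query(VectorRecord).filter_by(qdrant_point_id="a1b2").one()
```

Keeping this mapping relational means a document delete can find and remove exactly its Qdrant points without scanning the vector store.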
- Input documents (PDF, DOCX, PPTX) are parsed via LlamaIndex's file readers in `app/ingestion/parser.py`.
- During parsing, structural data (such as page and slide numbers) is extracted and attached to chunks.
- For non-layout-aware documents, the pipeline uses `SemanticSplitterNodeParser` with configurable `SEMANTIC_SPLITTER_BREAKPOINT_PERCENTILE` and `SEMANTIC_SPLITTER_BUFFER_SIZE`.
- Layout-aware PDF ingestion keeps its specialized artifact-aware chunking path (tables/charts/figures) and skips this generic splitter stage.
- When API keys are available, an optional `TitleExtractor` enriches chunk metadata for better LLM context.
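The breakpoint-percentile idea behind semantic splitting can be sketched in plain Python. This is a toy stand-in, not `SemanticSplitterNodeParser` itself: the real parser computes embedding distances between adjacent sentence groups, mocked here by an injected distance function.

```python
# Toy sketch of percentile-based semantic splitting: split wherever the
# distance between adjacent sentences exceeds a percentile threshold.
from typing import Callable, List

def semantic_split(
    sentences: List[str],
    distance: Callable[[str, str], float],
    breakpoint_percentile: float = 95.0,
) -> List[List[str]]:
    if len(sentences) < 2:
        return [sentences]
    gaps = [distance(a, b) for a, b in zip(sentences, sentences[1:])]
    ranked = sorted(gaps)
    idx = min(int(len(ranked) * breakpoint_percentile / 100.0), len(ranked) - 1)
    threshold = ranked[idx]
    chunks, current = [], [sentences[0]]
    for sent, gap in zip(sentences[1:], gaps):
        if gap > threshold:          # semantic breakpoint: start a new chunk
            chunks.append(current)
            current = []
        current.append(sent)
    chunks.append(current)
    return chunks

# Toy distance: 1.0 when the leading topic word differs, else 0.0
def toy_distance(a: str, b: str) -> float:
    return 0.0 if a.split()[0] == b.split()[0] else 1.0

docs = ["cats purr", "cats sleep", "qdrant stores vectors", "qdrant scales"]
print(semantic_split(docs, toy_distance, breakpoint_percentile=50.0))
# → [['cats purr', 'cats sleep'], ['qdrant stores vectors', 'qdrant scales']]
```

A higher percentile means fewer, larger chunks; the buffer-size knob in the real splitter widens the window of sentences compared at each boundary.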
Retrieval uses a multi-stage pipeline (`app/retrieval/retriever.py`):

- Vector Retrieval: Runs client-isolated `ExactMatchFilter` retrieval against the Qdrant backend, pulling the top `K+5` nearest neighbors using `gemini-embedding-001` embeddings.
- Authority & Recency Reranking: Re-scores results so that documents with higher authority indicators or newer internal versions rank higher (`reranker.py`).
- Temporal Ranking: Adjusts scores based on resolved `effective_from`/`effective_to` dates relative to the current UTC timestamp, penalizing expired sources (`temporal_ranker.py`).
- Citation Building: Converts chunk nodes into explicitly labeled citation structures that are fed into the generation prompt, referencing source text via `page_num` and `version_label`.
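For example, the temporal step might adjust scores roughly like this; the field names mirror the doc, but the penalty factor and exact rule are assumptions about `temporal_ranker.py`:

```python
# Minimal sketch of temporal score adjustment: penalize chunks whose
# effective window does not cover the current moment.
from datetime import datetime, timezone

EXPIRED_PENALTY = 0.5  # assumed multiplier for out-of-window sources

def temporal_adjust(score, effective_from, effective_to, now=None):
    now = now or datetime.now(timezone.utc)
    if effective_to is not None and effective_to < now:
        return score * EXPIRED_PENALTY        # source has expired
    if effective_from is not None and effective_from > now:
        return score * EXPIRED_PENALTY        # source not yet effective
    return score

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(temporal_adjust(1.0, None, datetime(2024, 1, 1, tzinfo=timezone.utc), now))  # → 0.5
```

Multiplying rather than filtering keeps expired sources available when nothing current matches the query.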
- Versioning is built in: `version_resolver.py` scans filenames and text snippets during ingestion for patterns corresponding to quarters (e.g., `Q1_2024`), years, explicit version numbers (`v2`), and status cues (draft/final).
- Parsed documents are mapped into a unified `document_family`.
- When a newer variant of an existing document family is uploaded, older versions are identified and marked `is_current = False`.
- Non-current versions are penalized during retrieval but kept in the index in case historical context is needed.
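The filename-scanning step can be sketched with a few regexes; the actual patterns in `version_resolver.py` may differ:

```python
# Illustrative version resolution from a filename: quarter, year,
# version number, and draft/final status cues.
import re

def resolve_version(filename):
    quarter = re.search(r"Q([1-4])[_\- ]?((?:19|20)\d{2})", filename, re.IGNORECASE)
    version = re.search(r"(?<![A-Za-z])v(\d+)", filename, re.IGNORECASE)
    year = re.search(r"(?:19|20)\d{2}", filename)
    status = re.search(r"(draft|final)", filename, re.IGNORECASE)
    return {
        "quarter": f"Q{quarter.group(1)}_{quarter.group(2)}" if quarter else None,
        "year": quarter.group(2) if quarter else (year.group(0) if year else None),
        "version": version.group(1) if version else None,
        "status": status.group(1).lower() if status else None,
    }

print(resolve_version("revenue_Q1_2024_v2_final.pdf"))
# → {'quarter': 'Q1_2024', 'year': '2024', 'version': '2', 'status': 'final'}
```

Note the lookbehind on the version pattern: it stops `v` inside ordinary words (like "revenue") from matching.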
- `conflict_detector.py` runs automatically at the end of the retrieval pipeline to flag contradictions across retrieved context chunks.
- It clusters retrieved chunks by document or version group and uses regex-based heuristics to find numeric disagreements about the same topic.
- When conflicts are found (for instance, two versions stating different figures for quarterly revenue), warnings are pushed directly into the Streamlit UI so operators can check the authoritative source instead of receiving a silently blended answer.
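A rough sketch of the numeric-disagreement heuristic (the real `conflict_detector.py` clusters by document/version group first and handles more phrasings):

```python
# Toy conflict detection: extract "<topic> grew by <N>%" facts per
# version, then flag topics where versions report different numbers.
import re
from collections import defaultdict
from typing import Dict, List, Tuple

FACT = re.compile(r"(\w+) grew by (\d+(?:\.\d+)?)%", re.IGNORECASE)

def find_numeric_conflicts(chunks: List[Tuple[str, str]]) -> List[str]:
    """chunks: (version_label, text) pairs."""
    facts: Dict[str, Dict[str, str]] = defaultdict(dict)   # topic -> version -> value
    for version, text in chunks:
        for topic, value in FACT.findall(text):
            facts[topic.lower()][version] = value
    warnings = []
    for topic, by_version in facts.items():
        if len(set(by_version.values())) > 1:              # numeric disagreement
            detail = ", ".join(f"{v}: {x}%" for v, x in sorted(by_version.items()))
            warnings.append(f"Conflicting figures for '{topic}' ({detail})")
    return warnings

print(find_numeric_conflicts([
    ("v1", "Revenue grew by 12% in Q1."),
    ("v2", "Revenue grew by 15% in Q1."),
]))
```

The warning string is exactly what gets surfaced to the operator, naming both versions and both figures.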
- Tables and chart content are partially handled during parsing via textual heuristics.
- `parser.py` flags pages/chunks with indicators (`table_detected = True`, `chart_detected = True`) by counting structured text artifacts such as tabs, pipe characters (`|`), or references to standard diagram nomenclature ("Figure A", "graph").
- These indicators are stored as metadata for LlamaIndex pipeline filtering and context awareness; the system does not currently reconstruct tables as markdown or use multimodal computer vision.
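The heuristic flagging can be sketched like this; thresholds and cue lists are assumptions, not the actual `parser.py` values:

```python
# Toy version of the table/chart flagging heuristics: count pipe/tab
# artifacts for tables, match diagram vocabulary for charts.
import re
from typing import Dict

CHART_CUES = re.compile(r"\b(figure\s+[A-Z0-9]+|graph|chart)\b", re.IGNORECASE)

def flag_chunk(text: str, pipe_threshold: int = 3, tab_threshold: int = 3) -> Dict[str, bool]:
    table_detected = text.count("|") >= pipe_threshold or text.count("\t") >= tab_threshold
    chart_detected = bool(CHART_CUES.search(text))
    return {"table_detected": table_detected, "chart_detected": chart_detected}

print(flag_chunk("Region | Q1 | Q2 | Q3\nEMEA | 10 | 12 | 9"))
# → {'table_detected': True, 'chart_detected': False}
```

These booleans travel with the chunk as metadata, so downstream retrieval can filter on them without re-reading the source text.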
- Ingestion Blocking: Document processing and parsing routines operate synchronously. Uploading large multi-hundred-page files blocks the Streamlit frontend.
- Tabular Insight Limitations: Because the system does not utilize Vision-Language Models (VLMs) or advanced OCR, complex nested tables or image-based visual charts cannot be queried effectively. It relies strictly on textual scrape artifacts.
- Conflict Regex Brittleness: The numeric extraction in `conflict_detector.py` handles standard financial phrasing ("X grew by Y%") but misses contradictions expressed in abstract prose, since there is no dedicated LLM-based contradiction verification.
- Asynchronous Task Processing: Move the ingestion pipeline (embedding, semantic extraction, vector insertion) into a Celery/Redis queue or FastAPI background tasks so ingestion no longer blocks the UI.
- Multimodal Visual Embeddings: Use LlamaIndex's vision pipelines with a multimodal Gemini model to ingest visual charts and convert detected tables into markdown during ingestion, enabling precise layout-aware querying.
- Advanced Cross-Encoder Reranking: Replace the additive metadata-heuristic reranker with a neural cross-encoder (such as Cohere Rerank) to improve top-k ordering beyond plain embedding similarity.
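The asynchronous-ingestion item can be sketched with a stdlib worker queue; in practice Celery/Redis or FastAPI's background tasks would replace this, and the worker body would run the real parse/embed/insert pipeline:

```python
# Sketch of decoupling ingestion from the UI thread: the UI only
# enqueues a job and returns; a worker thread drains the queue.
import queue
import threading

jobs = queue.Queue()
results = []

def worker() -> None:
    while True:
        path = jobs.get()
        if path is None:                      # sentinel: shut the worker down
            jobs.task_done()
            break
        results.append(f"indexed:{path}")     # stand-in for parse/embed/insert
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

jobs.put("data/raw/report.pdf")               # UI thread returns immediately after this
jobs.put(None)
jobs.join()                                   # wait for the worker to drain the queue
print(results)                                # → ['indexed:data/raw/report.pdf']
```

The same shape carries over to Celery: `jobs.put(...)` becomes `task.delay(...)`, and status updates land in the `IngestionJob` table instead of a list.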
Documents uploaded to Vectera move through several ingestion states: `processing` -> `indexed` (or `failed`).
If a document upload fails (e.g., due to an API timeout or malformed parser data), the status is marked `failed` and an `error_message` is preserved on the latest `IngestionJob`.
- Method: Navigate to the Streamlit UI, open the Document Details panel, and click Retry Ingestion. Alternatively, send a POST to `/api/v1/documents/{document_id}/retry`.
- Note: The original uploaded raw file must still exist in `data/raw/` for the retry to succeed.
Documents can be permanently retired from the vector search space using the delete feature.
- Method: From the Document Details panel, click Delete Document and confirm. Alternatively, issue `DELETE /api/v1/documents/{document_id}?hard=true`.
- Behavior: This attempts a coordinated wipe: it removes vector points from Qdrant, drops associated metadata records and chunks from the database, removes the physical file from `data/raw/`, and deletes the database record entirely. If vector deletion fails even partially, the status downgrades to `deleting_failed` to signal an operator.
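The deletion flow above can be sketched as a small orchestration function; the function names are illustrative stand-ins, not the real service API:

```python
# Sketch of the coordinated hard-delete: vectors first, and only on
# success proceed to metadata and file cleanup; a vector failure
# downgrades the status instead of leaving orphaned points.
from typing import Callable

def hard_delete(
    delete_vectors: Callable[[], bool],    # remove Qdrant points; False on (partial) failure
    delete_metadata: Callable[[], None],   # drop chunks + metadata rows
    delete_file: Callable[[], None],       # remove the file from data/raw/
) -> str:
    if not delete_vectors():
        return "deleting_failed"           # signal an operator; keep DB rows for retry
    delete_metadata()
    delete_file()
    return "deleted"

print(hard_delete(lambda: True, lambda: None, lambda: None))   # → deleted
print(hard_delete(lambda: False, lambda: None, lambda: None))  # → deleting_failed
```

Ordering vectors-first means a failure never leaves Qdrant points whose database mappings have already been dropped.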
Database Schemas and Status Tracking: Statuses are stored as plain strings (an implicit enum), so new workflow states can be added in place without destructive SQL schema migrations. No manual `CREATE TABLE` alterations are required unless you add entirely new columns.