Skip to content

dicksarp09/Enterprise-RAG-System

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hybrid Enterprise RAG System

A production-grade, deterministic, and highly secure Hybrid Enterprise RAG (Retrieval-Augmented Generation) Analytics System. This system merges structured SQL analytics (e.g., metric calculations like revenue and conversion rates), time-series forecasting, and unstructured qualitative retrieval (e.g., sales notes, transcripts) into a single, cohesive, AI-orchestrated query engine.

Instead of relying on autonomous agents that can hallucinate or exfiltrate data, this architecture enforces strict deterministic routing, bounded tool execution, and an LLM layer restricted solely to explanation synthesis based on verified context.

How It Works

The system processes user natural language queries through a sequence of strict, deterministic layers:

  1. Security & Sanitization: Filters inputs for prompt injections, XSS, and enforces RBAC (Role-Based Access Control) bounds.
  2. Intent Classification & Orchestration: A router evaluates the query and classifies the intent (METRIC, FORECAST, QUALITATIVE, or MIXED).
  3. Execution Routing: The orchestrator invokes the corresponding deterministic engines in parallel:
    • SQL Engine: Accesses the data warehouse to calculate hard metrics.
    • Forecast Engine: Generates time-series projections using statistical models (ARIMA).
    • Retrieval Engine: Performs vector searches (FAISS + Sentence-Transformers) mapped with Re-ranking (Cross-Encoder) for maximum qualitative accuracy.
  4. LLM Generation: The retrieved, structured, and sanitized data is packaged into a JSON context window. The LLM is instructed only to synthesize an answer from this exact JSON. If the data is insufficient, it safely refuses to answer.
  5. Continuous Improvement Loop: All telemetry, costs, low-confidence hits, and human feedback are actively logged into structured schemas to form an Active Learning queue.

Architecture Diagram

graph TD
    User([User Query]) --> API[FastAPI Entrypoint]
    API --> Security[Security & Sanitization Layer]
    Security --> Router[Intent Router]
    
    Router -->|METRIC| SQL[SQL Engine]
    Router -->|FORECAST| Forecast[Forecast Engine]
    Router -->|QUALITATIVE| Vector[Retrieval Engine + Re-ranker]
    
    SQL --> Context[Context Builder]
    Forecast --> Context
    Vector --> Context
    
    Context --> Guard[Hallucination Guard]
    Guard --> LLM[LLM Explanation Layer]
    LLM --> Response([Final Response])
    
    API -.-> Obs[Observability & Analytics]
    LLM -.-> CI[Continuous Improvement Loop]
Loading

Design System

The project was methodically built across the following architectural phases:

  • Phase 0: Baseline Scripts: Built the raw data pipelines. Downloadede CRM dataset from kaggle
  • Phase 1: Core System Skeleton: Established deterministic routing, API layers, and isolated tool contracts.
  • Phase 2: Data & Retrieval Layer: Implemented vector storage, BM25, semantic chunking, and deterministic embedding generation.
  • Phase 3: Accuracy Optimization Layer: Added Cross-Encoder re-ranking and latency guards to maximize Top-K precision.
  • Phase 4: Generation Layer: Configured the LLM context builders, strict prompt constraints, and zero-tolerance hallucination guardrails.
  • Phase 5: Reliability Layer: Configured circuit breakers, timeout thresholds, and graceful degradation routers to handle component failures.
  • Phase 6: Observability Layer: Implemented OpenTelemetry tracing, Prometheus metrics, and structured aggregator logging.
  • Phase 7: Security & Access Control: Bolstered tenant isolation, RBAC row-level bounds, and strict prompt injection blocking.
  • Phase 8: Cost Management Layer: Established dynamic query complexity scoring, token budgeting, caching layers, and LLM tier routing.
  • Phase 9: Governance & Compliance: Designed explicit data deletion pipelines (GDPR), embedding version tag enforcement, and immutable JSON audit trails.
  • Phase 10: Continuous Improvement Loop: Exposed Feedback APIs, explicit failure dashboards, and Active Learning queues tying LLM misses back to Human-in-The-Loop review points.
  • Phase 11: Production Evaluation: Ran full test suites to measure latency, stress bounds, and cost tracking.

Sample Test Results

User Query: "Why did we lose the Initech deal, and what was the total value lost?"

System Result:

"The Initech deal was lost primarily due to three reasons: first, our pricing was 30% higher for comparable features; second, they required an on-premise deployment which we do not offer; and third, the proof of concept took three weeks compared to the competitor's three-day trial. The total value of the missed opportunity was $450,000."

Behind the Scenes: The Orchestrator accurately tagged this as a QUALITATIVE intent, fetched the precise chunk from the vectorized lost_deal_reasons document, and successfully parsed out the $450k parameter without executing expensive forecasting or SQL metrics.

Evaluation Metrics & Results

During the Phase 11 stress test evaluation running the query "Summarize the Q3 revenue trends for our enterprise accounts, and explain why the numbers dipped in September compared to August", here are the captured execution metrics:

Metric Result
Component Latency Total: 6797ms (Forecast Engine: 3765ms, LLM Gen: 3021ms, Retrieval/SQL/Router: < 10ms total
Cost Per Query $0.0077 (743 Tokens In, 11 Tokens Out)
Hallucination Rate 0.0 (Strict prevention enforced)
System Verdict Successfully identified the query needed both SQL and Text data (MIXED intent). It calculated the metric perfectly but refused to hallucinate the cause of the decline when qualitative context was missing.
Production Readiness 88/100 (Deductions reflect synchronous ARIMA training blockers and regex keyword routing boundaries).

1. Executive Summary

The Hybrid Enterprise RAG System processed a 20-query benchmark subset against the production-authentic Kaggle CRM dataset using the llama-3.3-70b-versatile model.

  • Hallucination Rate: 0.00% (Target < 1%) -> PASS
  • SQL Exact Match: 100.00% (Target >= 99%) -> PASS
  • Forecast MAPE: 12.40% (Target < 15%) -> PASS
  • Router Macro F1 Proxy: 1.00 (Target >= 0.90) -> PASS (Upgraded to 70b)
  • P95 Latency: 75.18s (Target < 4s) -> FAIL (Outlier in Forecast Engine)
  • Avg Cost per Query: $0.0006

2. RAG Retrieval & Groundedness

  • Recall@K Proxy: 0.95 (Improved by high-density real-world memos)
  • Faithfulness Rating: 0.99 (Strict parameter extraction and 70b model reasoning)
  • Refusal Correctness: 25.0% Refusal Rate. Correctness verified for adversarial/out-of-bound queries.

3. Telemetry & Cost Sustainability

  • P50 Latency: 1.56s
  • P95 Latency: 75.18s
  • P99 Latency: 75.18s
  • Estimated Daily Cost (10k volume): $6.00
  • Estimated Daily Cost (100k volume): $60.00

4. Human Evaluation Loop (Simulated Sample n=20)

  • Executive Usefulness Score: 4.8 / 5.0 (High confidence due to real CRM data patterns)
  • Clarity: 4.9 / 5.0 (Synthesized reports for Medical/Software sectors)

5. Bottlenecks Identified

  1. Forecast Engine Latency: ARIMA computations for the expanded 8,800-row pipeline dataset triggered significant delays in the projection phase (75s outlier).
  2. Semantic Extraction Overhead: While extraction is accurate (100% router F1), the 70b model's complex prompt adds ~500ms to the routing phase compared to the 8b model.
  3. Data Mapping Complexity: Alignment between CSV stages ('Won') and DB stages ('closed_won') is now stable but requires strict maintenance of the WarehouseLoader mapping.

About

This system merges structured SQL analytics (e.g., metric calculations like revenue and conversion rates), time-series forecasting, and unstructured qualitative retrieval (e.g., sales notes, transcripts) into a single, cohesive, AI-orchestrated query engine

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors