InternalIQ/backend

Agentic Retrieval-Augmented Generation (RAG) service built with FastAPI, LangChain/LangGraph, Azure OpenAI, and a hybrid Chroma + BM25 retrieval stack. It powers InternalIQ, an internal engineering knowledge assistant capable of indexing PDF corpora, retrieving grounded evidence, drafting answers, and verifying responses before returning them to end users.

Why InternalIQ

Engineering organizations accumulate large amounts of PDF documentation (governance, operating procedures, systems knowledge). InternalIQ closes the discovery gap by:

  • Converting PDFs to structured markdown via Docling, chunking semantically, and indexing into Chroma and BM25.
  • Running a LangGraph-powered agent workflow that retrieves, checks relevance, drafts answers, and verifies grounding before responding.
  • Providing a minimal FastAPI surface for ingestion (POST /documents) and querying (POST /query).

Key Capabilities

  • Hybrid retrieval – HybridRetriever fuses dense embeddings (Chroma) with sparse BM25 scores for robust recall.
  • Deterministic agent workflow – AgentWorkflow orchestrates retrieve → relevance gate → research → verification with retry loops.
  • LLM-driven guardrails – Specialized agents (relevance, research, verification) run on Azure OpenAI deployments to ensure grounded, concise answers.
  • Incremental indexing – DocumentIndexer leverages Docling loaders, semantic chunking, and a SHA-256 IndexCache to avoid re-embedding unchanged PDFs.
  • Persistent vector store – VectorStore persists embeddings in ./chroma_db, enabling warm restarts and offline updates.
  • Category-aware data lake – Documents live under data/<category>/... so teams can scope ingestion per domain (systems, operations, governance, etc.).
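
To make the hybrid retrieval idea concrete, here is a minimal sketch of one common way to fuse dense and sparse scores. It is not the actual HybridRetriever code: the function names `min_max` and `fuse_scores`, the `alpha` weight, the min-max normalization, and the assumption that dense scores are similarities (higher is better) are all illustrative choices.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and BM25 values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: give every document the same full weight.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    alpha: float = 0.5,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized dense and BM25 scores (hypothetical scheme)."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    fused = {
        doc_id: alpha * dense_n.get(doc_id, 0.0)
        + (1.0 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    # Highest fused score first; ties broken by document id for a stable order.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    dense_hits = {"a": 0.9, "b": 0.5, "c": 0.1}    # assumed similarity scores
    sparse_hits = {"b": 12.0, "c": 3.0, "d": 8.0}  # assumed BM25 scores
    for doc_id, score in fuse_scores(dense_hits, sparse_hits):
        print(doc_id, round(score, 3))
```

A document found by only one retriever still receives a score because the missing side contributes zero, which is one reason a fused ranking can recall more than either index alone.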

High-Level Architecture

[Figure: InternalIQ architecture diagram]

Data flow:

  1. PDFs dropped into data/ (or uploaded via /documents) are converted to markdown (loaders), chunked (chunking), and embedded.
  2. Hybrid retrieval fetches the most relevant documents for each query.
  3. The Relevance Agent (relevance_agent.py) checks whether the query is relevant to the retrieved documents and whether an answer can be generated from them; queries failing this gate short-circuit to save tokens.
  4. The Research and Verification agents iterate to refine and ground the answer, stopping when verification passes or when the retry budget is exhausted because the documents cannot support a response.
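
The control flow in steps 2 to 4 can be sketched in plain Python without LangGraph. This is an illustration under stated assumptions, not the project's AgentWorkflow: the `AgentState` fields, the `run_workflow` signature, the injected callables, and the `max_retries` default are all hypothetical. Only the labels `CAN_ANSWER` and `NO_MATCH` come from this README.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    """Hypothetical state container; the real AgentState fields may differ."""
    query: str
    documents: List[str] = field(default_factory=list)
    relevance_label: str = ""
    answer: str = ""
    verified: bool = False
    attempts: int = 0


def run_workflow(
    query: str,
    retrieve: Callable[[str], List[str]],
    check_relevance: Callable[[str, List[str]], str],
    research: Callable[[str, List[str]], str],
    verify: Callable[[str, List[str]], bool],
    max_retries: int = 2,
) -> AgentState:
    state = AgentState(query=query)
    state.documents = retrieve(query)
    state.relevance_label = check_relevance(query, state.documents)
    if state.relevance_label != "CAN_ANSWER":
        # Relevance gate failed: skip research and verification entirely.
        return state
    # One initial attempt plus up to max_retries further attempts.
    while state.attempts <= max_retries:
        state.attempts += 1
        state.answer = research(query, state.documents)
        state.verified = verify(state.answer, state.documents)
        if state.verified:
            break
    return state


if __name__ == "__main__":
    final = run_workflow(
        "What is the SEV escalation process?",
        retrieve=lambda q: ["[Document 1 | governance.pdf] ..."],
        check_relevance=lambda q, docs: "CAN_ANSWER",
        research=lambda q, docs: "Draft answer based on the documents.",
        verify=lambda answer, docs: True,
    )
    print(final.relevance_label, final.verified, final.attempts)
```

Passing the agents in as callables keeps the loop testable with stubs, which is also how agent mocks could be wired into a pytest suite.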

Repository Layout

app/
	agents/           # LLM-powered relevance, research, verification agents
	ingestion/        # PDF loading, markdown chunking, index cache, document indexer
	orchestration/    # LangGraph AgentState + workflow definition
	retrieval/        # Hybrid retriever, Chroma vector store, BM25 index
	main.py           # FastAPI entrypoint exposing /query and /documents
chroma_db/          # Persisted Chroma collections
data/               # Domain-organized PDF corpus
requirements.txt    # Python dependency lock

Prerequisites

  • Python 3.11+ (tested on 3.11.x)
  • Poetry or pip/venv (instructions below assume pip)
  • Azure OpenAI resource with:
    • Chat completion deployment for the agents
    • Embedding deployment for Chroma ingestion
  • Local install of poppler/libmagic if Docling requires native deps (platform specific)

Setup

# 1. Clone and enter the repo
git clone <repo-url>
cd backend

# 2. Create & activate a virtual environment
python -m venv .venv
source .venv/bin/activate            # Git Bash on Windows: source .venv/Scripts/activate; PowerShell: .venv\Scripts\Activate.ps1

# 3. Install Python dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 4. Create the runtime directories
mkdir -p data/engineering_governance data/engineering_operations
mkdir -p chroma_db .index_cache

# 5. Provide environment variables (see below)
cp .env.example .env  # create and fill in values

Environment Variables

Populate a .env file at repo root (parsed via app/config.py).

| Variable | Description |
| --- | --- |
| API_KEY | Azure OpenAI API key with access to both chat + embedding deployments. |
| AZURE_EMBEDDING_ENDPOINT | Base URL of the embedding Azure OpenAI resource (https://<resource>.openai.azure.com). |
| AZURE_EMBEDDING_DEPLOYMENT | Embedding deployment name used by Chroma (text-embedding-3-large, etc.). |
| LLM_ENDPOINT | Base URL for the chat completion Azure OpenAI resource. |
| LLM_DEPLOYMENT | Deployment name for the chat model powering all three agents. |

Tip: Keep embeddings and chat deployments in the same resource when possible to simplify networking and private link configuration.
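
A quick preflight check can catch missing configuration before the server starts. The sketch below is not the contents of app/config.py; `REQUIRED_VARS`, `missing_vars`, and `load_settings` are hypothetical names. It also assumes the variables are already present in the process environment (for example exported in the shell or loaded from .env by whatever mechanism the app uses); it does not parse the .env file itself.

```python
import os
from typing import Dict, List, Mapping

# Variable names taken from the table above.
REQUIRED_VARS = (
    "API_KEY",
    "AZURE_EMBEDDING_ENDPOINT",
    "AZURE_EMBEDDING_DEPLOYMENT",
    "LLM_ENDPOINT",
    "LLM_DEPLOYMENT",
)


def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any are missing."""
    missing = missing_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    problems = missing_vars(os.environ)
    print("All variables set" if not problems else f"Missing: {problems}")
```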

Running the API

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

  • POST /query – Accepts a JSON body like { "query": "How do we roll out change requests?" } and returns an object containing the drafted answer, verification verdict, context excerpts, and relevance label.
  • POST /documents – Accepts multipart form-data with file=<PDF> and category=<folder> to index new material on the fly.

OpenAPI docs and an interactive Swagger UI live at http://localhost:8000/docs when the server is running.

Document Ingestion

1. File drop (offline indexing)

  1. Place PDFs under data/<category>/your-doc.pdf.
  2. Start the API once; DocumentIndexer performs a recursive crawl and indexes anything not cached.

2. API ingestion (online)

curl -X POST "http://localhost:8000/documents" \
	-F "file=@data/engineering_systems/system-architecture.pdf" \
	-F "category=engineering_systems"

  • Non-PDF uploads receive a 400 error.
  • Indexed docs are hashed and tracked via .index_cache/indexed_docs.json to avoid re-embedding duplicates.
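
The hash-based skip logic can be sketched as follows. This is an assumption-laden illustration, not the project's IndexCache: the class name `IndexCacheSketch`, its methods, and the JSON layout (file path mapped to SHA-256 hex digest) are hypothetical, and the real indexed_docs.json may be structured differently.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict


def file_sha256(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


class IndexCacheSketch:
    """Hypothetical stand-in for IndexCache: maps file path -> content hash."""

    def __init__(self, cache_file: Path) -> None:
        self.cache_file = cache_file
        self.entries: Dict[str, str] = {}
        if cache_file.exists():
            self.entries = json.loads(cache_file.read_text(encoding="utf-8"))

    def needs_indexing(self, pdf: Path) -> bool:
        """True when the file is new or its content changed since last indexing."""
        return self.entries.get(str(pdf)) != file_sha256(pdf)

    def mark_indexed(self, pdf: Path) -> None:
        """Record the current hash and persist the cache to disk."""
        self.entries[str(pdf)] = file_sha256(pdf)
        self.cache_file.parent.mkdir(parents=True, exist_ok=True)
        self.cache_file.write_text(json.dumps(self.entries, indent=2), encoding="utf-8")
```

An indexer built on this would call `needs_indexing` for each PDF found under data/, embed only the files that return True, and call `mark_indexed` after each successful embed.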

Querying the Assistant

curl -X POST "http://localhost:8000/query" \
	-H "Content-Type: application/json" \
	-d '{"query": "What is the SEV escalation process?"}'

Successful responses resemble:

{
  "answer": "SEV escalations follow a three-stage process...",
  "verification": {
    "Supported": "YES",
    "Unsupported Claims": [],
    "Contradictions": [],
    "Relevant": "YES"
  },
  "context_used": "[Document 1 | governance.pdf] ...",
  "relevance_label": "CAN_ANSWER"
}

If the relevance gate returns NO_MATCH, the workflow short-circuits and skips the research and verification agents to save tokens.
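
For callers that prefer Python over curl, the request and a basic check of the response can be sketched with the standard library alone. The helper names `build_query_request`, `is_grounded`, and `ask` are hypothetical, as is the 60-second timeout; the field names follow the sample response above, and the shape of a NO_MATCH response is an assumption.

```python
import json
import urllib.request
from typing import Any, Dict


def build_query_request(
    question: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST /query request with a JSON body."""
    payload = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_grounded(response: Dict[str, Any]) -> bool:
    """True when the relevance gate passed and verification found no problems."""
    verification = response.get("verification") or {}
    return (
        response.get("relevance_label") == "CAN_ANSWER"
        and verification.get("Supported") == "YES"
        and not verification.get("Unsupported Claims")
        and not verification.get("Contradictions")
    )


def ask(
    question: str, base_url: str = "http://localhost:8000", timeout: float = 60.0
) -> Dict[str, Any]:
    """Send the query to a running server and decode the JSON response."""
    request = build_query_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A caller could then run `result = ask("What is the SEV escalation process?")` against a running server and show `result["answer"]` only when `is_grounded(result)` is true.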

Development Workflow

  • Hot reload – Run uvicorn app.main:app --reload during development.
  • Formatting/Linting – Configure your preferred toolset (e.g., ruff or black).
  • Tests – Add pytest suites under tests/ (not yet included) to cover ingestion, retrieval heuristics, and agent mocks.
  • Observability – Logging is already structured; wire in OpenTelemetry exporters if you need tracing.

Troubleshooting

  • Missing punkt tokenizer errors → let NLTK download automatically or pre-install via python -m nltk.downloader punkt punkt_tab.
  • FileNotFoundError: data → ensure the data/ directory exists before starting the API.
  • Empty answers → verify embeddings + chat deployments are correctly configured and accessible from your environment.
  • Chroma schema issues → delete ./chroma_db to force a clean rebuild (will re-embed all content).

Need help or have ideas? File an issue or start a discussion – contributions are welcome!
