GitHub - InternalIQ/backend: An internal engineering knowledge assistant capable of indexing PDF corpora, retrieving grounded evidence, drafting answers, and verifying responses before returning them to end users.

Agentic Retrieval-Augmented Generation (RAG) service built with FastAPI, LangChain/LangGraph, Azure OpenAI, and a hybrid Chroma + BM25 retrieval stack. It powers InternalIQ, an internal engineering knowledge assistant capable of indexing PDF corpora, retrieving grounded evidence, drafting answers, and verifying responses before returning them to end users.

▶️ Watch Demo Video

Table of Contents

Why InternalIQ
Key Capabilities
High-Level Architecture
Repository Layout
Prerequisites
Setup
Environment Variables
Running the API
Document Ingestion
Querying the Assistant
Development Workflow
Troubleshooting

Why InternalIQ

Engineering organizations accumulate large amounts of PDF documentation (governance, operating procedures, systems knowledge). InternalIQ closes the discovery gap by:

Converting PDFs to structured markdown via Docling, chunking semantically, and indexing into Chroma and BM25.
Running a LangGraph-powered agent workflow that retrieves, checks relevance, drafts answers, and verifies grounding before responding.
Providing a minimal FastAPI surface for ingestion (POST /documents) and querying (POST /query).

Key Capabilities

Hybrid retrieval – HybridRetriever fuses dense embeddings (Chroma) with sparse BM25 scores for robust recall.
Deterministic agent workflow – AgentWorkflow orchestrates retrieve → relevance gate → research → verification with retry loops.
LLM-driven guardrails – Specialized agents (relevance, research, verification) run on Azure OpenAI deployments to ensure grounded, concise answers.
Incremental indexing – DocumentIndexer leverages Docling loaders, semantic chunking, and an SHA-256 IndexCache to avoid re-embedding unchanged PDFs.
Persistent vector store – VectorStore persists embeddings in ./chroma_db, enabling warm restarts and offline updates.
Category-aware data lake – Documents live under data/<category>/... so teams can scope ingestion per domain (systems, operations, governance, etc.).
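
To make the hybrid retrieval idea concrete, here is a minimal score-fusion sketch. It does not reproduce HybridRetriever: the function names (`min_max`, `fuse_scores`), the min-max normalization, and the `alpha` weight are assumptions chosen for illustration, and the real fusion strategy may differ.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so dense and sparse values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    alpha: float = 0.5,  # hypothetical weight: 1.0 = dense only, 0.0 = BM25 only
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized dense and sparse scores, best first."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    fused = {
        doc_id: alpha * dense_n.get(doc_id, 0.0)
        + (1.0 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    # Sort by descending score; break ties on the document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    dense_hits = {"a": 0.9, "b": 0.5, "c": 0.1}    # e.g. cosine similarities
    sparse_hits = {"b": 12.0, "c": 3.0, "d": 6.0}  # e.g. BM25 scores
    print(fuse_scores(dense_hits, sparse_hits))
```

A document found by only one retriever still competes, but with a zero contribution from the other side, which is one simple way to reward agreement between the two signals.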

High-Level Architecture

Data flow:

PDFs dropped into data/ (or uploaded via /documents) are converted to markdown (loaders), chunked (chunking), and embedded.
Hybrid retrieval fetches the most relevant documents for each query.
The Relevance Agent (relevance_agent.py) checks whether the query is relevant to the retrieved documents and whether an answer can be generated from them; queries failing this gate short-circuit to save tokens.
Research and Verification agents iterate to refine and ground the answer, or exhaust retry budget if the documents cannot support a response.
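
The control flow described above can be sketched in plain Python without LangGraph. This is an illustration under stated assumptions, not the AgentWorkflow implementation: `RunState`, `run_workflow`, and `max_retries` are hypothetical names, and the agents are passed in as plain callables. Only the labels `NO_MATCH`, `CAN_ANSWER`, and the `Supported` verification key come from this README.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RunState:
    """Hypothetical stand-in for the workflow's shared state."""
    query: str
    documents: List[str] = field(default_factory=list)
    relevance_label: str = ""
    answer: Optional[str] = None
    verification: Dict[str, object] = field(default_factory=dict)
    attempts: int = 0


def run_workflow(
    query: str,
    retrieve: Callable[[str], List[str]],
    check_relevance: Callable[[str, List[str]], str],
    research: Callable[[str, List[str]], str],
    verify: Callable[[str, List[str]], Dict[str, object]],
    max_retries: int = 2,  # assumed retry budget
) -> RunState:
    state = RunState(query=query)
    state.documents = retrieve(query)
    state.relevance_label = check_relevance(query, state.documents)

    # Relevance gate: skip the more expensive agents when nothing matches.
    if state.relevance_label == "NO_MATCH":
        return state

    # One initial attempt plus up to max_retries further attempts.
    while state.attempts <= max_retries:
        state.attempts += 1
        state.answer = research(query, state.documents)
        state.verification = verify(state.answer, state.documents)
        if state.verification.get("Supported") == "YES":
            break
    return state
```

Passing the agents as callables keeps the loop testable with stubs; if the retry budget is exhausted, the last draft and its failing verification remain in the state for the caller to inspect.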

Repository Layout

app/
	agents/           # LLM-powered relevance, research, verification agents
	ingestion/        # PDF loading, markdown chunking, index cache, document indexer
	orchestration/    # LangGraph AgentState + workflow definition
	retrieval/        # Hybrid retriever, Chroma vector store, BM25 index
	main.py           # FastAPI entrypoint exposing /query and /documents
chroma_db/          # Persisted Chroma collections
data/               # Domain-organized PDF corpus
requirements.txt    # Python dependency lock

Prerequisites

Python 3.11+ (tested on 3.11.x)
Poetry or pip/venv (instructions below assume pip)
Azure OpenAI resource with:
- Chat completion deployment for the agents
- Embedding deployment for Chroma ingestion
Local install of poppler/libmagic if Docling requires native deps (platform specific)

Setup

# 1. Clone and enter the repo
git clone <repo-url>
cd backend

# 2. Create & activate a virtual environment
python -m venv .venv
source .venv/Scripts/activate        # Git Bash on Windows; Linux/macOS: source .venv/bin/activate; PowerShell: .venv\Scripts\Activate.ps1

# 3. Install Python dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 4. Create the runtime directories
mkdir -p data/engineering_governance data/engineering_operations
mkdir -p chroma_db .index_cache

# 5. Provide environment variables (see below)
cp .env.example .env  # if the repo has no .env.example, create .env by hand; then fill in values

Environment Variables

Populate a .env file at repo root (parsed via app/config.py).

| Variable | Description |
| --- | --- |
| `API_KEY` | Azure OpenAI API key with access to both chat + embedding deployments. |
| `AZURE_EMBEDDING_ENDPOINT` | Base URL of the embedding Azure OpenAI resource (`https://<resource>.openai.azure.com`). |
| `AZURE_EMBEDDING_DEPLOYMENT` | Embedding deployment name used by Chroma (`text-embedding-3-large`, etc.). |
| `LLM_ENDPOINT` | Base URL for the chat completion Azure OpenAI resource. |
| `LLM_DEPLOYMENT` | Deployment name for the chat model powering all three agents. |

Tip: Keep embeddings and chat deployments in the same resource when possible to simplify networking and private link configuration.
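
A fail-fast check for these variables might look like the sketch below. The contents of app/config.py are not shown in this README, so `Settings` and `load_settings` are hypothetical names, and the sketch assumes the values from .env have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the table above.
REQUIRED_VARS = (
    "API_KEY",
    "AZURE_EMBEDDING_ENDPOINT",
    "AZURE_EMBEDDING_DEPLOYMENT",
    "LLM_ENDPOINT",
    "LLM_DEPLOYMENT",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names mirror the variables."""
    api_key: str
    azure_embedding_endpoint: str
    azure_embedding_deployment: str
    llm_endpoint: str
    llm_deployment: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment, failing early on missing values."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(**{name.lower(): env[name] for name in REQUIRED_VARS})
```

Reporting every missing variable at once at startup tends to be easier to debug than an authentication error surfacing on the first query.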

Running the API

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

POST /query – Accepts a JSON body like { "query": "How do we roll out change requests?" } and returns an object containing the drafted answer, verification verdict, context excerpts, and relevance label.
POST /documents – Accepts multipart form-data with file=<PDF> and category=<folder> to index new material on the fly.

OpenAPI docs and an interactive Swagger UI live at http://localhost:8000/docs when the server is running.

Document Ingestion

1. File drop (offline indexing)

Place PDFs under data/<category>/your-doc.pdf.
Start the API once; DocumentIndexer performs a recursive crawl and indexes anything not cached.
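
As an illustration of the recursive crawl, the sketch below walks a data directory and derives each PDF's category from its first path component. `iter_pdfs` and the `"uncategorized"` fallback are assumptions made for this example; the behaviour of DocumentIndexer itself is not shown here.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_pdfs(data_root: Path) -> Iterator[Tuple[str, Path]]:
    """Yield (category, path) for every PDF below data_root, at any depth."""
    for path in sorted(data_root.rglob("*.pdf")):
        if not path.is_file():
            continue  # skip directories that happen to end in .pdf
        relative = path.relative_to(data_root)
        # data/<category>/.../file.pdf -> <category>; files placed directly
        # under data/ get a fallback label (an assumption of this sketch).
        category = relative.parts[0] if len(relative.parts) > 1 else "uncategorized"
        yield category, path


if __name__ == "__main__":
    for category, pdf in iter_pdfs(Path("data")):
        print(category, pdf)
```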

2. API ingestion (online)

curl -X POST "http://localhost:8000/documents" \
	-F "file=@data/engineering_systems/system-architecture.pdf" \
	-F "category=engineering_systems"

Non-PDF uploads receive a 400 error.
Indexed docs are hashed and tracked via .index_cache/indexed_docs.json to avoid re-embedding duplicates.
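
The hash-based deduplication can be sketched as follows. This is not the project's IndexCache: `HashCache`, `file_sha256`, and the JSON layout (file path mapped to SHA-256 hex digest) are assumptions for illustration, and the real format of indexed_docs.json may differ.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are never fully held in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


class HashCache:
    """Hypothetical cache: maps a file path to the hash it was indexed with."""

    def __init__(self, cache_file: Path) -> None:
        self.cache_file = cache_file
        self.entries: Dict[str, str] = (
            json.loads(cache_file.read_text()) if cache_file.exists() else {}
        )

    def needs_indexing(self, path: Path) -> bool:
        # True for new files and for files whose content has changed.
        return self.entries.get(str(path)) != file_sha256(path)

    def mark_indexed(self, path: Path) -> None:
        self.entries[str(path)] = file_sha256(path)
        self.cache_file.parent.mkdir(parents=True, exist_ok=True)
        self.cache_file.write_text(json.dumps(self.entries, indent=2))
```

Keying on content hashes rather than modification times means a file that is re-copied or touched without changes is not re-embedded.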

Querying the Assistant

curl -X POST "http://localhost:8000/query" \
	-H "Content-Type: application/json" \
	-d '{"query": "What is the SEV escalation process?"}'

Successful responses resemble:

{
  "answer": "SEV escalations follow a three-stage process...",
  "verification": {
    "Supported": "YES",
    "Unsupported Claims": [],
    "Contradictions": [],
    "Relevant": "YES"
  },
  "context_used": "[Document 1 | governance.pdf] ...",
  "relevance_label": "CAN_ANSWER"
}

If relevance fails (NO_MATCH), the workflow short-circuits to save tokens.
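
A client might build the request and interpret the response along these lines. The field names follow the example response above; `build_query_request`, `summarize_response`, and the three summary labels are hypothetical, and the exact shape of a `NO_MATCH` response is an assumption.

```python
import json
import urllib.request
from typing import Any, Dict


def build_query_request(
    query: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /query request with a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_response(payload: Dict[str, Any]) -> str:
    """Classify a /query response as 'no-match', 'grounded', or 'unverified'."""
    if payload.get("relevance_label") == "NO_MATCH":
        return "no-match"
    verification = payload.get("verification") or {}
    fully_supported = (
        verification.get("Supported") == "YES"
        and not verification.get("Unsupported Claims")
        and not verification.get("Contradictions")
    )
    return "grounded" if fully_supported else "unverified"


if __name__ == "__main__":
    request = build_query_request("What is the SEV escalation process?")
    # Requires the API to be running locally.
    with urllib.request.urlopen(request) as response:
        print(summarize_response(json.load(response)))
```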

Development Workflow

Hot reload – run uvicorn app.main:app --reload during development.
Formatting/Linting – Configure your preferred toolset (e.g., ruff or black).
Tests – Add pytest suites under tests/ (not yet included) to cover ingestion, retrieval heuristics, and agent mocks.
Observability – Logging is already structured; wire in OpenTelemetry exporters if you need tracing.

Troubleshooting

Missing punkt tokenizer errors → let NLTK download automatically or pre-install via python -m nltk.downloader punkt punkt_tab.
FileNotFoundError: data → ensure the data/ directory exists before starting the API.
Empty answers → verify embeddings + chat deployments are correctly configured and accessible from your environment.
Chroma schema issues → delete ./chroma_db to force a clean rebuild (will re-embed all content).

Need help or have ideas? File an issue or start a discussion – contributions are welcome!

