Medical QA RAG System

Overview

A retrieval-augmented generation (RAG) question answering system for medical guideline documents, implemented within a 4-hour timeframe. The system processes the ICD-10 Classification of Mental and Behavioural Disorders (9241544228_eng.pdf, 377 pages) and returns the coded classification for a given diagnosis.

Tools and Models Used

Embeddings: all-MiniLM-L6-v2 from Sentence Transformers
Vector DB: FAISS (CPU version, IndexFlatIP)
LLM: LLaMA3 via Ollama
Text Processing: PyPDF2, LangChain RecursiveCharacterTextSplitter
Interface: Gradio

System Components

File	Description
`document_processor.py`	PDF text extraction and medical-optimized chunking (500 tokens, 100 overlap)
`vector_store.py`	FAISS index management with ICD-10 code boosting
`rag_pipeline.py`	Core RAG logic with query caching via lru_cache
`interface.py`	Gradio web interface
`main.py`	Entry point with system initialization and document ingestion

Key Design Decisions

Chunking Strategy:
- 500-token chunks with 100-token overlap to retain context in a 377-page medical text with dense descriptions and ICD-10 codes.
- Medical-aware separators (e.g., bullet points, section breaks, periods) to preserve clinical structure.
- Line break normalization to handle PDF formatting artifacts.
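
The normalization and overlap ideas above can be sketched in plain Python. This is a simplified illustration, not the code in `document_processor.py`: it uses a fixed-size sliding window instead of separator-aware recursive splitting, and it counts whitespace-separated words as a rough stand-in for tokens. The function names `normalize_line_breaks` and `chunk_words` are hypothetical.

```python
import re


def normalize_line_breaks(text: str) -> str:
    """Repair common PDF extraction artifacts before chunking."""
    # Re-join words hyphenated across a line break: "depres-\nsive" -> "depressive".
    # Caveat: this also joins genuine hyphenated compounds split at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Protect paragraph breaks, turn single newlines into spaces, then restore.
    text = re.sub(r"\n{2,}", "\x00", text)
    text = text.replace("\n", " ")
    text = text.replace("\x00", "\n\n")
    # Collapse runs of spaces and tabs left behind by the PDF layout.
    return re.sub(r"[ \t]+", " ", text).strip()


def chunk_words(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

With the default sizes, each chunk repeats the last 100 words of the previous one, so a code and its description are less likely to be separated by a chunk boundary.
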
Retrieval Optimization:
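- Wider search (more candidates retrieved) for queries that contain an ICD-10 code.
- Cosine similarity, computed as the inner product of L2-normalized embeddings (which is why the index type is IndexFlatIP).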
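
A minimal pure-Python sketch of both retrieval ideas follows. It is not the code in `vector_store.py`: the regex for ICD-10 codes, the `base_k` and `boost` values, and all function names are assumptions made for illustration. A brute-force loop stands in for the FAISS index.

```python
import math
import re

# Assumed pattern: one letter, two digits, optional decimal part (e.g. "F33" or "F33.4").
ICD10_CODE = re.compile(r"\b[A-Z]\d{2}(?:\.\d{1,2})?\b")


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_width(query, base_k=3, boost=2):
    """Retrieve more candidates when the query names an ICD-10 code."""
    return base_k * boost if ICD10_CODE.search(query) else base_k


def search(query_vec, chunk_vecs, k):
    """Return (index, score) pairs for the top-k chunks by cosine similarity."""
    q = l2_normalize(query_vec)
    # For unit-length vectors the inner product equals the cosine similarity.
    scored = [(i, inner_product(q, l2_normalize(v))) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because every vector is normalized before comparison, vector length has no effect on the ranking; only direction does.
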
LLM Interaction:
- Dynamic number of retrieved chunks, chosen by query length: k=3 for short queries, k=5 for longer, more complex ones.
- Temperature=0.2 for factual, concise responses.
- Strict context-bound answers with a prompt limiting responses to retrieved chunks.
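
The three points above could be combined as in the sketch below. The word threshold, the prompt wording, and the function names are assumptions. The README does not show the actual prompt or how `rag_pipeline.py` calls Ollama. The payload is shaped like the request body of Ollama's generate endpoint but is only built here, not sent.

```python
def choose_k(query: str, word_threshold: int = 12) -> int:
    """Pick k=3 for short queries and k=5 for longer ones (threshold is assumed)."""
    return 3 if len(query.split()) <= word_threshold else 5


# Hypothetical prompt wording that restricts the model to the retrieved chunks.
PROMPT_TEMPLATE = (
    "Answer the question using ONLY the context below. "
    "If the context does not contain the answer, say that you do not know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(query: str, chunks: list[str]) -> str:
    """Number the retrieved chunks and place them ahead of the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=query)


def build_request(query: str, chunks: list[str], model: str = "llama3") -> dict:
    """Assemble a generate-style payload with a low temperature for factual output."""
    return {
        "model": model,
        "prompt": build_prompt(query, chunks),
        "stream": False,
        "options": {"temperature": 0.2},
    }
```
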
Caching:
- Implemented LRU cache for frequent queries
- Assumed repeated queries would be common in clinical use

Sample Output

Test Query:

Give me the correct coded classification for: "Recurrent depressive disorder, currently in remission"

Output:

The correct coded classification is:

F33.4 Recurrent depressive disorder, currently in remission

Limitations

Document Scope:
- Processes only the provided 9241544228_eng.pdf (ICD-10 guidelines, 377 pages); no dynamic document uploads or external data integration.
Feature Coverage:
- No Maximal Marginal Relevance (MMR) reranking implemented due to 4-hour time constraint.
- Error handling is limited to generic messages (e.g., "System error. Please try again") with basic logging; lacks specific feedback for edge cases like missing chunks or LLM failures.
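
For reference, the MMR reranking that is not implemented here could look roughly like the sketch below. It is not part of this repository. It assumes unit-length vectors, so that the inner product is the cosine similarity, and the `lambda_mult` default is an arbitrary choice.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mmr(query_vec, cand_vecs, k, lambda_mult=0.5):
    """Select up to k candidate indices, trading relevance against redundancy.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    relevance = [dot(query_vec, v) for v in cand_vecs]
    selected = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((dot(cand_vecs[i], cand_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a corpus where neighboring chunks overlap by design, a rerank like this would reduce the chance of sending near-duplicate chunks to the LLM.
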
Performance:
- Initial query latency ~15-30 seconds due to embedding generation and LLM inference for a 377-page document.
- Requires local Ollama setup with LLaMA3 pre-loaded for operation.

AI Tools Usage

ChatGPT:
- Used for debugging FAISS index configuration
- Suggested optimizations to the regex patterns used for medical text preprocessing
Grok AI:
- Assisted in designing the prompt engineering strategy for LLaMA3
- Assisted with the Gradio interface
- Recommended the chunk size (500 chars) based on medical text characteristics
- Provided insights on error handling patterns for Medical QA RAG system

How to Run

# Clone the Repo
git clone https://github.com/prashanth1276/Medical-QA-RAG-System.git
cd Medical-QA-RAG-System

# Set up environment
conda create -p venv python==3.12   #Windows
conda activate venv/    # Windows

# Install all the Dependencies
pip install -r requirements.txt

# Install Ollama first (required), then pull the llama3 model
# (the pull can be skipped only if llama3 is already available locally)
ollama pull llama3

# Launch system
python main.py
