Insurance Document RAG Pipeline

A Retrieval-Augmented Generation (RAG) system for insurance documents that allows users to ask questions about insurance policies and get accurate answers based on the content of processed documents.

Project Overview

This project implements a complete RAG pipeline for insurance documents, specifically designed to:

Process and extract text from insurance PDFs (policies, brochures, etc.)
Create embeddings and vector indices for efficient retrieval
Find relevant document sections for user queries
Generate accurate answers using an LLM with the retrieved context
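
The retrieval and answer-generation steps above can be sketched in a few lines. This is a toy illustration only, not the project's implementation: word-count vectors and a linear scan stand in for the Sentence Transformers embeddings and the FAISS index, and the function names (embed, cosine, retrieve, build_prompt) and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (linear scan, no index)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM along with the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical sample chunks, not taken from any real policy document.
chunks = [
    "The deductible for collision coverage is 500 dollars.",
    "Claims must be filed within 30 days of the incident.",
    "Premiums are billed monthly by default.",
]
question = "What is the collision deductible?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

In the real pipeline the prompt would be passed to the LLM; here it is only built, so the sketch runs without an API key.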

Key Features

PDF processing with PyMuPDF (Fitz)
Text chunking with intelligent overlap
Embedding generation using Sentence Transformers
Efficient vector search with FAISS
OpenAI-powered answer generation with LangChain
User-friendly Streamlit interface
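
To make the chunking-with-overlap idea concrete, here is a minimal sketch assuming fixed-size, character-based chunks. The function name chunk_text and its default sizes are assumptions for illustration; the logic in text_splitter.py may split on sentences or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of chunk_size characters.

    Consecutive chunks share `overlap` characters, so a sentence cut at a
    chunk boundary still appears whole in at least one neighbouring chunk
    (as long as it is shorter than the overlap).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


example = chunk_text("abcdefghij", chunk_size=4, overlap=2)
```

A larger overlap keeps more context across chunk boundaries at the cost of more chunks to embed and store.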

Project Structure

insurance-rag-assistant/
├── data/
│   ├── raw/                 # Original PDFs
│   └── processed/           # Processed text chunks
├── src/
│   ├── data_processing/     
│   │   ├── __init__.py
│   │   ├── pdf_loader.py    # PDF extraction functionality
│   │   └── text_splitter.py # Text chunking logic
│   ├── indexing/
│   │   ├── __init__.py
│   │   ├── embeddings.py    # Embedding generation
│   │   └── vector_store.py  # Vector database operations
│   ├── retrieval/
│   │   ├── __init__.py
│   │   └── retriever.py     # Similarity search logic
│   ├── llm/
│   │   ├── __init__.py
│   │   └── llm_chain.py     # LLM handling and prompting
│   └── app/
│       ├── __init__.py
│       └── main.py          # Streamlit app
├── notebooks/
│   ├── data_exploration.ipynb
│   └── prototype.ipynb
├── tests/
│   └── test_pipeline.py
├── requirements.txt
├── README.md
└── .env                     # For API keys
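
Since .env holds the API keys, the application needs to read them at startup. The sketch below uses only the standard library and assumes the key is exposed as the environment variable OPENAI_API_KEY; the helper name get_api_key is hypothetical. Note that os.environ does not read .env on its own, so the file would have to be loaded first (for example with a loader such as python-dotenv) or the variable exported in the shell.

```python
import os


def get_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is missing."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it in your shell"
        )
    return key
```

Failing at startup with a clear message is usually easier to debug than an authentication error raised deep inside the LLM call.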

Installation

Clone this repository:

git clone https://github.com/yourusername/insurance-rag-assistant.git
cd insurance-rag-assistant

Create a virtual environment and install dependencies:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
