
mitangshu/Insurance-RAG-Pipeline


Insurance Document RAG Pipeline

A Retrieval-Augmented Generation (RAG) system for insurance documents. Users ask questions about insurance policies and get answers grounded in the content of the processed documents.

Project Overview

This project implements a complete RAG pipeline for insurance documents, specifically designed to:

  1. Process and extract text from insurance PDFs (policies, brochures, etc.)
  2. Create embeddings and vector indices for efficient retrieval
  3. Find relevant document sections for user queries
  4. Generate accurate answers using an LLM with the retrieved context
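
The retrieval and prompting steps (3 and 4) can be illustrated with a minimal, standard-library-only sketch. It is not the code in this repository: the word-count `embed` function is a toy stand-in for Sentence Transformers, the sorted scan in `retrieve` stands in for a FAISS index, and `build_prompt` only assembles the text that would be passed to the LLM. All function names and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble the prompt that would be sent to the LLM (no API call here)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical chunks, as if produced by the PDF processing step.
chunks = [
    "The policy covers hospitalization expenses up to the sum insured.",
    "Premiums are payable annually or monthly.",
    "Claims must be filed within 30 days of discharge from hospital.",
]
top = retrieve("When must claims be filed?", chunks, k=1)
prompt = build_prompt("When must claims be filed?", top)
```

Here `top` holds the claims-filing chunk, because it is the only one sharing words with the query. A dense embedding model would also match paraphrases that share no words, which is the main reason to use one in practice.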

Key Features

  • PDF processing with PyMuPDF (Fitz)
  • Text chunking with intelligent overlap
  • Embedding generation using Sentence Transformers
  • Efficient vector search with FAISS
  • OpenAI-powered answer generation with LangChain
  • User-friendly Streamlit interface
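
As an illustration of chunking with overlap, here is a minimal sketch of a fixed-size, character-based splitter. It assumes nothing about what `text_splitter.py` actually does: the function name `split_text` and the default sizes are hypothetical, and a production splitter would likely prefer sentence or paragraph boundaries.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so a sentence cut at a
    chunk boundary still appears whole in at least one neighbouring chunk
    (as long as it is shorter than the overlap).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one. Larger overlaps improve the odds that a relevant passage is retrieved intact, at the cost of more chunks to embed and index.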

Project Structure

insurance-rag-assistant/
├── data/
│   ├── raw/                 # Original PDFs
│   └── processed/           # Processed text chunks
├── src/
│   ├── data_processing/     
│   │   ├── __init__.py
│   │   ├── pdf_loader.py    # PDF extraction functionality
│   │   └── text_splitter.py # Text chunking logic
│   ├── indexing/
│   │   ├── __init__.py
│   │   ├── embeddings.py    # Embedding generation
│   │   └── vector_store.py  # Vector database operations
│   ├── retrieval/
│   │   ├── __init__.py
│   │   └── retriever.py     # Similarity search logic
│   ├── llm/
│   │   ├── __init__.py
│   │   └── llm_chain.py     # LLM handling and prompting
│   └── app/
│       ├── __init__.py
│       └── main.py          # Streamlit app
├── notebooks/
│   ├── data_exploration.ipynb
│   └── prototype.ipynb
├── tests/
│   └── test_pipeline.py
├── requirements.txt
├── README.md
└── .env                     # For API keys
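
The `.env` file above holds API keys. The following is a minimal sketch of reading it with the standard library only. The variable name `OPENAI_API_KEY` and the helpers `load_env_file` and `get_api_key` are assumptions for illustration; the repository may instead rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a file, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(name: str = "OPENAI_API_KEY", env_path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get(name) or load_env_file(env_path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to {env_path}")
    return key
```

Keeping the key out of the source tree this way means `.env` should be excluded from version control.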

Installation

  1. Clone this repository:
     git clone https://github.com/yourusername/insurance-rag-assistant.git
     cd insurance-rag-assistant
  2. Create a virtual environment and install dependencies:
     python -m venv venv
     source venv/bin/activate  # On Windows: venv\Scripts\activate
     pip install -r requirements.txt
