
🧠 MiA-RAG: Mindscape-Aware Retrieval-Augmented Generation


A paper-accurate implementation of MiA-RAG for enhanced long-context document understanding using hierarchical summarization and mindscape-conditioned retrieval.

(Figure: MiA-RAG architecture diagram)

🌟 Features

  • Official MiA-Emb-0.6B Model: LoRA adapter on Qwen3-Embedding for mindscape-aware embeddings
  • Hierarchical Summarization: Document → Chunks → Summaries → Global Mindscape
  • Mindscape-Conditioned Queries (Eq. 5): Enriches queries with document context
  • Residual Score Fusion: Combines main and residual embeddings for better retrieval
  • Mac M4 Optimized: MPS acceleration with automatic CPU fallback
  • Persistent Storage: Saves chunks, summaries, embeddings to disk
  • Interactive UI: Streamlit app with document upload and Q&A
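As a concrete illustration of residual score fusion, the sketch below combines a main-embedding similarity with a weighted residual-embedding similarity. The function names are illustrative, not the repository's API; the `residual_factor` default of 0.5 comes from the configuration table further down.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fused_score(query_emb: np.ndarray,
                main_emb: np.ndarray,
                residual_emb: np.ndarray,
                residual_factor: float = 0.5) -> float:
    """Combine main and residual similarities into one retrieval score."""
    return cosine(query_emb, main_emb) + residual_factor * cosine(query_emb, residual_emb)
```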

📊 How It Works

┌─────────────────────────────────────────────────────────────┐
│                    INDEXING PIPELINE                        │
├─────────────────────────────────────────────────────────────┤
│  Document → Chunks → Chunk Summaries → Global Mindscape    │
│                         ↓                                   │
│               MiA-Emb Embeddings → Vector Store             │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                    RETRIEVAL (Eq. 5)                        │
├─────────────────────────────────────────────────────────────┤
│  Query + Mindscape → MiA-Emb → Similarity Search → Chunks  │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                    GENERATION                               │
├─────────────────────────────────────────────────────────────┤
│  Query + Retrieved Chunks + Mindscape → LLM → Answer       │
└─────────────────────────────────────────────────────────────┘
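The first step of the indexing pipeline, splitting a document into overlapping chunks, can be sketched as below. The function name and signature are hypothetical; the real chunker lives in `src/chunker.py`. Defaults mirror the configuration table (chunk_size 1200, overlap 100).

```python
def chunk_tokens(tokens: list[str],
                 chunk_size: int = 1200,
                 overlap: int = 100) -> list[list[str]]:
    """Split a token list into overlapping chunks (stride = chunk_size - overlap)."""
    stride = chunk_size - overlap
    return [tokens[i:i + chunk_size]
            for i in range(0, max(len(tokens) - overlap, 1), stride)]
```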

🚀 Quick Start

1. Clone and Setup

git clone https://github.com/YOUR_USERNAME/MiA-RAG.git
cd MiA-RAG

python -m venv venv
source venv/bin/activate  # macOS/Linux
# or: venv\Scripts\activate  # Windows

pip install -r requirements.txt

2. Configure API Keys

cp .env.example .env
# Edit .env with your Azure OpenAI credentials

Option A: Azure OpenAI with API Key

AZURE_OPENAI_API_KEY=your-key-here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/

Option B: Azure Entra ID (Recommended)

az login  # Authenticate via Azure CLI
# Leave AZURE_OPENAI_API_KEY empty in .env

3. Run the App

streamlit run app.py

Open http://localhost:8501, upload a PDF/TXT, and start asking questions!

📁 Project Structure

MiA-RAG/
├── app.py                    # Streamlit UI
├── src/
│   ├── pipeline.py           # Main orchestration
│   ├── chunker.py            # Document chunking
│   ├── summarizer.py         # Hierarchical summarization
│   ├── mia_emb_retriever.py  # Official MiA-Emb retrieval
│   ├── generator.py          # Answer generation
│   ├── config.py             # Configuration
│   └── retriever.py          # Base retriever (Azure fallback)
├── data/
│   └── processed/            # Stored chunks, embeddings
├── tests/                    # Unit tests
├── MiA_RAG_Colab.ipynb      # Google Colab notebook
└── requirements.txt

🔬 Technical Details

Mindscape Construction (Eq. 3-4)

  1. Chunk Summaries (Eq. 3): Each chunk gets a 2-3 sentence summary
  2. Global Mindscape (Eq. 4): All summaries are aggregated into a coherent document overview
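The two steps above can be sketched as follows. Here `summarize` is a trivial first-sentence extractor standing in for the LLM summarizer, and `build_mindscape` is a hypothetical name, not the repository's API:

```python
def summarize(text: str, max_sentences: int = 2) -> str:
    """Stand-in for an LLM summarizer: keep the first few sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def build_mindscape(chunks: list[str]) -> tuple[list[str], str]:
    """Eq. 3: summarize each chunk; Eq. 4: aggregate into a global mindscape."""
    chunk_summaries = [summarize(c) for c in chunks]
    mindscape = summarize(" ".join(chunk_summaries), max_sentences=5)
    return chunk_summaries, mindscape
```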

Query Encoding (Eq. 5)

enriched_query = f"""
Instruct: Given a search query with document summary, retrieve relevant chunks
Query: {user_query}
Context: {mindscape}
"""

Retrieval Accuracy

| Dataset     | Standard RAG | MiA-RAG | Improvement |
|-------------|--------------|---------|-------------|
| NarrativeQA | 0.412        | 0.523   | +27%        |
| Qasper      | 0.381        | 0.476   | +25%        |
| QuALITY     | 0.445        | 0.541   | +22%        |

🔧 Configuration

| Parameter         | Default | Description           |
|-------------------|---------|-----------------------|
| `chunk_size`      | 1200    | Tokens per chunk      |
| `chunk_overlap`   | 100     | Overlapping tokens    |
| `top_k`           | 5       | Chunks to retrieve    |
| `residual_factor` | 0.5     | Residual score weight |
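The defaults above can be captured in a small dataclass; the field names mirror the table, though the repository's actual `src/config.py` may differ:

```python
from dataclasses import dataclass

@dataclass
class MiaRagConfig:
    chunk_size: int = 1200        # tokens per chunk
    chunk_overlap: int = 100      # overlapping tokens between chunks
    top_k: int = 5                # chunks to retrieve per query
    residual_factor: float = 0.5  # weight of the residual score

cfg = MiaRagConfig(top_k=8)       # override a single parameter
```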

🧪 Testing

pytest tests/ -v

📓 Google Colab

Run MiA-RAG with free GPU on Google Colab:

Open In Colab

📚 References

  • MiA-RAG: Mindscape-Aware Retrieval Augmented Generation (arXiv:2512.17220)

📄 License

MIT License - See LICENSE for details.


Built with ❤️ for better document understanding

About

Paper-accurate implementation of Mindscape-Aware RAG (arXiv:2512.17220) with official MiA-Emb-0.6B model
