Skip to content

coderashhar/DocQuery-AI

Repository files navigation

📄 DocQuery AI

DocQuery AI is an intelligent, session-isolated RAG (Retrieval-Augmented Generation) chatbot application designed to transform how you interact with PDF documents. Instead of searching by keywords or scrolling through massive files, you can converse directly with your documents in real time.

Built with Streamlit, LangChain, Chroma DB, HuggingFace, and Mistral AI, it processes documents locally, indexes content into a fresh vector store per session, and enables instant semantic retrieval and answering.


📸 Interface Preview

DocQuery AI Interface


🚀 Key Features

  • 📂 Instant Document Processing: Upload any PDF through a clean sidebar layout to process and chunk it on the fly.
  • 🔄 Session-Isolated Database: Uses an in-memory instance of Chroma DB. Every session is fresh, isolated, and completely private—no documents are persisted on disk unless configured.
  • 🧠 Advanced Semantic Retrieval: Implements HuggingFace embeddings combined with Maximal Marginal Relevance (MMR) retrieval to extract the most relevant contexts while keeping content diverse.
  • 💬 Intelligent QA Chatbot: Leverages Mistral AI's mistral-small-latest language model to summarize, synthesize, and answer questions accurately.
  • 🧹 Single-Click Reset: Clean up your chat history and memory instantly using the "Clear Session" option.

🛠️ Tech Stack


💻 Setup & Installation

Prerequisites

1. Clone the Repository

git clone https://github.com/coderashhar/Document.AI.git
cd Document.AI

2. Configure Environment Variables

Create a .env file in the root directory and add your Mistral API key:

MISTRAL_API_KEY=your_mistral_api_key_here

3. Setup Virtual Environment

Create and activate your python virtual environment:

python3 -m venv .venv
source .venv/bin/activate  # On macOS/Linux
# .venv\Scripts\activate   # On Windows

4. Install Dependencies

pip install -r requirements.txt

5. Launch the Application

streamlit run app.py

📖 Usage Guide

  1. Open your browser and navigate to the local address provided by Streamlit (typically http://localhost:8502).
  2. Upload a PDF document in the left sidebar.
  3. Once the green success banner appears ("Document processed successfully!"), type your question into the chat input box at the bottom.
  4. Get concise, context-aware answers directly extracted from your document.
  5. Hit Clear Session if you want to upload a different PDF and start a new conversation.

About

DocQuery AI is a RAG pipeline using LangChain, HuggingFace embeddings, and an in-memory Chroma DB to perform local semantic search and text generation via the Mistral LLM API.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages