🧠 ResearchMind

A Multi-Source AI Research Assistant — Drop in PDFs, web URLs, or plain text, then ask questions and get cited answers.

Like having a research assistant who has read all your documents.

🏗️ Architecture

graph LR
    A[PDF / URL / Text] --> B[data_loader.py]
    B --> C[chunking.py]
    C --> D[embedding_pipeline.py]
    D --> E[(ChromaDB)]
    F[User Query] --> G[search.py]
    G --> E
    G --> H[generator.py]
    H --> I[Cited Answer]
    J[app.py — Streamlit] --> B
    J --> F

✨ Features

Multi-format ingestion — PDFs, web URLs, and plain text in one interface
Inline citations — Every answer references the exact source chunks used
Persistent vector store — Documents survive app restarts (ChromaDB)
Streaming responses — Real-time token generation via Groq API
MMR search — Maximal Marginal Relevance for diverse, non-redundant results
Document summarization — One-click summary across all loaded sources
Chat history — Conversational context maintained across questions
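
To make the MMR feature concrete, here is a minimal, dependency-free sketch of Maximal Marginal Relevance selection. It is an illustration only, not the code in search.py: the names `cosine` and `mmr_select`, the `lambda_mult` trade-off parameter, and the toy 2-D vectors are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by MMR (hypothetical helper).

    Each step picks the candidate that maximizes
    lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already selected).
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy embeddings: documents 0 and 1 are near-duplicates, document 2 differs.
docs = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]
query = [1.0, 0.1]
picked = mmr_select(query, docs, k=2, lambda_mult=0.5)
```

With `lambda_mult=0.5` the second pick skips the near-duplicate in favor of the less similar document. Setting `lambda_mult=1.0` reduces the selection to plain similarity ranking.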

🛠️ Tech Stack

| Component | Technology |
| --- | --- |
| Document Loading | PyMuPDF, BeautifulSoup |
| Chunking | LangChain Text Splitters |
| Embeddings | Sentence Transformers (`all-MiniLM-L6-v2`) |
| Vector Store | ChromaDB (persistent) |
| LLM | Groq API (Llama 3.3 70B) |
| UI | Streamlit |
| Containerization | Docker |

🚀 Quick Start

Local Development

# 1. Clone
git clone https://github.com/yourusername/ResearchMind.git
cd ResearchMind

# 2. Create virtual environment
python -m venv .venv
.venv\Scripts\activate     # Windows
# source .venv/bin/activate  # macOS/Linux

# 3. Install dependencies
pip install -r requirements.txt

# 4. Set API key
cp .env.example .env
# Edit .env and add your Groq API key (free at https://console.groq.com)

# 5. Run
streamlit run app.py

Open http://localhost:8501 in your browser.

Docker

docker-compose up --build

📁 Project Structure

ResearchMind/
├── app.py                    # Streamlit UI
├── config.py                 # Centralized configuration
├── src/
│   ├── data_loader.py        # PDF, URL, text ingestion
│   ├── chunking.py           # Fixed-size & semantic splitting
│   ├── embedding_pipeline.py # Sentence Transformers + ChromaDB
│   ├── search.py             # Similarity + MMR retrieval
│   └── generator.py          # Groq LLM streaming + citations
├── .streamlit/
│   ├── config.toml           # Streamlit server config
│   └── theme.toml            # Dark theme settings
├── Dockerfile                # Container image
├── docker-compose.yml        # Local Docker setup
├── render.yaml               # Render deployment blueprint
├── requirements.txt          # Python dependencies
└── .env.example              # API key template

📖 How It Works

1. Ingest — Upload PDFs, paste URLs, or type text. Each source is parsed along with its metadata (page numbers, titles, URLs).
2. Chunk — Documents are split into overlapping chunks (1000 characters each, with a 200-character overlap) so that each chunk fits within the embedding model's input limit.
3. Embed — Chunks are encoded with `all-MiniLM-L6-v2` and stored in ChromaDB.
4. Retrieve — Each user query is embedded and matched against the stored chunks using cosine similarity or MMR.
5. Generate — The top-K chunks are injected into a prompt sent to Groq's Llama 3.3 70B, which generates a cited answer.
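
The Chunk and Generate steps above can be sketched in a few lines. This is a simplified stand-in under stated assumptions, not the project's implementation: the project uses LangChain text splitters, whereas the sketch below uses a plain fixed-size character splitter, and the helper names `chunk_text` and `build_prompt` and the prompt wording are hypothetical.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts this many characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks


def build_prompt(question, retrieved):
    """Number each retrieved (source, chunk) pair so the model can cite it as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] ({source}) {chunk}"
        for i, (source, chunk) in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# A 2500-character document yields three chunks with the default settings.
text = "".join(chr(97 + i % 26) for i in range(2500))
chunks = chunk_text(text)
prompt = build_prompt(
    "What is covered?",
    [("paper.pdf p.1", chunks[0][:40]), ("https://example.com", chunks[1][:40])],
)
```

The 200-character overlap means the tail of one chunk is repeated at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.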

📄 License

MIT
