---
title: RAG Knowledge Base
emoji: 💬
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
tags:
---
Pre-indexed AI knowledge base for customer support — ask questions about your product without uploading files. Drop your PDFs in `/docs`, index once, answer forever. ⚡

## ✨ Features
- 📁 Pre-indexed knowledge base — no file upload needed from the user
- 🔍 Semantic search over your documents via ChromaDB
- ⚡ Ultra-fast responses powered by Groq (llama-3.3-70b)
- 💬 Chat interface with conversation history and source display
- 🛠️ Admin ingestion script — run once to index all your PDFs
- 🐳 Docker ready for easy deployment
## 🏗️ How it works

```
Admin (you)                        End User
     │                                 │
     ▼                                 ▼
Drop PDFs in /docs               Asks questions
     │                                 │
     ▼                                 ▼
python src/ingest.py  →  streamlit run src/app.py
     │                                 │
     ▼                                 ▼
ChromaDB (indexed)  ←  RAG Pipeline (Groq + LangChain)
```
## 🧰 Tech Stack

| Component | Tool | Role |
|---|---|---|
| LLM | Groq (llama-3.3-70b-versatile) | Answer generation |
| Embeddings | HuggingFace (all-MiniLM-L6-v2) | Text vectorization |
| Vector Store | ChromaDB | Semantic search |
| Orchestration | LangChain | RAG pipeline |
| UI | Streamlit | Chat interface |
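The retrieval step in the stack above — semantic search over chunk embeddings — can be sketched in plain Python: embed the query, rank stored chunk vectors by cosine similarity, and keep the top-k. This is a toy sketch with hand-made 3-d vectors; in the real app ChromaDB stores the vectors and all-MiniLM-L6-v2 produces them:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, k=8):
    """Return the k chunk texts whose embeddings are closest to the query.

    `chunks` is a list of (text, embedding) pairs — illustrative only;
    the app delegates this ranking to ChromaDB.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("Refunds are processed within 5 days.", [0.9, 0.1, 0.0]),
    ("The warranty lasts two years.",        [0.1, 0.9, 0.0]),
    ("Shipping is free over $50.",           [0.0, 0.2, 0.9]),
]
print(top_k([0.8, 0.2, 0.1], chunks, k=1))
# → ['Refunds are processed within 5 days.']
```

The retrieved chunks are then passed to the LLM as context for answer generation.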
## ✅ Prerequisites

- Python 3.11+
- A Groq API key (free)
- Git
## 📦 Installation

```bash
git clone https://github.com/salmazenn/rag-knowledge-base.git
cd rag-knowledge-base
python3 -m venv .venv-kb
source .venv-kb/bin/activate   # Mac/Linux
# .venv-kb\Scripts\activate    # Windows
pip install -r requirements.txt
```

Or install manually:

```bash
pip install langchain-community \
    langchain-groq \
    langchain-huggingface \
    langchain-chroma \
    langchain-text-splitters \
    chromadb \
    pypdf \
    streamlit \
    sentence-transformers \
    python-dotenv \
    numpy
```

## 🔑 Configuration

Create a `.env` file at the root of the project:

```bash
cp .env.example .env
```

Then edit `.env` with your values:
```bash
# Required
GROQ_API_KEY=your_groq_api_key_here

# Optional — defaults shown
GROQ_MODEL=llama-3.3-70b-versatile
CHROMA_PERSIST_DIR=./data/chroma
DOCS_DIR=./docs
CHUNK_SIZE=1000
CHUNK_OVERLAP=100
TOP_K=8
```

## 📥 Indexing your documents

Drop your PDF files into the `/docs` folder:

```bash
cp your-faq.pdf docs/
cp your-product-guide.pdf docs/
# Add as many PDFs as needed
```

Then run the ingestion script:

```bash
python src/ingest.py
```

You should see:
```
🚀 Starting ingestion...
📁 Source folder: ./docs
💾 Vector store: ./data/chroma
📄 Loading: your-faq.pdf
✅ 12 pages loaded
✂️ 163 chunks created
🔍 Creating embeddings (all-MiniLM-L6-v2)...
💾 163 chunks saved in: ./data/chroma
🎉 Ingestion complete! 163 chunks indexed and ready.
👉 Now run: streamlit run src/app.py
```
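The page → chunk counts in the output above come from splitting each page's text into overlapping windows. A simplified sketch of that splitting logic — the real script uses a LangChain text splitter, which additionally respects separators like paragraph breaks; the defaults here mirror `CHUNK_SIZE` and `CHUNK_OVERLAP`:

```python
def split_text(text, chunk_size=1000, overlap=100):
    """Split `text` into windows of `chunk_size` characters, with
    `overlap` characters shared between consecutive chunks.
    Simplified stand-in for LangChain's recursive splitter."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

page = "x" * 2500                 # a 2500-character page of text
print(len(split_text(page)))      # → 3 (chars 0-1000, 900-1900, 1800-2500)
```

The overlap keeps a sentence that straddles a chunk boundary fully present in at least one chunk, which improves retrieval quality at the cost of a slightly larger index.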
## 💬 Running the app

```bash
streamlit run src/app.py
```

Open http://localhost:8501 in your browser. 🎉
## ⚙️ Environment variables

| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | — | **Required.** Your Groq API key |
| `GROQ_MODEL` | `llama-3.3-70b-versatile` | Groq model to use |
| `CHROMA_PERSIST_DIR` | `./data/chroma` | ChromaDB storage path |
| `DOCS_DIR` | `./docs` | Folder containing your PDFs |
| `CHUNK_SIZE` | `1000` | Size of each text chunk (tokens) |
| `CHUNK_OVERLAP` | `100` | Overlap between consecutive chunks |
| `TOP_K` | `8` | Number of chunks retrieved per query |
### Tuning tips

- Increase `CHUNK_SIZE` (e.g. 1500) for documents with long paragraphs
- Increase `TOP_K` (e.g. 10–12) for broad questions requiring more context
- Decrease `TOP_K` (e.g. 4) for precise factual questions — faster and more accurate
- Re-run `ingest.py` every time you add or modify documents in `/docs`
## 📂 Project structure

```
rag-knowledge-base/
├── src/
│   ├── ingest.py        # Admin script — index PDFs into ChromaDB
│   ├── rag.py           # RAG engine — retrieval + generation
│   └── app.py           # Streamlit chat interface
├── docs/                # Drop your PDFs here
├── data/
│   └── chroma/          # ChromaDB vector store (auto-generated)
├── .env                 # Your API keys (never commit this!)
├── .env.example         # Template for environment variables
├── .gitignore
├── requirements.txt
├── Dockerfile
└── README.md
```
## 🔄 Updating your documents

When you add new documents or update existing ones:

```bash
# 1. Add/replace PDFs in /docs
cp new-document.pdf docs/

# 2. Re-run ingestion (old index is automatically replaced)
python src/ingest.py

# 3. Restart the app
streamlit run src/app.py
```

## ☁️ Deploying to Hugging Face Spaces

Create a Space:

- Go to huggingface.co/new-space
- SDK: Docker → Blank
- Visibility: Public
In your Space settings → Variables and secrets, add:

- `GROQ_API_KEY` → your Groq key

Then push the code:

```bash
git remote add hf https://huggingface.co/spaces/salmazen/rag-knowledge-base
git push hf main
```

⚠️ **Important:** the ChromaDB index must be pre-built and committed to the repo before deploying. Run `python src/ingest.py` locally and commit the `data/chroma/` folder.
## 🗺️ Roadmap

- Multi-language support
- Admin dashboard to manage documents
- Automatic RAGAS evaluation
- Agents layer on top of the knowledge base
- Authentication for the chat interface
## 📄 License

MIT — free to use, modify and distribute.