Enterprise-grade Retrieval-Augmented Generation (RAG) system for intelligent defect analysis in Friction Stir Welding (FSW) processes.
- Hybrid Retrieval: BM25 + vector search with cross-encoder reranking
- Streaming Responses: Real-time LLM output with HuggingFace API
- Conversation Memory: Context-aware multi-turn conversations (last 10 turns)
- Memory Counter: Visual indicator showing remaining context window
- Sensor Data Integration: Real-time FSW sensor analysis with defect correlation
- Source Attribution: Full citation tracking with relevance scores
- Observability: Langfuse integration for tracing and monitoring
- Modular Architecture: Clean separation of concerns with 100% test coverage on core modules
- Production Ready: Error handling, validation, and graceful degradation
- Privacy-First: In-memory sessions, no persistent user data storage
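As a rough illustration of the conversation memory and memory counter features listed above, the sketch below keeps a fixed window of the last 10 turns in memory and reports how many turns remain before the oldest one is evicted. The class name `ConversationMemory` and its methods are hypothetical and are not taken from the `rag_demo` package; the actual implementation may differ.

```python
from collections import deque

MAX_TURNS = 10  # mirrors the "last 10 turns" window described above


class ConversationMemory:
    """Hypothetical in-memory store that keeps only the most recent turns."""

    def __init__(self, max_turns=MAX_TURNS):
        self.max_turns = max_turns
        # A bounded deque silently drops the oldest turn once it is full.
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, question, answer):
        self._turns.append((question, answer))

    def remaining(self):
        """Number of turns left before the oldest turn starts being evicted."""
        return self.max_turns - len(self._turns)

    def as_prompt_context(self):
        """Render the retained turns as plain text for the next prompt."""
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self._turns)


memory = ConversationMemory()
memory.add_turn("What is FSW?", "A solid-state joining process.")
print(f"Turns remaining: {memory.remaining()}")
```

Because the store lives only in process memory, nothing is persisted once the session ends, which is in keeping with the privacy-first bullet.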
# Clone repository
git clone https://github.com/SkullKrak7/RAG_Demo.git
cd RAG_Demo
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Setup git hooks
./setup-hooks.sh
# Copy environment template
cp .env.example .env
# Edit .env with your credentials
# Required: HF_TOKEN
# Optional: LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY
# Place FSW PDF documents in data/ directory
python build_vectorstore.py --pdf-dir ./data --output-dir ./vectorstore
streamlit run app.py
Access at http://localhost:8501
       ┌─────────────────┐
       │  Streamlit UI   │
       └────────┬────────┘
                │
       ┌────────▼────────┐
       │  RAG Pipeline   │
       └────────┬────────┘
                │
          ┌─────┴─────┐
          │           │
    ┌─────▼─────┐ ┌───▼────┐
    │  Hybrid   │ │Reranker│
    │ Retriever │ └───┬────┘
    └─────┬─────┘     │
          │           │
       ┌──▼───────────▼──┐
       │  LLM Generator  │
       └─────────────────┘
rag_demo/
├── core/ # Configuration, models, exceptions
├── ingestion/ # Document loading and vectorstore building
├── retrieval/ # Hybrid retrieval and reranking
├── generation/ # LLM generation and response formatting
├── pipeline/ # End-to-end RAG orchestration
└── observability/ # Langfuse tracing integration
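The README does not say how the `retrieval/` module merges BM25 and vector results. One common option is reciprocal rank fusion (RRF), sketched below under that assumption. The function name, the constant `k=60`, and the document IDs are illustrative and are not part of `rag_demo`.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of a keyword (BM25) retriever and a vector retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

In a setup like this, the fused list would then be passed to the cross-encoder reranker, which rescores only the top candidates.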
from rag_demo.core.config import RAGConfig
from rag_demo.ingestion.builder import VectorStoreBuilder
from rag_demo.retrieval.retriever import HybridRetriever
from rag_demo.pipeline.pipeline import RAGPipeline
# Initialize
config = RAGConfig()
builder = VectorStoreBuilder(config)
vectorstore = builder.load_vectorstore()
# Create retriever
# Note: `documents` is not defined in this snippet; it must hold the loaded document chunks
retriever = HybridRetriever(vectorstore, documents, config)
# Create pipeline
pipeline = RAGPipeline(retriever, config)
# Query
response = pipeline.query("What causes wormhole defects in FSW?")
print(response.answer)
for source in response.sources:
    print(f"- {source.doc_name} (Page {source.page_num})")

for chunk in pipeline.stream_query("Explain FSW process parameters"):
    print(chunk, end="", flush=True)

from rag_demo.observability.tracer import RAGTracer
config = RAGConfig(langfuse_enabled=True)
tracer = RAGTracer(config)
pipeline = RAGPipeline(retriever, config, tracer=tracer)
response = pipeline.query("What are common FSW defects?")
# Feedback
tracer.score_feedback(1.0, "user_feedback")
tracer.flush()

| Variable | Description | Default |
|---|---|---|
| HF_TOKEN | HuggingFace API token | Required |
| MODEL_NAME | LLM model identifier | meta-llama/Llama-3.1-8B-Instruct |
| TEMPERATURE | LLM temperature | 0.05 |
| RETRIEVAL_K | Documents to retrieve | 5 |
| RERANK_TOP_K | Documents after reranking | 3 |
| CHUNK_SIZE | Document chunk size | 500 |
| CHUNK_OVERLAP | Chunk overlap | 50 |
| LANGFUSE_ENABLED | Enable tracing | false |
See .env.example for complete configuration options.
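To make the table concrete, here is a minimal sketch of reading those variables from the environment with the listed defaults. `EnvSettings` and `load_settings` are hypothetical names used for illustration only; the project's own `RAGConfig` may load and validate values differently, and the two sanity checks are assumptions rather than documented behavior.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical settings holder; not the project's RAGConfig."""
    hf_token: str
    model_name: str
    temperature: float
    retrieval_k: int
    rerank_top_k: int
    chunk_size: int
    chunk_overlap: int
    langfuse_enabled: bool


def load_settings(env=None):
    """Build settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise ValueError("HF_TOKEN is required")
    settings = EnvSettings(
        hf_token=token,
        model_name=env.get("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct"),
        temperature=float(env.get("TEMPERATURE", "0.05")),
        retrieval_k=int(env.get("RETRIEVAL_K", "5")),
        rerank_top_k=int(env.get("RERANK_TOP_K", "3")),
        chunk_size=int(env.get("CHUNK_SIZE", "500")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "50")),
        langfuse_enabled=env.get("LANGFUSE_ENABLED", "false").lower() == "true",
    )
    # Illustrative sanity checks (assumed, not documented by the project).
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    if settings.rerank_top_k > settings.retrieval_k:
        raise ValueError("RERANK_TOP_K cannot exceed RETRIEVAL_K")
    return settings


# Example with an explicit mapping instead of the real environment.
example = load_settings({"HF_TOKEN": "hf_example_token", "RETRIEVAL_K": "8"})
print(example.retrieval_k, example.rerank_top_k)
```

Passing a plain dictionary, as in the example, keeps the loader easy to test without touching the real environment.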
# All tests
pytest tests/ -v --cov=rag_demo
# Specific module
pytest tests/unit/test_pipeline.py -v
# With coverage report
pytest tests/ --cov=rag_demo --cov-report=html
# Format code
black rag_demo/ tests/
# Lint
pylint rag_demo/
# Type checking
mypy rag_demo/
- Modular RAG architecture
- Hybrid retrieval (BM25 + vector)
- Cross-encoder reranking
- Streaming LLM responses
- Source attribution and citations
- Langfuse observability
- Vector store builder
- Streamlit UI
- 100% test coverage on core modules
- CI/CD pipeline (GitHub Actions)
- Integration tests
- RAGAS evaluation framework
- Performance monitoring
- API documentation
- Docker deployment
- LangChain: 1.2.8
- Streamlit: 1.53.1
- ChromaDB: 1.4.1
- Sentence Transformers: 5.2.2
- Langfuse: 3.12.1
- PyPDF: 6.6.2
- Embeddings: sentence-transformers/paraphrase-MiniLM-L3-v2 (384 dims)
- LLM: meta-llama/Llama-3.1-8B-Instruct (8B params)
- Reranker: cross-encoder/ms-marco-MiniLM-L-6-v2
- Query Latency: < 2s (with reranking)
- Embedding Speed: ~50ms per query
- Vector Store Size: ~500MB (5 documents)
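The latency figures above can be checked on your own hardware with a small timing helper. The sketch below is one way to do that; `measure_latency` is a hypothetical helper and `fake_query` merely stands in for a real call such as `pipeline.query`.

```python
import time
from statistics import median


def measure_latency(fn, queries):
    """Call fn on each query and report wall-clock timings in seconds."""
    timings = []
    for query in queries:
        start = time.perf_counter()
        fn(query)
        timings.append(time.perf_counter() - start)
    return {"median_s": median(timings), "max_s": max(timings)}


def fake_query(question):
    """Stand-in for a real pipeline call; sleeps briefly to simulate work."""
    time.sleep(0.01)
    return f"answer to: {question}"


stats = measure_latency(fake_query, ["What causes wormhole defects in FSW?"] * 3)
print(f"median={stats['median_s']:.3f}s max={stats['max_s']:.3f}s")
```

To time the real system, pass `pipeline.query` in place of `fake_query` and use a representative set of questions.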
MIT License - Copyright (c) 2026 Sai Karthik Kagolanu
- Author: Sai Karthik Kagolanu
- GitHub: @SkullKrak7
- Project: RAG_Demo