A microservices-based Retrieval-Augmented Generation (RAG) pipeline deployed on Kubernetes. This project demonstrates how to productionize AI models using Docker, Kubernetes, and vector databases.
- Ingestion Service: Python FastAPI + Sentence Transformers.
- Vector Database: Qdrant (StatefulSet) for semantic search.
- Orchestration: Kubernetes (Minikube).
- CPU-Optimized AI: Docker builds tuned for non-GPU inference.
- Resilience: Liveness probes (`livenessProbe`) and resource limits on the deployed services.
- Stateful Management: Persistent storage for vector data.
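
To make the "semantic search" role of the vector database concrete, here is a minimal pure-Python sketch of similarity search by brute force. It is an illustration only, not how Qdrant works internally (Qdrant uses approximate nearest-neighbor indexing), and the choice of cosine similarity, the toy 3-dimensional vectors, and the names `cosine_similarity`, `search`, and `stored` are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def search(query_vector, stored, top_k=2):
    """Rank stored vectors by similarity to the query and keep the best top_k."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical); real sentence embeddings have hundreds of dimensions.
stored = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}

results = search([1.0, 0.1, 0.0], stored)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In the deployed pipeline, the embedding service would produce the vectors and the vector database would perform the ranking step at scale.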
minikube start --memory=4096
kubectl apply -f k8s/
kubectl port-forward service/rag-embedding 7000:80
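
Once the port-forward is running, the `rag-embedding` service should be reachable on local port 7000. The sketch below shows one way to call it from Python using only the standard library. The `/embed` path and the `{"text": ...}` request body are hypothetical, since this README does not document the service's routes; adjust them to match the actual FastAPI endpoints.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7000"  # local port from the port-forward command
EMBED_PATH = "/embed"  # hypothetical route; check the service's actual API


def build_embed_request(text, base_url=BASE_URL, path=EMBED_PATH):
    """Build a JSON POST request carrying the text to embed (assumed payload shape)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, timeout=10.0):
    """Send the request and return the decoded JSON response."""
    request = build_embed_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the cluster up and the port-forward active, calling `embed("What is retrieval-augmented generation?")` would return whatever JSON the service responds with. If the service exposes FastAPI's default interactive docs, the real routes can be checked at `http://localhost:7000/docs`.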