
mohameddmansurr/kubernetes-rag-ops


Kubernetes RAG Pipeline (MLOps)

A microservices-based Retrieval-Augmented Generation (RAG) pipeline deployed on Kubernetes. This project demonstrates how to productionize AI models using Docker, Kubernetes, and a vector database.

🏗 Architecture

  • Ingestion Service: Python FastAPI + Sentence Transformers.
  • Vector Database: Qdrant (StatefulSet) for semantic search.
  • Orchestration: Kubernetes (Minikube).
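The retrieval step implied by this architecture (embed text, then find the nearest stored vectors) can be sketched in plain Python. This is a toy stand-in, not the project's code: the hand-written 3-dimensional vectors replace real Sentence Transformers embeddings, a linear scan replaces Qdrant's index, and the names `POINTS` and `search` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def search(query_vector, points, top_k=2):
    """Linear-scan nearest-neighbour search over an in-memory dict.

    A vector database does the same job with an approximate index
    and persistent storage; this version only illustrates the idea.
    """
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in points.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


# Hypothetical stored embeddings (real ones have hundreds of dimensions).
POINTS = {
    "doc-pods": [0.9, 0.1, 0.0],
    "doc-volumes": [0.1, 0.9, 0.0],
    "doc-probes": [0.7, 0.3, 0.1],
}

if __name__ == "__main__":
    for doc_id, score in search([1.0, 0.0, 0.0], POINTS):
        print(f"{doc_id}: {score:.3f}")
```

In the deployed pipeline the ingestion service would produce the vectors and Qdrant would store and rank them; only the ranking principle carries over from this sketch.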

🚀 Key Features

  • CPU-Optimized AI: Docker builds tuned for non-GPU inference.
  • Resilience: livenessProbes so Kubernetes restarts unhealthy containers, plus resource limits to cap CPU and memory use.
  • Stateful Management: Persistent storage for vector data.
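To make the resilience settings concrete, here is a minimal sketch that builds a Deployment manifest with a liveness probe and resource limits and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The image name, container port, `/health` path, and every numeric value are assumptions for illustration; the repository's actual manifests in `k8s/` may differ.

```python
import json


def build_deployment(name, image, port, health_path="/health"):
    """Build an apps/v1 Deployment dict with a probe and resource limits.

    All concrete values below are illustrative assumptions.
    """
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # The kubelet restarts the container after repeated probe failures.
        "livenessProbe": {
            "httpGet": {"path": health_path, "port": port},
            "initialDelaySeconds": 30,  # allow time for model loading
            "periodSeconds": 10,
            "failureThreshold": 3,
        },
        # Requests guide scheduling; limits cap what the container may use.
        "resources": {
            "requests": {"cpu": "250m", "memory": "512Mi"},
            "limits": {"cpu": "1", "memory": "1Gi"},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image tag and port, not taken from the repository.
    manifest = build_deployment("rag-embedding", "rag-embedding:latest", 8000)
    print(json.dumps(manifest, indent=2))
```

A generous `initialDelaySeconds` is a common choice for services that load a model at startup, since a probe that fires too early can put the container into a restart loop.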

📦 How to Run

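  1. Start a local cluster with 4 GB of memory: minikube start --memory=4096
  2. Apply all manifests in the k8s/ directory: kubectl apply -f k8s/
  3. Forward local port 7000 to port 80 of the rag-embedding service: kubectl port-forward service/rag-embedding 7000:80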
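Once the port-forward is running, the service is reachable at `http://localhost:7000`. The sketch below shows one way a client might call it using only the standard library. The `/embed` path and the `{"text": ...}` request body are hypothetical, since this README does not document the API; check the FastAPI routes in the service (or its `/docs` page, if enabled) for the real ones.

```python
import json
import urllib.request

# Local end of `kubectl port-forward service/rag-embedding 7000:80`.
BASE_URL = "http://localhost:7000"


def build_embed_request(text, base_url=BASE_URL, path="/embed"):
    """Build a JSON POST request. The path and body shape are assumptions."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, timeout=10.0):
    """Send the request and decode the JSON response."""
    request = build_embed_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the cluster and the port-forward from the steps above.
    print(embed("What is a StatefulSet?"))
```

Keeping request construction separate from the network call makes the payload easy to inspect without a running cluster.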

About

A production-ready MLOps pipeline for Retrieval Augmented Generation (RAG). Features a FastAPI inference service and Qdrant vector database, orchestrated on Kubernetes with StatefulSets, resource scaling, and self-healing capabilities.
