A full-stack Retrieval-Augmented Generation (RAG) application that lets users upload documents and ask questions about them using AI. The system combines semantic search with a large language model to provide accurate, context-aware responses streamed in real time.
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Frontend   │────▶│  Backend API │────▶│    Redis     │
│  (Next.js)   │     │  (Laravel)   │     │  (Cache +    │
│  Port 3000   │     │  Port 8000   │     │   History)   │
└──────┬───────┘     └──────────────┘     └──────────────┘
       │
       │ Streaming Chat
       ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  AI Backend  │────▶│   Pinecone   │     │    Ollama    │
│  (FastAPI)   │     │ (Vector DB)  │     │ (Embeddings) │
│  Port 8081   │     └──────────────┘     │  Port 11434  │
└──────┬───────┘                          └──────────────┘
       │
       ▼
┌──────────────┐
│  Google AI   │
│   (Gemini)   │
└──────────────┘
```
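The end-to-end flow in the diagram can be sketched in a few lines of Python. This is not code from the repo: `embed`, `vector_search`, and `generate` are toy stand-ins for the real Ollama, Pinecone, and Gemini clients, shown only to make the data flow concrete.

```python
def embed(text: str) -> list[float]:
    # Toy stand-in for Ollama's mxbai-embed-large (real vectors are 1024-dim).
    return [float(ord(c) % 7) for c in text[:8]]

def vector_search(query_vec, store, k=4):
    # Stand-in for a Pinecone similarity query: rank chunks by dot product.
    def score(vec):
        return sum(a * b for a, b in zip(query_vec, vec))
    return sorted(store, key=lambda doc: score(doc["vec"]), reverse=True)[:k]

def generate(prompt: str) -> str:
    # Stand-in for the Gemini call; the real one streams tokens.
    return f"Answer based on {prompt.count('[chunk]')} retrieved chunks."

def answer(question: str, store) -> str:
    q_vec = embed(question)                # 1. embed the question
    chunks = vector_search(q_vec, store)   # 2. retrieve similar chunks
    context = "\n".join("[chunk] " + c["text"] for c in chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)                # 3. generate a grounded answer

store = [{"text": t, "vec": embed(t)} for t in ("alpha", "beta", "gamma")]
print(answer("what is alpha?", store))  # → Answer based on 3 retrieved chunks.
```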
- Streaming responses — real-time token-by-token output like ChatGPT
- Markdown rendering — bold, tables, code blocks, lists, headings rendered in chat
- Semantic cache — identical/similar questions are answered instantly from Redis cache
- Conversation history — stored per session in Redis with 24h TTL
- Re-ranking — retrieves 10 documents, re-ranks to top 4 using FlashRank for accuracy
- Session management — create, rename, and delete chat conversations
- Upload documents — supports PDF, DOCX, TXT, and Markdown files
- Automatic chunking — documents are split with `RecursiveCharacterTextSplitter`
- Vector embeddings — chunks are embedded using Ollama (`mxbai-embed-large`) and stored in Pinecone
- Delete documents — remove documents and their vectors from Pinecone
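To make the chunking step above concrete, here is an illustrative reimplementation of the idea behind LangChain's `RecursiveCharacterTextSplitter`: split on the coarsest separator first, recurse to finer separators only for oversized pieces, then greedily merge adjacent pieces back up to the chunk size. (The real splitter also supports chunk overlap and custom length functions; this is a sketch, not its actual source.)

```python
def recursive_split(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    sep = separators[0]
    if sep == "":
        # No separator left: hard cut at the chunk boundary.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    pieces = []
    for part in text.split(sep):
        if len(part) > chunk_size:
            pieces.extend(recursive_split(part, chunk_size, separators[1:]))
        elif part:
            pieces.append(part)
    # Greedily merge adjacent pieces back together up to chunk_size.
    merged, current = [], ""
    for piece in pieces:
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            merged.append(current)
            current = piece
    if current:
        merged.append(current)
    return merged

doc = "First paragraph.\n\nSecond paragraph, a bit longer than the first one."
print(recursive_split(doc, chunk_size=40))
```

Paragraph boundaries are preferred; only the over-long second paragraph gets split at word boundaries.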
- Total tokens used, total chats, active users
- Cache hit rate monitoring
- Hourly token usage trend (last 24 hours)
- Top FAQ topics
- Recent chat logs
- Email/password login via Laravel Sanctum
- Google OAuth login via Laravel Socialite
- Protected routes with middleware
- Dark/light mode toggle
- Responsive sidebar with session management
- Smooth streaming with requestAnimationFrame-based rendering (no jitter)
- Loading states and micro-animations
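The re-ranking feature listed above (retrieve 10 candidates, keep the best 4) can be sketched without the real FlashRank model. `cross_encoder_score` below is a toy word-overlap scorer standing in for FlashRank's cross-encoder; the shape of the retrieve-then-re-rank step is the point, not the scoring itself.

```python
def cross_encoder_score(query: str, passage: str) -> float:
    # Toy relevance score: fraction of query words present in the passage.
    # A real cross-encoder (FlashRank) scores the (query, passage) pair jointly.
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / max(len(q_words), 1)

def rerank(query, candidates, top_k=4):
    # Score every retrieved candidate, then keep only the top_k best.
    scored = sorted(candidates,
                    key=lambda p: cross_encoder_score(query, p),
                    reverse=True)
    return scored[:top_k]

# 10 candidates from the vector store, re-ranked down to 4:
candidates = [f"passage about topic {i}" for i in range(9)] + [
    "redis stores chat history with a ttl"
]
top = rerank("how is chat history stored", candidates)
print(top[0])  # → redis stores chat history with a ttl
```

The extra scoring pass is cheap relative to the LLM call and markedly improves which chunks end up in the prompt.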
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS v4, Zustand, Framer Motion |
| Backend API | Laravel 12, PHP 8.2, Sanctum, Socialite |
| AI Backend | FastAPI, LangChain, Python 3.10 |
| LLM | Google Gemini (via langchain-google-genai) |
| Embeddings | Ollama (mxbai-embed-large) |
| Vector DB | Pinecone |
| Cache & History | Redis (Redis Stack with vector search) |
| Re-ranking | FlashRank |
| Containerization | Docker & Docker Compose |
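The "Cache & History" row relies on Redis Stack's vector search for the semantic cache: before calling the LLM, the question is embedded and compared against previously answered questions. A self-contained sketch of the idea, where a plain list stands in for Redis Stack's vector index and `embed` is a toy hash rather than a real embedding model:

```python
import math

def embed(text: str) -> list[float]:
    # Toy embedding so the example runs offline; the real system uses
    # mxbai-embed-large via Ollama.
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch)
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

cache = []  # stand-in for Redis Stack's vector index

def cached_answer(question, threshold=0.999):
    # Return a stored answer if a previous question is similar enough.
    q = embed(question)
    for vec, answer in cache:
        if cosine(q, vec) >= threshold:
            return answer          # cache hit: the LLM is never called
    return None                    # cache miss: fall through to the RAG chain

def store_answer(question, answer):
    cache.append((embed(question), answer))

store_answer("what is RAG?", "Retrieval-Augmented Generation.")
print(cached_answer("what is rag?"))  # same question modulo case → hit
```

With real embeddings the threshold is tuned so paraphrases also hit, not just near-identical strings.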
Before you begin, make sure you have the following installed:
- Node.js ≥ 20.x
- PHP ≥ 8.2 + Composer
- Python ≥ 3.10
- Docker & Docker Compose
- Ollama running locally with the `mxbai-embed-large` model pulled
- Pinecone account
- Google AI API Key (for Gemini)
```bash
git clone https://github.com/SatyaFebi/NEW_RAG.git
cd NEW_RAG
```

```bash
# Install Ollama: https://ollama.com/download
ollama pull mxbai-embed-large
ollama serve  # Runs on port 11434
```

```bash
# Copy environment file
cp backend-fastapi/.env.example backend-fastapi/.env
```

Edit `backend-fastapi/.env` with your credentials:
```env
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX_NAME=your_index_name
EMBEDDING_MODEL_NAME=mxbai-embed-large
GOOGLE_API_KEY=your_google_ai_api_key
GEMINI_MODEL=gemini-2.5-flash-lite
REDIS_URL=redis://redis:6379

# Optional: LangSmith tracing
LANGSMITH_TRACING=false
LANGSMITH_API_KEY=
LANGSMITH_PROJECT=
```

```bash
docker compose up -d --build
```

This starts:

- FastAPI AI backend on `http://localhost:8081`
- Redis Stack on port `6379`
```bash
cd backend
cp .env.example .env
composer install
php artisan key:generate
php artisan migrate
php artisan serve  # Runs on port 8000
```

```bash
cd frontend
npm install
```

Create/edit `frontend/.env`:

```env
NEXT_PUBLIC_API_URL=http://localhost:8000/api
NEXT_PUBLIC_AI_API_URL=http://localhost:8081
```

```bash
npm run dev  # Runs on port 3000
```

Navigate to `http://localhost:3000` — log in and start chatting!
```bash
cd frontend
npm run build
npm start  # Serves production build on port 3000
```

Uncomment the `laravel_app` and `nextjs_web` services in `docker-compose.yml`, then:

```bash
docker compose up -d --build
```

```
NEW_RAG/
├── backend/                  # Laravel API (Auth, User Management)
│   ├── app/
│   ├── routes/api.php        # API routes (login, user, OAuth)
│   ├── .env.example
│   └── ...
│
├── backend-fastapi/          # FastAPI AI Service
│   ├── main.py               # All AI logic (chat, upload, dashboard)
│   ├── requirements.txt      # Python dependencies
│   ├── Dockerfile
│   └── .env.example
│
├── frontend/                 # Next.js Frontend
│   ├── src/
│   │   ├── app/
│   │   │   ├── chat/         # Chat page with streaming
│   │   │   ├── dashboard/    # Analytics dashboard
│   │   │   ├── documents/    # Document management
│   │   │   ├── login/        # Login page
│   │   │   └── globals.css   # Design system + markdown styles
│   │   ├── components/       # Sidebar, ThemeProvider
│   │   ├── store/            # Zustand stores (chat, auth, theme)
│   │   └── lib/              # API utilities
│   └── .env
│
├── docker-compose.yml        # Orchestrates FastAPI + Redis
└── README.md
```
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/` | Health check |
| `POST` | `/chat` | Send message, receive streaming AI response |
| `POST` | `/documents/upload` | Upload document (PDF, DOCX, TXT, MD) |
| `GET` | `/documents` | List documents (limited) |
| `DELETE` | `/documents/{doc_id}` | Delete document by ID |
| `GET` | `/dashboard/stats` | Get analytics data |
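A sketch of how a client consumes the streaming `/chat` response. `fake_stream` stands in for the chunked HTTP body so the example runs offline; with `requests`, the real loop would iterate over `resp.iter_content(...)` on a `stream=True` POST. The exact request/response payload shape is an assumption, not taken from the repo.

```python
def fake_stream():
    # Stand-in for the chunked body returned by POST /chat.
    yield from ["Retrieval", "-Augmented ", "Generation ", "streams ", "tokens."]

def consume(stream, on_token=lambda t: None):
    # Accumulate tokens as they arrive; on_token is where a UI would
    # append each piece (the frontend batches these with requestAnimationFrame).
    parts = []
    for token in stream:
        on_token(token)
        parts.append(token)
    return "".join(parts)

print(consume(fake_stream()))
# → Retrieval-Augmented Generation streams tokens.
```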
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/login` | Email/password login |
| `GET` | `/api/user` | Get authenticated user (Sanctum) |
| `GET` | `/api/auth/google` | Redirect to Google OAuth |
| `GET` | `/api/auth/google/callback` | Handle Google OAuth callback |
| Problem | Solution |
|---|---|
| Ollama connection refused | Make sure `ollama serve` is running on port 11434 |
| Pinecone timeout | Check your API key and index name in `.env` |
| Redis connection error | Ensure the Redis container is running: `docker ps` |
| Frontend not updating | Run `npm run dev` or `npm run build` |
| FastAPI container error | Check logs: `docker logs fastapi_rag` |
| Vite manifest error | Run `npm run build` in `/frontend` |
This project is for educational and personal use.
Built with ❤️ by Satya