endee-io · cherangovindharaj · Apr 12, 2026
diff --git a/.env.example b/.env.example
@@ -0,0 +1,13 @@
+# Endee Configuration
+ENDEE_API_KEY=your_endee_api_key_here
+ENDEE_URL=https://api.endee.io
+ENDEE_INDEX_NAME=placement_copilot
+ENDEE_MODE=local  # "local" for mock (default), "production" for real Endee
+
+# Embedding Model
+MODEL_NAME=all-MiniLM-L6-v2
+
+# Server Configuration
+HOST=0.0.0.0
+PORT=8000
+DEBUG=True
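A minimal sketch of how a backend could read these settings, assuming plain `KEY=value` lines and the defaults listed above. `DEFAULTS`, `parse_env_line` and `load_settings` are hypothetical names and are not part of this PR. The parser strips unquoted inline comments like the one on `ENDEE_MODE`, because not every `.env` loader does.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Defaults mirror .env.example; secrets such as ENDEE_API_KEY have no default.
DEFAULTS: Dict[str, str] = {
    "ENDEE_URL": "https://api.endee.io",
    "ENDEE_INDEX_NAME": "placement_copilot",
    "ENDEE_MODE": "local",
    "MODEL_NAME": "all-MiniLM-L6-v2",
    "HOST": "0.0.0.0",
    "PORT": "8000",
    "DEBUG": "True",
}


def parse_env_line(line: str) -> Optional[Tuple[str, str]]:
    """Parse one KEY=value line; return None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, value = line.split("=", 1)
    # Drop an unquoted inline comment, e.g. 'local  # "local" for mock'
    if " #" in value:
        value = value.split(" #", 1)[0]
    return key.strip(), value.strip()


def load_settings(env_text: str, environ: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Merge defaults and .env text; environment variables override known keys."""
    environ = os.environ if environ is None else environ
    settings: Dict[str, object] = dict(DEFAULTS)
    for raw in env_text.splitlines():
        parsed = parse_env_line(raw)
        if parsed:
            settings[parsed[0]] = parsed[1]
    for key in list(settings):
        if key in environ:
            settings[key] = environ[key]
    # Convert the typed settings last so overrides are converted too
    settings["PORT"] = int(settings["PORT"])
    settings["DEBUG"] = str(settings["DEBUG"]).lower() in ("1", "true", "yes")
    return settings
```

For example, `load_settings(open(".env").read())` returns a dict in which `PORT` is an `int` and `DEBUG` is a `bool`.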
diff --git a/.gitignore b/.gitignore
@@ -1,32 +1,34 @@
-
-# Ignore build directory
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+venv/
+env/
+ENV/
+*.egg-info/
+dist/
 build/
-# Ignore build like directory
-build*/
-# Ignore all files in the build directory
-build/*
-
-# Ignore tests build directory
-tests/build/
-tests/build*/
-
-# Test binaries
-tests/**/ndd_filter_test
-
-# macOS debug symbols
-*.dSYM/
 
-# Sometimes data files are created for tetsing
-data/
-data/*
+# Environment
+.env
+.env.local
 
-# VS Code directory
+# IDE
 .vscode/
-.vscode/*
+.idea/
+*.swp
+*.swo
+*~
 
-# Frontend
-frontend/
-frontend/*
-
-# DS Store
+# OS
 .DS_Store
+Thumbs.db
+
+# Data
+*.json.backup
+
+# Logs
+*.log
+.venv/
diff --git a/README.md b/README.md
@@ -1,139 +1,188 @@
-<p align="center">
-  <picture>
-      <source media="(prefers-color-scheme: dark)" srcset="docs/assets/logo-dark.svg">
-      <source media="(prefers-color-scheme: light)" srcset="docs/assets/logo-light.svg">
-      <img height="100" alt="Endee" src="docs/assets/logo-dark.svg">
-  </picture>
-</p>
-
-<p align="center">
-    <b>High-performance open-source vector database for AI search, RAG, semantic search, and hybrid retrieval.</b>
-</p>
-
-<p align="center">
-    <a href="./docs/getting-started.md"><img src="https://img.shields.io/badge/Quick_Start-Local_Setup-success?style=flat-square" alt="Quick Start"></a>
-    <a href="https://docs.endee.io/quick-start"><img src="https://img.shields.io/badge/Docs-Quick_Start-success?style=flat-square" alt="Docs"></a>
-    <a href="https://github.com/endee-io/endee/blob/master/LICENSE"><img src="https://img.shields.io/github/license/endee-io/endee?style=flat-square" alt="License"></a>
-    <a href="https://discord.gg/5HFGqDZQE3"><img src="https://img.shields.io/badge/Discord-Join_Chat-5865F2?logo=discord&style=flat-square" alt="Discord"></a>
-    <a href="https://endee.io/"><img src="https://img.shields.io/badge/Website-Endee-111111?style=flat-square" alt="Website"></a>
-    <!-- <a href="https://endee.io/benchmarks"><img src="https://img.shields.io/badge/Benchmarks-Coming_Soon-1F8B4C?style=flat-square" alt="Benchmarks"></a> -->
-    <!-- <a href="https://endee.io/cloud"><img src="https://img.shields.io/badge/Cloud-Coming_Soon-2496ED?style=flat-square" alt="Cloud"></a> -->
-</p>
-
-<p align="center">
-<strong><a href="./docs/getting-started.md">Quick Start</a> • <a href="#why-endee">Why Endee</a> • <a href="#use-cases">Use Cases</a> • <a href="#features">Features</a> • <a href="#api-and-clients">API and Clients</a> • <a href="#docs-and-links">Docs</a> • <a href="#community-and-contact">Contact</a></strong>
-</p>
-
-# Endee: Open-Source Vector Database for AI Search
-
-**Endee** is a high-performance open-source vector database built for AI search and retrieval workloads. It is designed for teams building **RAG pipelines**, **semantic search**, **hybrid search**, recommendation systems, and filtered vector retrieval APIs that need production-oriented performance and control.
-
-Endee combines vector search with filtering, sparse retrieval support, backup workflows, and deployment flexibility across local builds and Docker-based environments. The project is implemented in C++ and optimized for modern CPU targets, including AVX2, AVX512, NEON, and SVE2.
-
-If you want the fastest path to evaluate Endee locally, start with the [Getting Started guide](./docs/getting-started.md) or the hosted docs at [docs.endee.io](https://docs.endee.io/quick-start).
-
-## Why Endee
-
-- Built as a dedicated vector database for AI applications, search systems, and retrieval-heavy workloads.
-- Supports dense vector retrieval plus sparse search capabilities for hybrid search use cases.
-- Includes payload filtering for metadata-aware retrieval and application-specific query logic.
-- Ships with operational features already documented in this repo, including backup flows and runtime observability.
-- Offers flexible deployment paths: local scripts, manual builds, Docker images, and prebuilt registry images.
+# 🎓 Campus Placement Copilot
+
+> AI-powered RAG system to help engineering students prepare for campus placements using semantic search and LLM-generated answers.
+
+![RAG Pipeline](https://img.shields.io/badge/RAG-Pipeline-blue) ![Endee](https://img.shields.io/badge/Vector_DB-Endee-green) ![Groq](https://img.shields.io/badge/LLM-Groq_LLaMA3-orange) ![FastAPI](https://img.shields.io/badge/Backend-FastAPI-teal)
+
+---
+
+## 🚀 What is this?
+
+Campus Placement Copilot is a **Retrieval-Augmented Generation (RAG)** application that helps students get accurate, context-aware answers about campus placement preparation — covering companies like TCS, Infosys, Wipro, Cognizant, and more.
+
+Students can ask questions like:
+- *"How to crack TCS Ninja interview?"*
+- *"What is the Infosys System Engineer selection process?"*
+- *"Tips for Wipro NLTH exam?"*
+
+The system retrieves the most relevant chunks from the Endee vector database and passes them to an LLM, which generates an answer grounded in that context.
+
+---
+
+## 🏗️ System Design
+
+1. **Student query**: the student types a question in the frontend (HTML/CSS/JS), which sends it to the backend with `POST /ask`.
+2. **FastAPI backend** (port 8000) receives the request.
+3. **SentenceTransformer** (`all-MiniLM-L6-v2`) generates the query embedding.
+4. **Endee Vector DB** (HNSW index) searches for the top-K most similar chunks.
+5. **Retrieved context chunks** are passed to the LLM.
+6. **Groq LLaMA 3.1** generates the final answer.
+7. **Answer + sources** are returned to the student.
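The flow above can be sketched as a single function in which each stage is injected as a callable, so the orchestration can be exercised without Endee or Groq. This is an illustrative sketch, not the code in `backend/search.py`. The names `answer_question` and `Chunk`, as well as the prompt wording, are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    text: str
    source: str  # e.g. the company or document the chunk came from


def answer_question(
    question: str,
    embed: Callable[[str], List[float]],
    search: Callable[[List[float], int], List[Chunk]],
    generate: Callable[[str], str],
    top_k: int = 5,
) -> dict:
    """Retrieve-then-generate: embed, search top-K, build a prompt, generate."""
    query_vector = embed(question)        # SentenceTransformer stage
    chunks = search(query_vector, top_k)  # Endee top-K search stage
    # Number each chunk so the LLM can refer to its sources
    context = "\n\n".join(f"[{i + 1}] {c.text}" for i, c in enumerate(chunks))
    prompt = (
        "Answer the student's question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    answer = generate(prompt)             # Groq LLM stage
    return {"answer": answer, "sources": [c.source for c in chunks]}
```

In the real app, `embed` would wrap the SentenceTransformer model, `search` would wrap `index.query`, and `generate` would wrap the Groq chat call. In tests they can be simple stubs.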
+
+---
+
+## 🛠️ Tech Stack
+
+| Component | Technology |
+|---|---|
+| Vector Database | **Endee** (Docker, HNSW index) |
+| Embeddings | SentenceTransformers `all-MiniLM-L6-v2` |
+| LLM | Groq `llama-3.1-8b-instant` |
+| Backend | FastAPI + Python |
+| Frontend | Vanilla HTML, CSS, JavaScript |
+| Deployment | Docker + Uvicorn |
+
+---
+
+## 🔍 How Endee is Used
+
+[Endee](https://github.com/endee-io/endee) is a high-performance open-source vector database built for speed and efficiency.
+
+In this project, Endee is used to:
+1. **Store** document embeddings as 384-dimensional vectors in an HNSW index
+2. **Search** for semantically similar chunks using cosine similarity
+3. **Retrieve** the top-K relevant chunks as context for the RAG pipeline
+
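+```python
+from sentence_transformers import SentenceTransformer
+
+# Assumes `client` is an Endee Python SDK client connected to the server from
+# Step 2 and that `Precision` is imported from the same SDK (see the Endee docs).
+model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
+
+# Create index (dimension must match the embedding model's output size)
+client.create_index(
+    name="placement_copilot",
+    dimension=384,
+    space_type="cosine",
+    precision=Precision.INT8
+)
+
+# Upsert vectors: `index` is the SDK handle for "placement_copilot",
+# `chunk` is one text chunk from data/placement_data.json
+embedding = model.encode(chunk).tolist()
+index.upsert([{
+    "id": "doc_0_chunk_0",
+    "vector": embedding,
+    "meta": {"text": chunk, "company": "TCS"}
+}])
+
+# Semantic search: embed the question with the same model, then query
+query_embedding = model.encode("How to crack TCS Ninja interview?").tolist()
+results = index.query(
+    vector=query_embedding,
+    top_k=5,
+    ef=128  # HNSW search breadth: higher usually improves recall at some speed cost
+)
+```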
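HNSW returns approximate nearest neighbours. To sanity-check results on a small dataset, an exact brute-force cosine search is a useful baseline. This pure-Python sketch, using the hypothetical helpers `cosine_similarity` and `exact_top_k`, scores every stored vector and keeps the K best.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every stored vector and return the k most similar (id, score) pairs."""
    scored = ((cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items())
    best = heapq.nlargest(k, scored)  # highest similarity first
    return [(vec_id, score) for score, vec_id in best]
```

You can compare the IDs returned by `index.query` with `exact_top_k` on the same embeddings to get a rough recall estimate. Raising `ef` typically brings the two result sets closer together.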
 
-## Getting Started
+---
 
-The full installation, build, Docker, runtime, and authentication instructions are in [docs/getting-started.md](./docs/getting-started.md).
+## ⚙️ Setup Instructions
 
-Fastest local path:
+### Prerequisites
+- Python 3.8+
+- Docker Desktop
+- Groq API key (free at [console.groq.com](https://console.groq.com))
 
+### Step 1 — Clone the repo
 ```bash
-chmod +x ./install.sh ./run.sh
-./install.sh --release --avx2
-./run.sh
+git clone https://github.com/YOUR_USERNAME/YOUR_REPO.git
+cd YOUR_REPO
 ```
 
-The server listens on port `8080`. For detailed setup paths, supported operating systems, CPU optimization flags, Docker usage, and authentication examples, use:
-
-- [Getting Started](./docs/getting-started.md)
-- [Hosted Quick Start Docs](https://docs.endee.io/quick-start)
-
-## Use Cases
-
-### RAG and AI Retrieval
-
-Use Endee as the retrieval layer for question answering, chat assistants, copilots, and other RAG applications that need fast vector search with metadata-aware filtering.
-
-### Agentic AI and AI Agent Memory
-
-Use Endee as the long-term memory and context retrieval layer for AI agents built with frameworks like LangChain, CrewAI, AutoGen, and LlamaIndex. Store and retrieve past observations, tool outputs, conversation history, and domain knowledge mid-execution with low-latency filtered vector search, so your autonomous agents get the right context without stalling their reasoning loop.
-
-### Semantic Search
-
-Build semantic search experiences for documents, products, support content, and knowledge bases using vector similarity search instead of exact keyword-only matching.
-
-### Hybrid Search
-
-Combine dense retrieval, sparse vectors, and filtering to improve relevance for search workflows where both semantic understanding and term-level precision matter.
-
-### Recommendations and Matching
-
-Support recommendation, similarity matching, and nearest-neighbor retrieval workflows across text, embeddings, and other high-dimensional representations.
-
-## Features
-
-- **Vector search** for AI retrieval and semantic similarity workloads.
-- **Hybrid retrieval support** with sparse vector capabilities documented in [docs/sparse.md](./docs/sparse.md).
-- **Payload filtering** for structured retrieval logic documented in [docs/filter.md](./docs/filter.md).
-- **Backup APIs and flows** documented in [docs/backup-system.md](./docs/backup-system.md).
-- **Operational logging and instrumentation** documented in [docs/logs.md](./docs/logs.md) and [docs/mdbx-instrumentation.md](./docs/mdbx-instrumentation.md).
-- **CPU-targeted builds** for AVX2, AVX512, NEON, and SVE2 deployments.
-- **Docker deployment options** for local and server environments.
-
-## API and Clients
-
-Endee exposes an HTTP API for managing indexes and serving retrieval workloads. The current repo documentation and examples focus on running the server directly and calling its API endpoints.
-
-Current developer entry points:
-
-- [Getting Started](./docs/getting-started.md) for local build and run flows
-- [Hosted Docs](https://docs.endee.io/quick-start) for product documentation
-- [Release Notes 1.0.0](https://github.com/endee-io/endee/releases/tag/1.0.0) for recent platform changes
-
-## Docs and Links
+### Step 2 — Start Endee Vector DB
+```bash
+docker run -d \
+  -p 8080:8080 \
+  -v ./endee-data:/data \
+  --name endee-server \
+  endeeio/endee-server:latest
+```
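`docker run -d` returns before the server is ready, so starting ingestion immediately can fail. A small readiness check helps here. The following stdlib sketch uses a hypothetical `wait_for_port` helper to poll until port 8080 accepts TCP connections. It only confirms that something is listening on the port, not that Endee has finished initialising.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a server is accepting connections
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # refused or timed out: retry shortly
    return False
```

For example, call `wait_for_port("localhost", 8080)` before running the ingestion step.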
 
-- [Getting Started](./docs/getting-started.md)
-- [Hosted Documentation](https://docs.endee.io/quick-start)
-- [Release Notes](https://github.com/endee-io/endee/releases/tag/1.0.0)
-- [Sparse Search](./docs/sparse.md)
-- [Filtering](./docs/filter.md)
-- [Backups](./docs/backup-system.md)
+### Step 3 — Install dependencies
+```bash
+pip install -r requirements.txt
+```
 
-## Community and Contact
+### Step 4 — Configure environment
+Create a `.env` file in the project root with your Groq settings:
+```env
+GROQ_API_KEY=your_groq_api_key_here
+GROQ_MODEL=llama-3.1-8b-instant
+MODEL_NAME=all-MiniLM-L6-v2
+```
 
-- Join the community on [Discord](https://discord.gg/5HFGqDZQE3)
-- Visit the website at [endee.io](https://endee.io/)
-- For trademark or branding permissions, contact [enterprise@endee.io](mailto:enterprise@endee.io)
+### Step 5 — Ingest data into Endee
+```bash
+python -c "import asyncio; from backend.ingest import ingest_data; asyncio.run(ingest_data())"
+```
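Before upserting, each document has to be split into chunks. The sketch below assumes a simple overlapping word-window strategy and hypothetical `company`/`content` fields in `data/placement_data.json`. The actual schema and chunking in `backend/ingest.py` may differ. It produces IDs in the `doc_{i}_chunk_{j}` style used in the upsert example. `chunk_text` and `build_records` are illustrative names.

```python
from typing import Dict, List


def chunk_text(text: str, max_words: int = 120, overlap: int = 20) -> List[str]:
    """Split text into overlapping windows of at most max_words words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks: List[str] = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def build_records(documents: List[Dict[str, str]], max_words: int = 120, overlap: int = 20) -> List[Dict]:
    """Turn documents (hypothetical "company"/"content" fields) into records without vectors."""
    records = []
    for doc_idx, doc in enumerate(documents):
        for chunk_idx, chunk in enumerate(chunk_text(doc["content"], max_words, overlap)):
            records.append({
                "id": f"doc_{doc_idx}_chunk_{chunk_idx}",
                "meta": {"text": chunk, "company": doc["company"]},
            })
    return records
```

Each record still needs a `"vector"` entry, for example `model.encode(record["meta"]["text"]).tolist()`, before it is passed to `index.upsert`.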
 
-## Contributing
+### Step 6 — Start backend
+```bash
+python app.py
+```
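Once the backend is running, you can call it without the frontend. This stdlib sketch assumes that `POST /ask` accepts a JSON body of the form `{"question": ...}` and returns JSON. Check `app.py` for the real request and response schema. `build_ask_request` and `ask` are hypothetical helpers.

```python
import json
import urllib.request


def build_ask_request(question: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /ask request; the {"question": ...} body is an assumed schema."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8000", timeout: float = 30.0) -> dict:
    """Send the question to the running backend and decode its JSON reply."""
    with urllib.request.urlopen(build_ask_request(question, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `print(ask("Tips for Wipro NLTH exam?"))` should print the answer and sources, assuming the backend returns them as JSON.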
 
-We welcome contributions from the community to help make vector search faster and more accessible for everyone.
+### Step 7 — Start frontend
+```bash
+cd frontend
+python -m http.server 3000
+```
 
-- Submit pull requests for fixes, features, and improvements
-- Report bugs or performance issues through GitHub issues
-- Propose enhancements for search quality, performance, and deployment workflows
+### Step 8 — Open browser
+Open [http://localhost:3000](http://localhost:3000) in your browser.
 
-## License
+---
 
-Endee is open source software licensed under the **Apache License 2.0**. See the [LICENSE](./LICENSE) file for full terms.
+## 📁 Project Structure
+
+- `campus-placement-copilot/`
+  - `app.py`: FastAPI entry point
+  - `backend/`
+    - `ingest.py`: data ingestion + Endee upsert
+    - `search.py`: semantic search + Groq LLM
+  - `frontend/`
+    - `index.html`: UI
+    - `script.js`: API calls
+    - `style.css`: styling
+  - `data/`
+    - `placement_data.json`: placement knowledge base
+  - `requirements.txt`
+  - `README.md`
 
-## Trademark and Branding
+---
 
-“Endee” and the Endee logo are trademarks of Endee Labs.
+## 💡 Features
 
-The Apache License 2.0 does not grant permission to use the Endee name, logos, or branding in a way that suggests endorsement or affiliation.
+- ✅ Semantic search powered by Endee HNSW vector index
+- ✅ RAG pipeline — retrieve then generate
+- ✅ Groq LLaMA 3.1 for fast, accurate answers
+- ✅ Source citations with every answer
+- ✅ Coverage: TCS, Infosys, Wipro, Cognizant, and more
 
-If you offer a hosted or managed service based on this software, you must use your own branding and avoid implying it is an official Endee service.
+---
 
-## Third-Party Software
+## 🔗 References
 
-This project includes or depends on third-party software components licensed under their respective open-source licenses. Use of those components is governed by their own license terms.
+- [Endee Vector DB](https://github.com/endee-io/endee)
+- [Endee Documentation](https://docs.endee.io)
+- [Groq API](https://console.groq.com)
+- [SentenceTransformers](https://www.sbert.net)