Closed
13 changes: 13 additions & 0 deletions .env.example
@@ -0,0 +1,13 @@
# Endee Configuration
ENDEE_API_KEY=your_endee_api_key_here
ENDEE_URL=https://api.endee.io
ENDEE_INDEX_NAME=placement_copilot
ENDEE_MODE=local # "local" for mock (default), "production" for real Endee

# Embedding Model
MODEL_NAME=all-MiniLM-L6-v2

# Server Configuration
HOST=0.0.0.0
PORT=8000
DEBUG=True
54 changes: 28 additions & 26 deletions .gitignore
@@ -1,32 +1,34 @@

# Ignore build directory
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/
*.egg-info/
dist/
build/
# Ignore build like directory
build*/
# Ignore all files in the build directory
build/*

# Ignore tests build directory
tests/build/
tests/build*/

# Test binaries
tests/**/ndd_filter_test

# macOS debug symbols
*.dSYM/

# Sometimes data files are created for testing
data/
data/*
# Environment
.env
.env.local

# VS Code directory
# IDE
.vscode/
.vscode/*
.idea/
*.swp
*.swo
*~

# Frontend
frontend/
frontend/*

# DS Store
# OS
.DS_Store
Thumbs.db

# Data
*.json.backup

# Logs
*.log
.venv/
285 changes: 167 additions & 118 deletions README.md
@@ -1,139 +1,188 @@
<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/assets/logo-dark.svg">
<source media="(prefers-color-scheme: light)" srcset="docs/assets/logo-light.svg">
<img height="100" alt="Endee" src="docs/assets/logo-dark.svg">
</picture>
</p>

<p align="center">
<b>High-performance open-source vector database for AI search, RAG, semantic search, and hybrid retrieval.</b>
</p>

<p align="center">
<a href="./docs/getting-started.md"><img src="https://img.shields.io/badge/Quick_Start-Local_Setup-success?style=flat-square" alt="Quick Start"></a>
<a href="https://docs.endee.io/quick-start"><img src="https://img.shields.io/badge/Docs-Quick_Start-success?style=flat-square" alt="Docs"></a>
<a href="https://github.com/endee-io/endee/blob/master/LICENSE"><img src="https://img.shields.io/github/license/endee-io/endee?style=flat-square" alt="License"></a>
<a href="https://discord.gg/5HFGqDZQE3"><img src="https://img.shields.io/badge/Discord-Join_Chat-5865F2?logo=discord&style=flat-square" alt="Discord"></a>
<a href="https://endee.io/"><img src="https://img.shields.io/badge/Website-Endee-111111?style=flat-square" alt="Website"></a>
<!-- <a href="https://endee.io/benchmarks"><img src="https://img.shields.io/badge/Benchmarks-Coming_Soon-1F8B4C?style=flat-square" alt="Benchmarks"></a> -->
<!-- <a href="https://endee.io/cloud"><img src="https://img.shields.io/badge/Cloud-Coming_Soon-2496ED?style=flat-square" alt="Cloud"></a> -->
</p>

<p align="center">
<strong><a href="./docs/getting-started.md">Quick Start</a> • <a href="#why-endee">Why Endee</a> • <a href="#use-cases">Use Cases</a> • <a href="#features">Features</a> • <a href="#api-and-clients">API and Clients</a> • <a href="#docs-and-links">Docs</a> • <a href="#community-and-contact">Contact</a></strong>
</p>

# Endee: Open-Source Vector Database for AI Search

**Endee** is a high-performance open-source vector database built for AI search and retrieval workloads. It is designed for teams building **RAG pipelines**, **semantic search**, **hybrid search**, recommendation systems, and filtered vector retrieval APIs that need production-oriented performance and control.

Endee combines vector search with filtering, sparse retrieval support, backup workflows, and deployment flexibility across local builds and Docker-based environments. The project is implemented in C++ and optimized for modern CPU targets, including AVX2, AVX512, NEON, and SVE2.

If you want the fastest path to evaluate Endee locally, start with the [Getting Started guide](./docs/getting-started.md) or the hosted docs at [docs.endee.io](https://docs.endee.io/quick-start).

## Why Endee

- Built as a dedicated vector database for AI applications, search systems, and retrieval-heavy workloads.
- Supports dense vector retrieval plus sparse search capabilities for hybrid search use cases.
- Includes payload filtering for metadata-aware retrieval and application-specific query logic.
- Ships with operational features already documented in this repo, including backup flows and runtime observability.
- Offers flexible deployment paths: local scripts, manual builds, Docker images, and prebuilt registry images.
# 🎓 Campus Placement Copilot

> AI-powered RAG system to help engineering students prepare for campus placements using semantic search and LLM-generated answers.

![RAG Pipeline](https://img.shields.io/badge/RAG-Pipeline-blue) ![Endee](https://img.shields.io/badge/Vector_DB-Endee-green) ![Groq](https://img.shields.io/badge/LLM-Groq_LLaMA3-orange) ![FastAPI](https://img.shields.io/badge/Backend-FastAPI-teal)

---

## 🚀 What is this?

Campus Placement Copilot is a **Retrieval-Augmented Generation (RAG)** application that helps students get accurate, context-aware answers about campus placement preparation — covering companies like TCS, Infosys, Wipro, Cognizant, and more.

Students can ask questions like:
- *"How to crack TCS Ninja interview?"*
- *"What is the Infosys System Engineer selection process?"*
- *"Tips for Wipro NLTH exam?"*

The system retrieves the most relevant information from a vector database and generates a precise answer using an LLM.

---

## 🏗️ System Design
1. Student Query
2. Frontend (HTML/CSS/JS) sends `POST /ask`
3. FastAPI Backend (port 8000)
4. SentenceTransformer (`all-MiniLM-L6-v2`) generates the query embedding
5. Endee Vector DB (HNSW Index) searches the top-K similar chunks
6. Retrieved Context Chunks
7. Groq LLaMA 3.1 (LLM) generates the final answer
8. Answer + Sources → Student
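
The step between "Retrieved Context Chunks" and the LLM call can be sketched as a small prompt-assembly function. This is a minimal illustration, not the project's actual `backend/search.py`: the function name `build_prompt`, the prompt wording, and the idea of returning company names as sources are all assumptions. Only the `text` and `company` fields mirror the `meta` payload shown in the Endee upsert example.

```python
def build_prompt(question, chunks):
    """Assemble an LLM prompt from retrieved chunks (hypothetical helper).

    chunks: list of dicts with "text" and "company" keys.
    Returns a (prompt, sources) tuple.
    """
    context_lines = []
    sources = []
    for number, chunk in enumerate(chunks, start=1):
        # Number each chunk so the answer can cite it.
        context_lines.append(f"[{number}] ({chunk['company']}) {chunk['text']}")
        # Collect each company once, in first-seen order.
        if chunk["company"] not in sources:
            sources.append(chunk["company"])
    context = "\n".join(context_lines)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return prompt, sources
```

The returned prompt would then be sent to the LLM, and the sources list returned to the student alongside the answer.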

---

## 🛠️ Tech Stack

| Component | Technology |
|---|---|
| Vector Database | **Endee** (Docker, HNSW index) |
| Embeddings | SentenceTransformers `all-MiniLM-L6-v2` |
| LLM | Groq `llama-3.1-8b-instant` |
| Backend | FastAPI + Python |
| Frontend | Vanilla HTML, CSS, JavaScript |
| Deployment | Docker + Uvicorn |

---

## 🔍 How Endee is Used

[Endee](https://github.com/endee-io/endee) is a high-performance open-source vector database built for speed and efficiency.

In this project, Endee is used to:
1. **Store** document embeddings as 384-dimensional vectors in an HNSW index
2. **Search** for semantically similar chunks using cosine similarity
3. **Retrieve** the top-K relevant context chunks for the RAG pipeline

```python
# Create index
client.create_index(
    name="placement_copilot",
    dimension=384,
    space_type="cosine",
    precision=Precision.INT8
)

# Upsert vectors
index.upsert([{
    "id": "doc_0_chunk_0",
    "vector": embedding,
    "meta": {"text": chunk, "company": "TCS"}
}])

# Semantic search
results = index.query(
    vector=query_embedding,
    top_k=5,
    ef=128
)
```
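
To make the "cosine similarity" and "top-K" terms concrete, here is a brute-force sketch in plain Python. It is not how Endee works internally: an HNSW index performs an approximate search over a graph rather than scoring every vector, and the helper names `cosine_similarity` and `top_k` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query, items, k=5):
    """Return the k (id, score) pairs most similar to the query vector.

    items: list of dicts with "id" and "vector" keys.
    """
    scored = [(item["id"], cosine_similarity(query, item["vector"])) for item in items]
    # Highest similarity first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

An exhaustive scan like this costs one similarity computation per stored vector, which is the cost an approximate index is meant to avoid on large collections.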

## Getting Started
---

The full installation, build, Docker, runtime, and authentication instructions are in [docs/getting-started.md](./docs/getting-started.md).
## ⚙️ Setup Instructions

Fastest local path:
### Prerequisites
- Python 3.8+
- Docker Desktop
- Groq API key (free at [console.groq.com](https://console.groq.com))

### Step 1 — Clone the repo
```bash
chmod +x ./install.sh ./run.sh
./install.sh --release --avx2
./run.sh
git clone https://github.com/YOUR_USERNAME/YOUR_REPO.git
cd campus-placement-copilot
```

The server listens on port `8080`. For detailed setup paths, supported operating systems, CPU optimization flags, Docker usage, and authentication examples, use:

- [Getting Started](./docs/getting-started.md)
- [Hosted Quick Start Docs](https://docs.endee.io/quick-start)

## Use Cases

### RAG and AI Retrieval

Use Endee as the retrieval layer for question answering, chat assistants, copilots, and other RAG applications that need fast vector search with metadata-aware filtering.

### Agentic AI and AI Agent Memory

Use Endee as the long-term memory and context retrieval layer for AI agents built with frameworks like LangChain, CrewAI, AutoGen, and LlamaIndex. Store and retrieve past observations, tool outputs, conversation history, and domain knowledge mid-execution with low-latency filtered vector search, so your autonomous agents get the right context without stalling their reasoning loop.

### Semantic Search

Build semantic search experiences for documents, products, support content, and knowledge bases using vector similarity search instead of exact keyword-only matching.

### Hybrid Search

Combine dense retrieval, sparse vectors, and filtering to improve relevance for search workflows where both semantic understanding and term-level precision matter.

### Recommendations and Matching

Support recommendation, similarity matching, and nearest-neighbor retrieval workflows across text, embeddings, and other high-dimensional representations.

## Features

- **Vector search** for AI retrieval and semantic similarity workloads.
- **Hybrid retrieval support** with sparse vector capabilities documented in [docs/sparse.md](./docs/sparse.md).
- **Payload filtering** for structured retrieval logic documented in [docs/filter.md](./docs/filter.md).
- **Backup APIs and flows** documented in [docs/backup-system.md](./docs/backup-system.md).
- **Operational logging and instrumentation** documented in [docs/logs.md](./docs/logs.md) and [docs/mdbx-instrumentation.md](./docs/mdbx-instrumentation.md).
- **CPU-targeted builds** for AVX2, AVX512, NEON, and SVE2 deployments.
- **Docker deployment options** for local and server environments.

## API and Clients

Endee exposes an HTTP API for managing indexes and serving retrieval workloads. The current repo documentation and examples focus on running the server directly and calling its API endpoints.

Current developer entry points:

- [Getting Started](./docs/getting-started.md) for local build and run flows
- [Hosted Docs](https://docs.endee.io/quick-start) for product documentation
- [Release Notes 1.0.0](https://github.com/endee-io/endee/releases/tag/1.0.0) for recent platform changes

## Docs and Links
### Step 2 — Start Endee Vector DB
```bash
docker run -d \
  -p 8080:8080 \
  -v ./endee-data:/data \
  --name endee-server \
  endeeio/endee-server:latest
```

- [Getting Started](./docs/getting-started.md)
- [Hosted Documentation](https://docs.endee.io/quick-start)
- [Release Notes](https://github.com/endee-io/endee/releases/tag/1.0.0)
- [Sparse Search](./docs/sparse.md)
- [Filtering](./docs/filter.md)
- [Backups](./docs/backup-system.md)
### Step 3 — Install dependencies
```bash
pip install -r requirements.txt
```

## Community and Contact
### Step 4 — Configure environment
Create a `.env` file:
```env
GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL=llama-3.1-8b-instant
MODEL_NAME=all-MiniLM-L6-v2
```
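
One way the backend might read these values is sketched below. The function name `load_settings` and the returned dictionary keys are hypothetical, and the sketch assumes the `.env` file has already been loaded into the process environment (for example with a package such as python-dotenv). The two defaults are the values shown above.

```python
import os


def load_settings(env=None):
    """Read the three settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "")
    # Fail early if the key is absent or still the placeholder from the template.
    if not api_key or api_key == "your_groq_api_key_here":
        raise ValueError("GROQ_API_KEY is missing or still set to the placeholder")
    return {
        "groq_api_key": api_key,
        "groq_model": env.get("GROQ_MODEL", "llama-3.1-8b-instant"),
        "model_name": env.get("MODEL_NAME", "all-MiniLM-L6-v2"),
    }
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the real environment.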

- Join the community on [Discord](https://discord.gg/5HFGqDZQE3)
- Visit the website at [endee.io](https://endee.io/)
- For trademark or branding permissions, contact [enterprise@endee.io](mailto:enterprise@endee.io)
### Step 5 — Ingest data into Endee
```bash
python -c "import asyncio; from backend.ingest import ingest_data; asyncio.run(ingest_data())"
```
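
The ingestion step presumably splits each document into chunks before embedding them. The sketch below shows one possible scheme that produces IDs in the `doc_0_chunk_0` style used in the upsert example. The fixed word-count chunking, the `max_words` parameter, and the input document shape are assumptions, not a description of `backend/ingest.py`.

```python
def chunk_documents(documents, max_words=50):
    """Split documents into word-bounded chunks with doc_<i>_chunk_<j> IDs.

    documents: list of dicts with "text" and "company" keys (assumed shape).
    Returns records that still need a "vector" field before they can be upserted.
    """
    records = []
    for doc_index, doc in enumerate(documents):
        words = doc["text"].split()
        # Step through the word list in windows of max_words.
        for chunk_index, start in enumerate(range(0, len(words), max_words)):
            chunk_text = " ".join(words[start:start + max_words])
            records.append({
                "id": f"doc_{doc_index}_chunk_{chunk_index}",
                "meta": {"text": chunk_text, "company": doc["company"]},
            })
    return records
```

Each record would then get its embedding added under `"vector"` before being passed to `index.upsert`.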

## Contributing
### Step 6 — Start backend
```bash
python app.py
```
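
Once the backend is running, the `POST /ask` endpoint can be exercised without the frontend. The sketch below only builds the request. The JSON field name `question` is an assumption, because the request schema is not shown here; port 8000 comes from the system design above.

```python
import json
import urllib.request


def build_ask_request(question, base_url="http://localhost:8000"):
    """Build a POST /ask request; the "question" field name is assumed."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` and decode the response body as JSON.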

We welcome contributions from the community to help make vector search faster and more accessible for everyone.
### Step 7 — Start frontend
```bash
cd frontend
python -m http.server 3000
```

- Submit pull requests for fixes, features, and improvements
- Report bugs or performance issues through GitHub issues
- Propose enhancements for search quality, performance, and deployment workflows
### Step 8 — Open browser
http://localhost:3000

## License
---

Endee is open source software licensed under the **Apache License 2.0**. See the [LICENSE](./LICENSE) file for full terms.
## 📁 Project Structure
campus-placement-copilot/
├── app.py # FastAPI entry point
├── backend/
│ ├── ingest.py # Data ingestion + Endee upsert
│ └── search.py # Semantic search + Groq LLM
├── frontend/
│ ├── index.html # UI
│ ├── script.js # API calls
│ └── style.css # Styling
├── data/
│ └── placement_data.json # Placement knowledge base
├── requirements.txt
└── README.md

## Trademark and Branding
---

“Endee” and the Endee logo are trademarks of Endee Labs.
## 💡 Features

The Apache License 2.0 does not grant permission to use the Endee name, logos, or branding in a way that suggests endorsement or affiliation.
- ✅ Semantic search powered by Endee HNSW vector index
- ✅ RAG pipeline — retrieve then generate
- ✅ Groq LLaMA 3.1 for fast, accurate answers
- ✅ Source citations with every answer
- ✅ Coverage: TCS, Infosys, Wipro, Cognizant, and more

If you offer a hosted or managed service based on this software, you must use your own branding and avoid implying it is an official Endee service.
---

## Third-Party Software
## 🔗 References

This project includes or depends on third-party software components licensed under their respective open-source licenses. Use of those components is governed by their own license terms.
- [Endee Vector DB](https://github.com/endee-io/endee)
- [Endee Documentation](https://docs.endee.io)
- [Groq API](https://console.groq.com)
- [SentenceTransformers](https://www.sbert.net)