A powerful Retrieval-Augmented Generation (RAG) API that lets you upload documents, store them in a vector database, and chat with them using AI.
- Upload documents (PDF, DOCX, TXT, MD, CSV, JSON, HTML)
- Automatically chunks the text into smaller pieces
- Converts chunks to embeddings (vector representations) using OpenRouter
- Stores embeddings in Pinecone vector database
- Chat with your documents - ask questions and get AI-generated answers based on the content
Your Document → Text Extraction → Chunking → Embeddings → Pinecone
                                                             ↓
Your Question → Embedding → Search Similar Chunks → Send to LLM → Answer
Example:
- You upload a 50-page PDF about climate change (or any other document you have)
- You ask: "What are the main causes of global warming?" (or any other question about the document you uploaded)
- The system finds the most relevant chunks from your PDF
- An AI reads those chunks and generates a concise answer
- Node.js (v18 or higher)
- Pinecone account - Sign up free
- OpenRouter account - Sign up free
- For HTTPS:
git clone https://github.com/daviddozie/rag-index.git
- For SSH
git clone git@github.com:daviddozie/rag-index.git
cd rag-index
npm install dotenv express multer cors @pinecone-database/pinecone @openrouter/sdk uuid pdf-parse mammoth
Create a .env file in the project root:
# Required
PINECONE_API_KEY=your_pinecone_api_key_here
OPENROUTER_API_KEY=your_openrouter_api_key_here
PINECONE_INDEX_NAME=rag-index
LLM_MODEL=google/gemini-2.0-flash-exp:free   # or any LLM you prefer
RAG_DATA_DIR=./data
RAG_CHUNK_SIZE=500
RAG_CHUNK_OVERLAP=100
PORT=3000
Getting API Keys:
Pinecone:
- Go to https://www.pinecone.io/
- Sign up and create a new project
- Copy your API key from the dashboard
OpenRouter:
- Go to https://openrouter.ai/
- Sign up and go to Settings → Keys
- Create a new API key
- Add credits at https://openrouter.ai/settings/credits (or use free models)
node pinecone.js
You should see:
RAG API running on http://localhost:3000
Using LLM: google/gemini-2.0-flash-exp:free (or whichever model you configured)
Using embeddings: openai/text-embedding-3-small
Endpoint: POST /upload
Upload a single file:
curl -X POST http://localhost:3000/upload \
-F "files=@document.pdf"
Upload multiple files:
curl -X POST http://localhost:3000/upload \
-F "files=@document1.pdf" \
-F "files=@document2.txt"
Response:
{
"context": "ctx-a1b2c3d4",
"chunks": 15,
"files": 1
}
Important: Save the context ID - you'll need it to chat with these documents!
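If you prefer calling the API from code instead of curl, here is a minimal client sketch for the full upload-then-chat round trip. It assumes Node 18+, which ships with global `fetch`, `FormData`, and `Blob`; the function name `uploadAndChat` is illustrative, not part of the API:

```javascript
const API = "http://localhost:3000";

// Upload a file buffer, then ask a question against the returned context.
async function uploadAndChat(fileBuffer, filename, question) {
  // "files" matches the multipart field name the /upload endpoint expects.
  const form = new FormData();
  form.append("files", new Blob([fileBuffer]), filename);

  const upload = await fetch(`${API}/upload`, { method: "POST", body: form });
  const { context } = await upload.json(); // e.g. "ctx-a1b2c3d4"

  const res = await fetch(`${API}/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ context, query: question }),
  });
  return (await res.json()).answer;
}
```

Usage: `await uploadAndChat(await fs.readFile("document.pdf"), "document.pdf", "What is this about?")`.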
Endpoint: POST /chat
curl -X POST http://localhost:3000/chat \
-H "Content-Type: application/json" \
-d '{
"context": "ctx-a1b2c3d4",
"query": "What is the main topic of this document?"
}'
Response:
{
"answer": "The main topic of this document is...",
"context": [
"Relevant chunk 1 from your document...",
"Relevant chunk 2 from your document..."
]
}
Endpoint: GET /contexts
curl http://localhost:3000/contexts
Response:
[
"ctx-a1b2c3d4",
"ctx-e5f6g7h8",
"ctx-i9j0k1l2"
]
Endpoint: GET /context/:name/metadata
curl http://localhost:3000/context/ctx-a1b2c3d4/metadata
Response:
[
{
"id": "chunk-uuid-1",
"context": "ctx-a1b2c3d4",
"filename": "document.pdf",
"offset_start": 0,
"offset_end": 500,
"text": "First 500 characters of your document..."
},
...
]
- PDF (.pdf) - Uses pdf-parse
- Word Documents (.docx) - Uses mammoth
- Text Files (.txt)
- Markdown (.md)
- CSV (.csv)
- JSON (.json)
- HTML (.html, .htm)
Retrieval-Augmented Generation is a technique that combines:
- Information Retrieval - Finding relevant documents
- Language Generation - Using AI to generate answers
This solves the problem of AI hallucination by grounding responses in your actual documents.
1. Embeddings
- Text converted to numbers (vectors)
- Similar text has similar vectors
- Example: "cat" and "kitten" have similar embeddings
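"Similar vectors" is usually measured with cosine similarity: vectors pointing in nearly the same direction score close to 1. A toy sketch (real embeddings have hundreds of dimensions, and the vectors below are made up for illustration):

```javascript
// Cosine similarity: dot product divided by the product of magnitudes.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional "embeddings":
const cat = [0.9, 0.1, 0.0];
const kitten = [0.85, 0.15, 0.05];
const car = [0.0, 0.2, 0.9];

console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, car)); // true
```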
2. Vector Database (Pinecone)
- Stores embeddings efficiently
- Searches by similarity, not keywords
- Example: Search for "climate change" finds "global warming"
3. Chunking
- Breaking documents into smaller pieces
- Default: 500 characters with 100 character overlap
- Overlap ensures context isn't lost at chunk boundaries
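Sliding-window chunking with overlap can be sketched like this (the actual implementation lives in pinecone.js; the function name and shape here are illustrative):

```javascript
// Split text into windows of `size` characters, stepping forward by
// size - overlap so consecutive chunks share `overlap` characters.
function chunkText(text, size = 500, overlap = 100) {
  const chunks = [];
  const stride = size - overlap;
  for (let start = 0; start < text.length; start += stride) {
    chunks.push({
      text: text.slice(start, start + size),
      offset_start: start,
      offset_end: Math.min(start + size, text.length),
    });
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

const chunks = chunkText("x".repeat(1200));
console.log(chunks.length); // 3 chunks: [0,500), [400,900), [800,1200)
```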
4. Semantic Search
- Searches by meaning, not exact words
- Example: "How do I reset my password?" matches "Password recovery instructions"
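In this project the similarity search is delegated to Pinecone, but the retrieval step boils down to scoring every stored chunk against the query embedding and keeping the top-k. A toy, self-contained version (the 2-dimensional vectors are made up for illustration):

```javascript
// Cosine similarity between two vectors.
function cosine(a, b) {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Score every chunk, sort by similarity, keep the best k.
function topK(queryVec, chunks, k = 2) {
  return chunks
    .map((c) => ({ ...c, score: cosine(queryVec, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy index: pretend these are embedded chunks.
const index = [
  { text: "Password recovery instructions", vector: [0.9, 0.1] },
  { text: "Office opening hours", vector: [0.1, 0.9] },
];
const query = [0.8, 0.2]; // embedding of "How do I reset my password?"
console.log(topK(query, index, 1)[0].text); // "Password recovery instructions"
```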
pinecone-js/
├── pinecone.js       # Main API server
├── .env              # API keys (DO NOT COMMIT)
├── package.json      # Dependencies
├── data/             # Local storage (created automatically)
│   └── ctx-xxxxxxxx/ # Each context has its own folder
│       ├── files/    # Original uploaded files
│       └── metadata.json # Chunk metadata
└── README.md         # This file
Control how documents are split:
RAG_CHUNK_SIZE=500 # Characters per chunk
RAG_CHUNK_OVERLAP=100   # Characters that overlap between chunks
When to adjust:
- Larger chunks (1000+): Better for long, coherent passages
- Smaller chunks (300-500): Better for precise answers
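Chunk size also determines how many embedding API calls an upload costs. A back-of-envelope estimate, assuming simple sliding-window chunking (the helper name is illustrative):

```javascript
// With a stride of size - overlap, a document of docLength characters
// produces roughly ceil((docLength - overlap) / stride) chunks.
function estimateChunks(docLength, size = 500, overlap = 100) {
  const stride = size - overlap;
  return Math.max(1, Math.ceil((docLength - overlap) / stride));
}

console.log(estimateChunks(50_000));            // defaults → 125 chunks
console.log(estimateChunks(50_000, 1000, 100)); // larger chunks → 56 chunks
```

So doubling the chunk size roughly halves the number of embedding calls, at the cost of less precise retrieval.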
Choose different AI models from OpenRouter:
# Free options
LLM_MODEL=google/gemini-2.0-flash-exp:free
LLM_MODEL=meta-llama/llama-3.2-3b-instruct:free
# Paid options (better quality)
LLM_MODEL=anthropic/claude-3.5-sonnet
LLM_MODEL=openai/gpt-4o
Problem: OpenRouter API returns 402 error
Solutions:
- Use a free model: LLM_MODEL=google/gemini-2.0-flash-exp:free
- Add credits at https://openrouter.ai/settings/credits
Problem: Large files take time to process
Why: Each chunk needs to be converted to embeddings via API
Expected times:
- Small file (1-5 pages): 5-10 seconds
- Medium file (10-50 pages): 30-60 seconds
- Large file (100+ pages): 2-5 minutes
Problem: Context ID doesn't exist locally
Solution: Upload a new document and use the returned context ID
Problem: Chat returns empty context array
Possible causes:
- Wrong context ID
- Query is too different from document content
- Document didn't upload correctly
Fix: Check metadata endpoint to verify chunks exist
- Research Assistant - Upload papers and ask questions
- Document Q&A - Chat with user manuals, contracts, reports
- Knowledge Base - Create a searchable company wiki
- Study Helper - Upload textbooks and get explanations
- Legal Document Analysis - Query contracts and agreements
- Never commit .env to version control
- Add .env to .gitignore
- Keep API keys secret
- The ./data folder contains uploaded files - protect it
RAG Concepts:
Technologies Used:
Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
- Improve documentation
MIT License - feel free to use this project for learning and commercial purposes.
Questions? Open an issue or check the troubleshooting section above!
Happy RAG building!