
RAG API with Pinecone & OpenRouter

A powerful Retrieval-Augmented Generation (RAG) API that lets you upload documents, store them in a vector database, and chat with them using AI.

🎯 What This Does

  1. Upload documents (PDF, DOCX, TXT, MD, CSV, JSON, HTML)
  2. Automatically chunks the text into smaller pieces
  3. Converts chunks to embeddings (vector representations) using OpenRouter
  4. Stores embeddings in Pinecone vector database
  5. Chat with your documents - ask questions and get AI-generated answers based on the content

🧠 How It Works

Your Document → Text Extraction → Chunking → Embeddings → Pinecone
                                                              ↓
Your Question → Embedding → Search Similar Chunks → Send to LLM → Answer

Example:

  • You upload a 50-page PDF about climate change (or any other document)
  • You ask: "What are the main causes of global warming?" (or any other question about your document)
  • The system finds the most relevant chunks from your PDF
  • An AI reads those chunks and generates a concise answer

📋 Prerequisites

  • Node.js 18+ and npm
  • A Pinecone account (for the vector database)
  • An OpenRouter account (for embeddings and the LLM)

🚀 Quick Start

1. Clone or Download This Project

-  For HTTPS:
git clone https://github.com/daviddozie/rag-index.git

-  For SSH:
git clone git@github.com:daviddozie/rag-index.git

cd rag-index

2. Install Dependencies

npm install dotenv express multer cors @pinecone-database/pinecone @openrouter/sdk uuid pdf-parse mammoth

3. Set Up Environment Variables

Create a .env file in the project root:

# Required
PINECONE_API_KEY=your_pinecone_api_key_here
OPENROUTER_API_KEY=your_openrouter_api_key_here
PINECONE_INDEX_NAME=rag-index
# Use any OpenRouter model you like
LLM_MODEL=google/gemini-2.0-flash-exp:free
RAG_DATA_DIR=./data
RAG_CHUNK_SIZE=500
RAG_CHUNK_OVERLAP=100
PORT=3000
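
For reference, here is a minimal sketch of how these variables might be read in the server, with the documented defaults (the actual pinecone.js may differ; in the real server, dotenv loads the .env file before this runs):

```javascript
// Sketch: reading the settings above from process.env, falling back to the
// defaults documented in this README when a variable is unset.
const config = {
  pineconeApiKey: process.env.PINECONE_API_KEY,
  openrouterApiKey: process.env.OPENROUTER_API_KEY,
  indexName: process.env.PINECONE_INDEX_NAME || 'rag-index',
  llmModel: process.env.LLM_MODEL || 'google/gemini-2.0-flash-exp:free',
  dataDir: process.env.RAG_DATA_DIR || './data',
  chunkSize: Number(process.env.RAG_CHUNK_SIZE) || 500,
  chunkOverlap: Number(process.env.RAG_CHUNK_OVERLAP) || 100,
  port: Number(process.env.PORT) || 3000,
};
```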

Getting API Keys:

Pinecone:

  1. Go to https://www.pinecone.io/
  2. Sign up and create a new project
  3. Copy your API key from the dashboard

OpenRouter:

  1. Go to https://openrouter.ai/
  2. Sign up and go to Settings → Keys
  3. Create a new API key
  4. Add credits at https://openrouter.ai/settings/credits (or use free models)

4. Start the Server

node pinecone.js

You should see:

RAG API running on http://localhost:3000
Using LLM: the model you configured (e.g. google/gemini-2.0-flash-exp:free)
Using embeddings: openai/text-embedding-3-small

📖 API Endpoints

1. Upload Documents

Endpoint: POST /upload

Upload a single file:

curl -X POST http://localhost:3000/upload \
  -F "files=@document.pdf"

Upload multiple files:

curl -X POST http://localhost:3000/upload \
  -F "files=@document1.pdf" \
  -F "files=@document2.txt"

Response:

{
  "context": "ctx-a1b2c3d4",
  "chunks": 15,
  "files": 1
}

Important: Save the context ID - you'll need it to chat with these documents!


2. Chat with Your Documents

Endpoint: POST /chat

curl -X POST http://localhost:3000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "context": "ctx-a1b2c3d4",
    "query": "What is the main topic of this document?"
  }'

Response:

{
  "answer": "The main topic of this document is...",
  "context": [
    "Relevant chunk 1 from your document...",
    "Relevant chunk 2 from your document..."
  ]
}
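
To call /chat from code rather than curl, a sketch for Node 18+ (global fetch); `buildChatRequest` and `askDocuments` are hypothetical helper names, not part of the project:

```javascript
// Sketch: calling POST /chat programmatically. Assumes the server from
// this README is running on localhost:3000.
function buildChatRequest(context, query) {
  return {
    url: 'http://localhost:3000/chat',
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ context, query }),
    },
  };
}

async function askDocuments(context, query) {
  const { url, options } = buildChatRequest(context, query);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Chat failed: ${res.status}`);
  return res.json(); // { answer, context }
}

// Usage (with a context ID returned by /upload):
// askDocuments('ctx-a1b2c3d4', 'What is the main topic of this document?')
//   .then(({ answer }) => console.log(answer));
```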

3. List All Contexts

Endpoint: GET /contexts

curl http://localhost:3000/contexts

Response:

[
  "ctx-a1b2c3d4",
  "ctx-e5f6g7h8",
  "ctx-i9j0k1l2"
]

4. View Context Metadata

Endpoint: GET /context/:name/metadata

curl http://localhost:3000/context/ctx-a1b2c3d4/metadata

Response:

[
  {
    "id": "chunk-uuid-1",
    "context": "ctx-a1b2c3d4",
    "filename": "document.pdf",
    "offset_start": 0,
    "offset_end": 500,
    "text": "First 500 characters of your document..."
  },
  ...
]

πŸ“ Supported File Types

  • βœ… PDF (.pdf) - Uses pdf-parse
  • βœ… Word Documents (.docx) - Uses mammoth
  • βœ… Text Files (.txt)
  • βœ… Markdown (.md)
  • βœ… CSV (.csv)
  • βœ… JSON (.json)
  • βœ… HTML (.html, .htm)

🎓 Learning Resources

What is RAG?

Retrieval-Augmented Generation is a technique that combines:

  1. Information Retrieval - Finding relevant documents
  2. Language Generation - Using AI to generate answers

This solves the problem of AI hallucination by grounding responses in your actual documents.

Key Concepts

1. Embeddings

  • Text converted to numbers (vectors)
  • Similar text has similar vectors
  • Example: "cat" and "kitten" have similar embeddings
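
"Similar vectors" is usually measured with cosine similarity. A minimal sketch, using tiny hand-made 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions; these numbers are invented for illustration):

```javascript
// Sketch: cosine similarity between two vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up embeddings: "cat" and "kitten" point in similar directions,
// "car" points elsewhere.
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const car = [0.1, 0.0, 0.9];

console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, car)); // true
```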

2. Vector Database (Pinecone)

  • Stores embeddings efficiently
  • Searches by similarity, not keywords
  • Example: Search for "climate change" finds "global warming"

3. Chunking

  • Breaking documents into smaller pieces
  • Default: 500 characters with 100 character overlap
  • Overlap ensures context isn't lost at chunk boundaries
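
The overlap behaviour can be sketched in a few lines. This is a simple character-based splitter in the spirit of RAG_CHUNK_SIZE / RAG_CHUNK_OVERLAP; the project's actual chunker may differ:

```javascript
// Sketch: fixed-size character chunking with overlap (defaults 500 / 100).
function chunkText(text, size = 500, overlap = 100) {
  const step = size - overlap;              // how far the window advances
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

// With size 10 and overlap 3, consecutive chunks share 3 characters:
const parts = chunkText('abcdefghijklmnopqrst', 10, 3);
console.log(parts); // [ 'abcdefghij', 'hijklmnopq', 'opqrst' ]
```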

4. Semantic Search

  • Searches by meaning, not exact words
  • Example: "How do I reset my password?" matches "Password recovery instructions"

Project Structure

rag-index/
├── pinecone.js          # Main API server
├── .env                 # API keys (DO NOT COMMIT)
├── package.json         # Dependencies
├── data/                # Local storage (created automatically)
│   └── ctx-xxxxxxxx/    # Each context has its own folder
│       ├── files/       # Original uploaded files
│       └── metadata.json # Chunk metadata
└── README.md            # This file

🔧 Configuration Options

Chunk Size

Control how documents are split:

RAG_CHUNK_SIZE=500      # Characters per chunk
RAG_CHUNK_OVERLAP=100   # Characters that overlap between chunks

When to adjust:

  • Larger chunks (1000+): Better for long, coherent passages
  • Smaller chunks (300-500): Better for precise answers

LLM Model

Choose different AI models from OpenRouter:

# Free options
LLM_MODEL=google/gemini-2.0-flash-exp:free
LLM_MODEL=meta-llama/llama-3.2-3b-instruct:free

# Paid options (better quality)
LLM_MODEL=anthropic/claude-3.5-sonnet
LLM_MODEL=openai/gpt-4o

See all available models at https://openrouter.ai/models

πŸ› Troubleshooting

"Out of credits" error

Problem: OpenRouter API returns 402 error

Solutions:

  1. Use a free model: LLM_MODEL=google/gemini-2.0-flash-exp:free
  2. Add credits at https://openrouter.ai/settings/credits

Upload takes a long time

Problem: Large files take time to process

Why: Each chunk needs to be converted to embeddings via API

Expected times:

  • Small file (1-5 pages): 5-10 seconds
  • Medium file (10-50 pages): 30-60 seconds
  • Large file (100+ pages): 2-5 minutes

"Context not found" error

Problem: Context ID doesn't exist locally

Solution: Upload a new document and use the returned context ID

Empty search results

Problem: Chat returns empty context array

Possible causes:

  1. Wrong context ID
  2. Query is too different from document content
  3. Document didn't upload correctly

Fix: Check metadata endpoint to verify chunks exist

🎯 Use Cases

  • Research Assistant - Upload papers and ask questions
  • Document Q&A - Chat with user manuals, contracts, reports
  • Knowledge Base - Create a searchable company wiki
  • Study Helper - Upload textbooks and get explanations
  • Legal Document Analysis - Query contracts and agreements

πŸ” Security Notes

  • Never commit .env to version control
  • Add .env to .gitignore
  • Keep API keys secret
  • The ./data folder contains uploaded files - protect it


🤝 Contributing

Feel free to:

  • Report bugs
  • Suggest features
  • Submit pull requests
  • Improve documentation

πŸ“ License

MIT License - feel free to use this project for learning and commercial purposes.


Questions? Open an issue or check the troubleshooting section above!

Happy RAG building! 🚀
