EtherFi GPT

A Custom GPT backed by Retrieval-Augmented Generation (RAG) that answers questions about ether.fi, liquid staking, and EigenLayer restaking, drawing on the EtherFi documentation and smart contracts.

🌟 Features

Semantic Search: Vector-based search across EtherFi documentation and smart contracts
Interactive Study: Commands like /brief, /deepdive, /walk for different learning styles
Quiz Generation: AI-generated questions from actual protocol documentation
Interactive Learning: Question-and-answer sessions with progressive difficulty
Source Citations: Every answer includes links to source documentation
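
As a rough illustration of the vector-based search listed above, the sketch below ranks a few chunks against a query by cosine similarity, the metric named in the Architecture section. It is a minimal in-memory version, not the project's implementation: the real search runs in Supabase with pgvector, and the chunk IDs, the 3-dimensional vectors, and the `top_k` helper are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=5):
    """Return the k (chunk_id, score) pairs most similar to query_vec, best first.

    chunks is a list of (chunk_id, vector) pairs.
    """
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: real embeddings have thousands of dimensions.
chunks = [
    ("eeth-overview", [1.0, 0.0, 0.0]),
    ("weeth-wrapping", [0.8, 0.6, 0.0]),
    ("node-operators", [0.0, 0.0, 1.0]),
]
results = top_k([1.0, 0.1, 0.0], chunks, k=2)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

In a pgvector setup the same ranking is typically done inside the database, so only the best matches travel over the network.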

🏗️ Architecture

EtherFi GPT
├── Crawler Module (Milestone 2)
│   ├── GitBook documentation crawler
│   └── GitHub repository crawler
├── Embedding Pipeline (Milestone 3)
│   ├── Text chunking (1200 tokens, 200 overlap)
│   └── OpenAI embeddings (text-embedding-3-large)
├── Vector Database (Milestone 3)
│   ├── Supabase + pgvector
│   └── Cosine similarity search
├── API (Milestone 4)
│   ├── POST /api/search - Semantic search
│   ├── GET /api/browse - Fetch URL content
│   ├── GET /api/version - Crawl timestamps
│   └── POST /api/quiz - Generate quizzes
└── Custom GPT (Milestone 6)
    └── OpenAI GPT Builder with Actions
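
The chunking parameters in the tree above (1200 tokens with a 200-token overlap) describe a sliding window. The sketch below shows one way such a window could work on an already-tokenized list. How `utils/chunker.py` tokenizes and splits text is not shown in this README, so the `chunk_tokens` function and its behavior at the end of the input are assumptions.

```python
def chunk_tokens(tokens, chunk_size=1200, overlap=200):
    """Split a token list into windows of chunk_size that share `overlap` tokens.

    Assumption: the final chunk may be shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # each window starts this many tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Integers stand in for tokens so the window boundaries are easy to see.
tokens = list(range(3000))
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some tokens twice.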

📊 Current Status

| Milestone | Status | Description |
|---|---|---|
| 1. Project Scaffold | ✅ Complete | Directory structure, requirements, FastAPI server |
| 2. Crawler Module | ✅ Complete | 90 files crawled (49 docs + 42 GitHub files) |
| 3. Chunk + Embed | ✅ Code Complete | Chunking, embedding, vector DB ready |
| 4. Retrieval API | ✅ Complete | All 4 endpoints implemented |
| 5. OpenAPI Spec | ✅ Complete | Full OpenAPI 3.1 specification |
| 6. Custom GPT | ✅ Complete | Integration guide and instructions |

🚀 Quick Start

Prerequisites

Python 3.10+
OpenAI API key
Supabase account (free tier works)
GitHub token (optional, for re-crawling)

1. Setup Environment

# After cloning the repository, navigate into it
cd etherfi-gpt

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Edit .env with your API keys:
# - OPENAI_API_KEY
# - SUPABASE_URL
# - SUPABASE_KEY
# - GITHUB_TOKEN (optional)
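
A quick way to catch a missing key before starting the pipeline is to validate the environment up front. The sketch below checks the variables listed above; the `load_settings` helper is hypothetical and not part of the repository, and it reads values that are already in the process environment (loading the `.env` file itself is left to whatever mechanism the project uses).

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_KEY")
OPTIONAL_VARS = ("GITHUB_TOKEN",)  # only needed for re-crawling


def load_settings(env=None):
    """Return a dict of settings, raising if any required variable is unset or empty.

    env defaults to os.environ; passing a plain dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings = {name: env[name] for name in REQUIRED_VARS}
    for name in OPTIONAL_VARS:
        settings[name] = env.get(name) or None  # absent or empty becomes None
    return settings


# Example with placeholder values (not real credentials).
example = load_settings({
    "OPENAI_API_KEY": "sk-placeholder",
    "SUPABASE_URL": "https://example.supabase.co",
    "SUPABASE_KEY": "placeholder-key",
})
print(sorted(example))
```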

2. Setup Database

# Option A: paste the contents of storage/setup.sql into the
# Supabase SQL Editor and run it
# Option B: run the setup helper
python storage/setup_database.py

3. Run Embedding Pipeline

# Test with 2 files
python test_embedding_small.py

# Process all 90 files (~10-15 minutes, ~$0.07 cost)
python embeddings/embed_all.py

4. Start API Server

# Development
uvicorn api.main:app --reload

# Production
uvicorn api.main:app --host 0.0.0.0 --port 8000

# API will be available at:
# - http://localhost:8000
# - Docs: http://localhost:8000/docs
# - Health: http://localhost:8000/health

5. Test Endpoints

# Health check
curl http://localhost:8000/health

# Search
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "eETH vs weETH", "top_k": 5}'

# Version info
curl http://localhost:8000/api/version

# Browse URL
curl "http://localhost:8000/api/browse?url=https://etherfi.gitbook.io/etherfi/"

# Generate quiz
curl -X POST http://localhost:8000/api/quiz \
  -H "Content-Type: application/json" \
  -d '{"topic": "liquid staking", "num_questions": 5}'
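
The search call above can also be made from Python using only the standard library. This is a minimal sketch: the `build_search_request` and `send` helpers are hypothetical names, the base URL assumes the local development server from step 4, and the response is assumed to be JSON (its exact shape is not documented in this README).

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes the local dev server from step 4


def build_search_request(query, top_k=5, base_url=BASE_URL):
    """Build (but do not send) a POST /api/search request with a JSON body."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send a prepared request and parse the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_search_request("eETH vs weETH", top_k=5)
print(request.get_method(), request.full_url)
```

With the server running, `send(request)` performs the call. Building the request separately from sending it keeps the payload easy to inspect without a live server.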

📁 Project Structure

etherfi-gpt/
├── api/
│   ├── main.py              # FastAPI application
│   ├── routers.py           # API endpoints
│   └── openapi_actions.yaml # OpenAPI spec for GPT Actions
├── crawler/
│   ├── crawl_docs.py        # GitBook crawler
│   └── crawl_github.py      # GitHub crawler
├── embeddings/
│   └── embed_all.py         # Embedding pipeline
├── storage/
│   ├── vector_db.py         # Vector database operations
│   ├── setup.sql            # Database schema
│   └── setup_database.py    # Setup helper
├── utils/
│   └── chunker.py           # Text chunking
├── data/
│   └── raw/                 # Crawled data (90 files)
│       ├── docs/            # GitBook docs (49 files)
│       └── github/          # GitHub files (42 files)
├── requirements.txt         # Python dependencies
├── .env                     # Environment variables
├── Dockerfile               # Docker configuration
└── README.md                # This file

📚 Documentation

CUSTOM_GPT_SETUP.md - Complete Custom GPT integration guide
MILESTONE_1_VERIFICATION.md - Project scaffold verification
CRAWLER_STATUS.md - Crawler implementation details
MILESTONE_3_STATUS.md - Embedding pipeline status
MILESTONE_3_VERIFICATION.md - Implementation verification

🔧 Development

Re-crawl Data

# Re-crawl GitBook docs
python crawler/crawl_docs.py

# Re-crawl GitHub repos
python crawler/crawl_github.py

Run Tests

# Test chunking
python test_chunking.py

# Test embedding (small sample)
python test_embedding_small.py

# Test API health
python test_health.py

Update Embeddings

# After re-crawling, regenerate embeddings
python embeddings/embed_all.py

🚢 Deployment

See CUSTOM_GPT_SETUP.md for detailed deployment instructions.

Quick Deploy to Fly.io

fly launch
fly secrets set OPENAI_API_KEY="..." SUPABASE_URL="..." SUPABASE_KEY="..."
fly deploy

Quick Deploy to Render

Connect GitHub repo
Set environment variables
Deploy

💰 Cost Breakdown

Development/Testing

OpenAI embeddings: ~$0.07 (one-time for 90 files)
Supabase: Free tier sufficient
Hosting: Free tier available
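
For a sense of where the embedding figure comes from, cost scales linearly with the number of tokens embedded. The sketch below works the arithmetic backwards from the ~$0.07 figure. The price of $0.13 per million tokens for text-embedding-3-large is an assumption that may be out of date, and both helper functions are hypothetical.

```python
ASSUMED_PRICE_PER_MILLION_USD = 0.13  # assumption: check current OpenAI pricing


def embedding_cost_usd(total_tokens, price_per_million_usd=ASSUMED_PRICE_PER_MILLION_USD):
    """Cost of embedding total_tokens at a flat per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million_usd


def tokens_for_budget(budget_usd, price_per_million_usd=ASSUMED_PRICE_PER_MILLION_USD):
    """Number of whole tokens that fit in budget_usd at the given price."""
    return int(budget_usd / price_per_million_usd * 1_000_000)


implied_tokens = tokens_for_budget(0.07)
print(f"~{implied_tokens:,} tokens, about {implied_tokens // 90:,} per file over 90 files")
```

Under that assumed price, the one-time cost corresponds to a bit over half a million embedded tokens, including any tokens repeated by chunk overlap.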

Production (Monthly)

OpenAI API: $5-20 (depends on usage)
Hosting: $5-10 (Fly.io or Render)
Supabase: Free tier or $25/month

🎯 API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/api/search` | POST | Semantic search |
| `/api/browse` | GET | Fetch URL content |
| `/api/version` | GET | Crawl timestamps |
| `/api/quiz` | POST | Generate quiz questions |

🧪 Example Queries

Search for comparisons

POST /api/search
{
  "query": "eETH vs weETH differences",
  "top_k": 8
}

Search with filters

POST /api/search
{
  "query": "staking contract functions",
  "top_k": 5,
  "filters": {
    "repo": ["smart-contracts"]
  }
}
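
One plausible reading of the `filters` object above is "keep a chunk only if each filtered field has one of the listed values". The sketch below illustrates that interpretation in memory. How the server actually applies filters is not described in this README, so the `matches_filters` and `apply_filters` helpers, the chunk layout, and the sample metadata are all assumptions.

```python
def matches_filters(metadata, filters):
    """True if, for every filtered field, the chunk's value is in the allowed list."""
    for field, allowed_values in filters.items():
        if metadata.get(field) not in allowed_values:
            return False
    return True


def apply_filters(chunks, filters):
    """Return the chunks whose metadata satisfies all filters (no filters keeps everything)."""
    if not filters:
        return list(chunks)
    return [chunk for chunk in chunks if matches_filters(chunk["metadata"], filters)]


# Hypothetical chunks; the real metadata fields may differ.
chunks = [
    {"id": "c1", "metadata": {"repo": "smart-contracts"}},
    {"id": "c2", "metadata": {"repo": "docs"}},
    {"id": "c3", "metadata": {}},
]
kept = apply_filters(chunks, {"repo": ["smart-contracts"]})
print([chunk["id"] for chunk in kept])
```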

Generate quiz

POST /api/quiz
{
  "topic": "EigenLayer restaking",
  "num_questions": 5
}

🤝 Contributing

This is a private project for learning about EtherFi interactively. If you'd like to contribute:

Ensure all tests pass
Follow existing code style
Update documentation
Test with the Custom GPT

📄 License

🙏 Acknowledgments

ether.fi for the amazing protocol
EigenLayer for restaking infrastructure
OpenAI for GPT-4 and embeddings API
Supabase for vector database hosting

Built with ❤️ for EtherFi

For setup questions, see CUSTOM_GPT_SETUP.md
