AI-powered email triage that clusters your inbox and suggests actions.
InboxSherpa syncs your Gmail, groups similar messages using machine learning, and proposes bulk actions (archive promos, snooze deadlines, apply labels) — giving you a daily digest instead of inbox chaos.
# Clone and run with Docker
git clone https://github.com/noah11dahan-cpu/InboxSherpa.git
cd InboxSherpa
cp .env.example .env
docker compose up --build

Then open:
- API: http://localhost:8001/health
- Frontend: http://localhost:3000
Click "Try Demo" on the home page to see it in action with sample data.
┌─────────────────────────────────────────────────────────────────────────┐
│                                FRONTEND                                 │
│                           Next.js (React 19)                            │
│                             localhost:3000                              │
│                                                                         │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐           │
│  │   Home   │    │  Digest  │    │ Cluster  │    │ Metrics  │           │
│  │ (OAuth)  │───▶│  (List)  │───▶│ (Detail) │    │ (Stats)  │           │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘           │
└────────────────────────────┬────────────────────────────────────────────┘
                             │ HTTP/JSON
                             ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                                 BACKEND                                 │
│                          FastAPI (Python 3.11)                          │
│                             localhost:8001                              │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                            API Routes                            │   │
│  │  /auth/google/*  /digest/today  /clusters/{id}  /actions/apply   │   │
│  │  /messages/import  /metrics  /demo                               │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│                             │                                           │
│  ┌──────────────────────────┴───────────────────────────────────────┐   │
│  │                             Services                             │   │
│  │                                                                  │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐       │   │
│  │  │ gmail_sync  │  │ clustering  │  │ suggested_actions   │       │   │
│  │  │ (OAuth +    │  │ (TF-IDF +   │  │ (Rule-based         │       │   │
│  │  │  Gmail API) │  │  KMeans)    │  │  proposals)         │       │   │
│  │  └─────────────┘  └─────────────┘  └─────────────────────┘       │   │
│  │                                                                  │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐       │   │
│  │  │ summarizer  │  │ action_rules│  │ importer            │       │   │
│  │  │ (Heuristic) │  │ (YAML)      │  │ (JSON→DB)           │       │   │
│  │  └─────────────┘  └─────────────┘  └─────────────────────┘       │   │
│  └──────────────────────────────────────────────────────────────────┘   │
└────────────────────────────┬────────────────────────────────────────────┘
                             │ asyncpg
                             ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                                DATABASE                                 │
│                         PostgreSQL 16 (Docker)                          │
│                             localhost:5432                              │
│                                                                         │
│  ┌────────┐  ┌─────────┐  ┌─────────┐  ┌─────────────────┐              │
│  │ users  │  │messages │  │clusters │  │suggested_actions│              │
│  └────────┘  └─────────┘  └─────────┘  └─────────────────┘              │
│  ┌────────┐  ┌─────────┐  ┌──────────────────────────────┐              │
│  │threads │  │gmail_   │  │ pipeline_runs (metrics)      │              │
│  │        │  │tokens   │  │                              │              │
│  └────────┘  └─────────┘  └──────────────────────────────┘              │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│                                 WORKER                                  │
│                       Background Gmail Sync Loop                        │
│                                                                         │
│   Runs continuously, syncing today's messages for all connected users   │
│   Configurable: SYNC_SLEEP_SECONDS, SYNC_TZ, SYNC_MAX_MESSAGES          │
└─────────────────────────────────────────────────────────────────────────┘
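
As a rough illustration of the worker box above, the loop can be sketched as "read settings, sync, sleep, repeat". This is a minimal sketch, not the project's worker code: `SyncSettings`, `load_sync_settings`, `run_worker`, and the `sync_once` callback are hypothetical names, and the fallback values simply mirror the sample `.env` shown later in this README.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class SyncSettings:
    sleep_seconds: int
    tz: str
    max_messages: int


def load_sync_settings(env: Mapping[str, str] = os.environ) -> SyncSettings:
    """Read worker settings; fallbacks mirror the sample .env values (assumed)."""
    return SyncSettings(
        sleep_seconds=int(env.get("SYNC_SLEEP_SECONDS", "60")),
        tz=env.get("SYNC_TZ", "America/Montreal"),
        max_messages=int(env.get("SYNC_MAX_MESSAGES", "500")),
    )


def run_worker(
    sync_once: Callable[[SyncSettings], object],
    settings: SyncSettings,
    sleep: Callable[[float], object] = time.sleep,
    max_cycles: Optional[int] = None,
) -> int:
    """Call sync_once(settings) repeatedly, sleeping between runs.

    max_cycles=None loops forever; a number makes the loop testable.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        sync_once(settings)  # hypothetical: sync today's messages for all users
        cycles += 1
        sleep(settings.sleep_seconds)
    return cycles
```

Injecting `sleep` and `max_cycles` keeps the loop easy to exercise in tests without waiting on a real timer.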
InboxSherpa uses a two-stage clustering approach:
- Combines message subject + body + snippet into a single text blob
- Applies TF-IDF vectorization (Term Frequency-Inverse Document Frequency)
- Uses English stop words, bigrams, and max 6,000 features
- Small datasets (< 8 messages): Keyword-based bucketing into predefined categories (Promos, School, Bills, Work, Social)
- Large datasets (≥ 8 messages): KMeans clustering with dynamic K selection
  - K = √n, clamped to [2, 12]
- Cluster titles generated from top TF-IDF terms
- Summarizer generates bullets from top senders, keywords, and urgency signals
- Action Rules Engine proposes actions based on YAML rules (configurable)
- Everything is stored per (user, digest_date) for daily digests
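
The strategy selection described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the keyword lists, the `"Other"` fallback bucket, the rounding of √n, and every function name are hypothetical. The `TfidfVectorizer` arguments are real scikit-learn parameters chosen to match the bullets (the README says "bigrams"; unigrams plus bigrams is assumed here), and the import is kept lazy so the rest of the sketch runs without scikit-learn installed.

```python
import math

SMALL_DATASET_THRESHOLD = 8  # fewer messages than this: keyword bucketing

# Hypothetical keyword lists for the predefined categories.
CATEGORY_KEYWORDS = {
    "Promos": ("sale", "discount", "offer"),
    "School": ("assignment", "lecture", "exam"),
    "Bills": ("invoice", "payment", "bill"),
    "Work": ("meeting", "deadline", "report"),
    "Social": ("party", "invite", "friend"),
}


def message_text(msg: dict) -> str:
    """Combine subject, body, and snippet into a single text blob."""
    parts = (msg.get(key) for key in ("subject", "body", "snippet"))
    return " ".join(part for part in parts if part)


def choose_k(n: int) -> int:
    """K = sqrt(n), clamped to [2, 12]. Rounding to nearest is an assumption."""
    return max(2, min(12, round(math.sqrt(n))))


def bucket_by_keyword(text: str) -> str:
    """Return the first category with a matching keyword, else 'Other' (assumed)."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return category
    return "Other"


def build_vectorizer():
    """TF-IDF with English stop words, uni+bigrams, at most 6,000 features."""
    from sklearn.feature_extraction.text import TfidfVectorizer  # needs scikit-learn

    return TfidfVectorizer(stop_words="english", ngram_range=(1, 2), max_features=6000)


def plan_clustering(messages: list[dict]) -> dict:
    """Pick the strategy for a batch of messages based on its size."""
    n = len(messages)
    if n < SMALL_DATASET_THRESHOLD:
        buckets = [bucket_by_keyword(message_text(m)) for m in messages]
        return {"strategy": "keywords", "buckets": buckets}
    return {"strategy": "kmeans", "k": choose_k(n)}
```

For the 200+ message demo inbox, this rule would give K = 12, because √200 ≈ 14 exceeds the upper clamp.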
Messages → TF-IDF → KMeans → Clusters → Summarizer → Suggested Actions
                                │
                                ├─→ "Promos (12 msgs)" → "Archive all"
                                └─→ "Bills (3 msgs)"   → "Snooze 24h"
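
The last hop of the pipeline above, from clusters to suggested actions, might look like the sketch below. The rule schema is an assumption: the real rules live in a YAML file whose format is not shown in this README, so `RULES`, `Cluster`, `SuggestedAction`, and `propose_actions` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Hypothetical rules, shaped like what a YAML file could deserialize to.
RULES = [
    {"match_title": "promos", "min_messages": 5, "action": "archive", "label": "Archive all"},
    {"match_title": "bills", "min_messages": 1, "action": "snooze", "label": "Snooze 24h"},
]


@dataclass(frozen=True)
class Cluster:
    title: str
    message_count: int


@dataclass(frozen=True)
class SuggestedAction:
    cluster_title: str
    action: str
    label: str
    status: str = "proposed"  # proposals are never applied without user approval


def propose_actions(clusters, rules=RULES):
    """Return at most one proposal per cluster: the first rule that matches."""
    proposals = []
    for cluster in clusters:
        for rule in rules:
            title_matches = rule["match_title"] in cluster.title.lower()
            if title_matches and cluster.message_count >= rule["min_messages"]:
                proposals.append(
                    SuggestedAction(cluster.title, rule["action"], rule["label"])
                )
                break  # first matching rule wins (an assumed policy)
    return proposals
```

Every proposal starts in a `proposed` state, which matches the "no auto-archive" principle: nothing reaches Gmail until the user accepts it.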
1. User connects Gmail via OAuth 2.0
   └─→ Tokens encrypted at rest (Fernet)
2. Worker syncs messages (background loop)
   └─→ Gmail API → PostgreSQL
3. User requests digest for a date
   └─→ Auto-sync if needed → Cluster → Summarize → Propose actions
4. User accepts/rejects actions
   └─→ Gmail labels modified → Local status updated
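
Step 4 can be illustrated with the request bodies involved. In the Gmail API, archiving a message means removing its `INBOX` label, and `users.messages.batchModify` accepts up to 1,000 message IDs per call. Whether InboxSherpa uses `batchModify` or per-message `modify` is not stated here, so treat this as a sketch; the function names and the `accepted`/`rejected` status strings are hypothetical.

```python
def chunked(ids, size=1000):
    """Split IDs into lists of at most `size` (the batchModify per-call limit)."""
    ids = list(ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def archive_request_body(message_ids):
    """Body for users.messages.batchModify: archiving removes the INBOX label."""
    return {"ids": list(message_ids), "removeLabelIds": ["INBOX"]}


def label_request_body(message_ids, label_id):
    """Body for users.messages.batchModify that adds one label."""
    return {"ids": list(message_ids), "addLabelIds": [label_id]}


def apply_decision(action, message_ids, accepted, label_id=None):
    """Return (gmail_request_bodies, new_local_status) for a user decision.

    A rejection touches nothing in Gmail; only the local status changes.
    """
    if not accepted:
        return [], "rejected"
    if action == "archive":
        bodies = [archive_request_body(chunk) for chunk in chunked(message_ids)]
    elif action == "label" and label_id is not None:
        bodies = [label_request_body(chunk, label_id) for chunk in chunked(message_ids)]
    else:
        raise ValueError(f"unsupported action: {action!r}")
    return bodies, "accepted"
```

Building the bodies separately from sending them keeps the Gmail-facing step small and makes the accept/reject logic testable offline.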
InboxSherpa is designed with privacy in mind:
| Principle | Implementation |
|---|---|
| No auto-archive | Actions are proposed, never executed without explicit user approval |
| Minimal scopes | Gmail read-only + labels modify (no send, no delete) |
| Encrypted tokens | OAuth tokens encrypted at rest with Fernet (AES-128-CBC with HMAC-SHA256) |
| Delete your data | Full data deletion available (user + messages + tokens) |
| Local-first option | Can run entirely on localhost with Docker |
| No third-party AI | Clustering uses local scikit-learn, no data sent to external LLMs |
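
For the "Encrypted tokens" row, a Fernet key is 32 random bytes in URL-safe base64, which is the shape `TOKEN_ENCRYPTION_KEY` needs. The standard-library sketch below generates and sanity-checks such a key; the function names are hypothetical, and the encryption itself would still be done with the `cryptography` package's `Fernet` class, which is not shown here.

```python
import base64
import os


def generate_token_encryption_key() -> str:
    """Return 32 random bytes, URL-safe base64 encoded (the shape of a Fernet key)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(value: str) -> bool:
    """Check that the value decodes to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(value.encode("ascii"))) == 32
    except (ValueError, UnicodeEncodeError):
        # binascii.Error (bad padding or length) is a ValueError subclass
        return False


if __name__ == "__main__":
    # Paste the printed value into .env as TOKEN_ENCRYPTION_KEY
    print(generate_token_encryption_key())
```

Checking the key at startup gives a clear configuration error early instead of a failure on the first token write.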
gmail.readonly - Read messages and labels
gmail.labels - Create and modify labels
gmail.modify - Modify message labels (archive, etc.)
userinfo.email - Get user's email address
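
To show how these scopes end up in the OAuth request, here is a standard-library sketch that builds a Google authorization URL with a PKCE (S256) challenge. The endpoint and scope URL prefix are Google's documented values; the function names, the `state` handling, and the choice of `access_type=offline` are assumptions rather than a description of the project's code.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
SCOPE_PREFIX = "https://www.googleapis.com/auth/"
SCOPES = ["gmail.readonly", "gmail.labels", "gmail.modify", "userinfo.email"]


def make_pkce_pair():
    """Return (code_verifier, code_challenge) using the S256 method (RFC 7636)."""
    verifier = secrets.token_urlsafe(64)  # 86 chars, inside the 43-128 limit
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_auth_url(client_id, redirect_uri, state, code_challenge):
    """Build the URL the browser is sent to at the start of the OAuth flow."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": " ".join(SCOPE_PREFIX + scope for scope in SCOPES),
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
        "access_type": "offline",  # assumed: request a refresh token
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"
```

The verifier stays on the server (or in the session) and is sent only during the code exchange; the challenge is the only PKCE value that appears in the URL.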
Click "Try Demo" on the home page to:
- Create a demo user with sample inbox data (200+ messages)
- Auto-redirect to the digest page
- See clusters like "School", "Promos", "Calendar", "Bills"
- Try accepting/rejecting suggested actions
No Gmail connection required for demo mode.
# Backend
python -m venv .venv
.venv\Scripts\Activate.ps1 # Windows
source .venv/bin/activate # macOS/Linux
pip install -r requirements.txt
# Start Postgres (via Docker or local install)
docker compose up db -d
# Run migrations
alembic upgrade head
# Start API
uvicorn app.main:app --reload --port 8000
# Frontend (separate terminal)
cd web
npm install
npm run dev

# Requires Postgres running
pytest -q
# With coverage
pytest --cov=app --cov-report=term-missing

python -m evals.run # Run evaluation
python -m evals.run --verbose # Per-test details

ruff check . --fix
cd web && npm run lint

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /demo | POST | Start demo mode (imports sample data) |
| /digest/today | GET | Get digest with clusters for a date |
| /clusters/{id} | GET | Cluster detail with messages |
| /actions/apply | POST | Accept/reject suggested action |
| /auth/google/start | GET | Begin Gmail OAuth flow |
| /auth/google/exchange | POST | Exchange OAuth code for tokens |
| /messages/import | POST | Import JSON messages |
| /metrics | GET | User metrics and stats |
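
A small standard-library client for the endpoints above could be sketched as follows. The base URL comes from the Docker quick start; request bodies, query parameters, authentication, and response shapes are not documented in this README, so the helper only builds requests and assumes JSON responses. `build_request` and `call` are hypothetical names.

```python
import json
from urllib.request import Request, urlopen

API_BASE = "http://localhost:8001"  # API address from the Docker quick start


def build_request(path, method="GET", payload=None):
    """Build a request for an API path, JSON-encoding the payload if given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(f"{API_BASE}{path}", data=data, method=method, headers=headers)


def call(path, method="GET", payload=None, timeout=10):
    """Send the request and decode the response, assuming a JSON body."""
    with urlopen(build_request(path, method, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the stack to be running (docker compose up --build)
    print(call("/health"))
```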
Copy .env.example to .env and configure:
# Required
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/inboxsherpa
# Gmail OAuth (get from Google Cloud Console)
GOOGLE_CLIENT_ID=your-client-id
GOOGLE_CLIENT_SECRET=your-client-secret
GOOGLE_REDIRECT_URI=http://localhost:3000/api/auth/google/callback
# Security
TOKEN_ENCRYPTION_KEY=base64-encoded-32-byte-key
# Optional
CORS_ALLOW_ORIGINS=http://localhost:3000
ACTION_RULES_PATH=data/action_rules.yaml
SYNC_TZ=America/Montreal
SYNC_MAX_MESSAGES=500
SYNC_SLEEP_SECONDS=60

- Gmail OAuth integration
- TF-IDF + KMeans clustering
- Rule-based action suggestions
- Accept/reject actions with Gmail sync
- Background worker for continuous sync
- Metrics and observability
- Load testing (5k messages < 30s)
- AI eval harness
- LLM-powered summarization (optional, privacy-preserving)
- Smart snooze with calendar awareness
- Multi-account support
- Mobile app (React Native)
- Browser extension for quick triage
- Custom clustering rules
- Export/backup functionality
| Layer | Technology |
|---|---|
| Backend | Python 3.11, FastAPI, SQLAlchemy 2.0, asyncpg |
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS |
| Database | PostgreSQL 16 |
| ML | scikit-learn (TF-IDF, KMeans) |
| Auth | Google OAuth 2.0 with PKCE |
| Infra | Docker Compose, Alembic migrations |
MIT
Contributions welcome! Please open an issue first to discuss what you'd like to change.
# Fork, clone, branch
git checkout -b feature/your-feature
# Make changes, test
pytest -q
ruff check .
# Commit and PR
git commit -m "Add your feature"
git push origin feature/your-feature