Discover validated app opportunities from real user needs
An intelligent opportunity detection platform that automatically collects, clusters, and analyzes "I wish there was an app..." posts from across the web, giving you evidence-backed insights into what people actually want built.
MVP complete: all core phases shipped and security-hardened.
| Area | Status |
|---|---|
| Data ingestion pipeline | ✅ Complete |
| NLP extraction + sentiment | ✅ Complete |
| HDBSCAN clustering | ✅ Complete |
| FastAPI backend (21+ endpoints) | ✅ Complete |
| React UI (5 pages, 30+ components) | ✅ Complete |
| JWT authentication + API key auth | ✅ Complete |
| User-owned bookmarks | ✅ Complete |
| Saved searches + alert scheduling | ✅ Complete |
| Redis caching | ✅ Complete |
| Docker Compose orchestration | ✅ Complete |
| CI pipeline (GitHub Actions) | ✅ Complete |
| WCAG 2.2 accessibility | ✅ Complete |
| Security hardened (rate limiting, timing-safe auth) | ✅ Complete |
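"Timing-safe auth" refers to constant-time credential comparison. A minimal sketch of the idea using Python's standard library; the function and key names here are illustrative, not the project's actual code:

```python
import hmac

# Hypothetical stored key; in the real app this would come from config or the DB.
EXPECTED_API_KEY = "s3cret-key"

def verify_api_key(candidate: str) -> bool:
    """Compare keys in constant time to avoid timing side channels.

    A plain `==` can short-circuit on the first differing byte, letting an
    attacker measure response times to recover the key byte by byte;
    hmac.compare_digest always scans the full length.
    """
    return hmac.compare_digest(candidate.encode(), EXPECTED_API_KEY.encode())
```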
- Smart Ingestion: Fetches posts from RSS feeds with URL-hash and content-fingerprint deduplication
- AI-Powered Clustering: Groups similar ideas using HDBSCAN + TF-IDF vectorization
- Evidence-Based: Every cluster backed by real user quotes with source links
- Quality Scoring: Automatic assessment of idea specificity and actionability (0-1 scale)
- Dashboard: Key metrics + top clusters at a glance
- Trend Analysis: Time-series charts showing idea growth
- Domain Breakdown: Categorized by productivity, health, finance, etc.
- Sentiment Analysis: VADER-based positive/negative distribution
- Authentication: JWT login with email normalization and rate-limited endpoints
- Bookmarks: Authenticated, user-owned bookmarks persisted to the database
- Saved Searches: Save filter combinations with optional daily/weekly alert digests
- Command Palette: `Cmd+K` universal search across pages, clusters, and ideas
- Context Menus: Right-click on cards for copy, share, and export actions
- Advanced Filtering: Sort by size, quality, sentiment, or trend
- Export: CSV and JSON export from any view
- Background Workers: Celery task queues for ingestion, processing, clustering, and alerts
- Async API: FastAPI + asyncpg (8× faster than sync psycopg2)
- Migrations: Alembic-managed schema versions, never raw DDL
- Monitoring: Flower (Celery), Prometheus metrics endpoint
- UV 0.5+ (install with `curl -LsSf https://astral.sh/uv/install.sh | sh`)
- Docker Desktop 4.0+ (with Compose V2)
- Make
- 4 GB RAM, 2 GB free disk
```bash
git clone https://github.com/yourusername/app-idea-miner.git
cd app-idea-miner
cp .env.example .env
make dev
```

The following services start automatically:
| Service | URL |
|---|---|
| Web UI | http://localhost:3000 |
| API | http://localhost:8000 |
| API Docs (Swagger) | http://localhost:8000/docs |
| Flower (Celery monitor) | http://localhost:5555 |
| PostgreSQL | localhost:5432 |
| Redis | localhost:6379 |
```bash
# Seed sample data (20 curated app ideas)
make seed

# Wait ~30 seconds for the worker to process and cluster

# Open the UI
open http://localhost:3000
```

You should see 10-15 clusters with evidence links and quality scores.
```
┌──────────────────────────────────────────┐
│               Data Sources               │
│  RSS Feeds · Sample Data · (Future APIs) │
└────────────────────┬─────────────────────┘
                     │
                     ▼
            ┌──────────────────┐
            │  Celery Worker   │
            │   Ingestion ·    │
            │   Processing ·   │
            │   Clustering ·   │
            │   Alert Digests  │
            └────────┬─────────┘
                     │
            ┌────────▼─────────┐
            │  PostgreSQL 16   │
            │  Redis 7         │
            └────────┬─────────┘
                     │
            ┌────────▼─────────┐     ┌──────────────────┐
            │  FastAPI         │◄────│  React + Vite    │
            │  REST · Auth ·   │     │  TypeScript UI   │
            │  API Key Gate    │     └──────────────────┘
            └──────────────────┘
```
| Layer | Technology |
|---|---|
| API | Python 3.12, FastAPI 0.115, SQLAlchemy 2.0 (async), asyncpg |
| Auth | JWT (python-jose), passlib/bcrypt, per-route rate limiting |
| Workers | Celery 5.4, Redis 7 (broker + result backend) |
| ML | scikit-learn (TF-IDF), HDBSCAN, VADER sentiment, NLTK |
| Database | PostgreSQL 16 (JSONB, full-text search), Alembic migrations |
| Frontend | React 18, TypeScript 5, Vite 6, Tailwind CSS 3, Framer Motion |
| State | TanStack Query 5, Zustand 4 |
| Testing | pytest + pytest-asyncio, Vitest, Playwright |
| Tooling | UV (packages), Ruff (lint/format), mypy (types) |
| Infra | Docker Compose, GitHub Actions CI |
```
app-idea-miner/
├── apps/
│   ├── api/                   # FastAPI backend
│   │   └── app/
│   │       ├── core/          # Auth utilities
│   │       ├── routers/       # Endpoints (clusters, ideas, bookmarks, auth, …)
│   │       ├── schemas/       # Pydantic request/response schemas
│   │       └── services/      # Business logic layer
│   ├── worker/                # Celery background tasks
│   │   └── tasks/
│   │       ├── ingestion.py
│   │       ├── processing.py
│   │       ├── clustering.py
│   │       └── saved_search_alerts.py
│   └── web/                   # React frontend
│       └── src/
│           ├── components/    # Reusable UI components
│           ├── contexts/      # AuthContext
│           ├── hooks/         # useFavorites, useKeyboard, …
│           ├── pages/         # Dashboard, ClusterExplorer, Ideas, Saved, Settings, Login
│           ├── services/      # Typed API client
│           └── types/         # Shared TypeScript interfaces
├── packages/
│   └── core/                  # Shared Python (models, clustering, NLP, dedupe)
├── migrations/                # Alembic versions
├── tests/                     # pytest integration tests
├── data/
│   └── sample_posts.json      # Seed data (20 curated ideas)
├── docs/                      # Architecture, API spec, schema, deployment
├── infra/                     # Dockerfiles, postgres init
├── Makefile                   # Dev commands
└── docker-compose.yml
```
Copy `.env.example` to `.env` and adjust as needed:
```env
# Database
DATABASE_URL=postgresql+asyncpg://postgres:postgres@postgresql:5432/appideas
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=appideas

# Redis
REDIS_URL=redis://redis:6379/0

# Auth
SECRET_KEY=your-secret-key-here

# API
API_HOST=0.0.0.0
API_PORT=8000
CORS_ORIGINS=http://localhost:3000

# Worker
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/1

# Frontend
VITE_API_URL=http://localhost:8000

# Data Sources
RSS_FEEDS=https://hnrss.org/newest
FETCH_INTERVAL_HOURS=6

# Clustering
MIN_CLUSTER_SIZE=3
MAX_FEATURES=500
```

All Docker inter-service URLs use service names (`postgresql`, `redis`), not `localhost`.
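The variables above are plain `KEY=VALUE` pairs, so a settings loader only needs to split on the first `=`. A hedged stdlib sketch of such a parser (the project itself most likely relies on Docker Compose's built-in `.env` handling rather than code like this):

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore comments and empty lines
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip()
    return env
```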
```bash
make dev           # Start all services
make down          # Stop all services
make logs          # Tail all logs
make logs-api      # Tail API logs only
make logs-worker   # Tail worker logs only
```

```bash
make migrate                       # Run pending migrations
make migration name=add_column_x   # Generate new migration
make db-reset                      # Drop and recreate (loses data)
make db-shell                      # psql shell
```

```bash
make seed          # Load sample_posts.json
make ingest        # Trigger ingestion task manually
make cluster       # Run clustering task manually
make clean-data    # Truncate all data tables
```

```bash
make test                                    # All tests
make test-coverage                           # With HTML coverage report
make test-file path=tests/test_api_auth.py   # Single file
cd apps/web && npm test                      # Frontend unit tests
cd apps/web && npm run test:e2e              # Playwright E2E tests
```

Tests marked `@pytest.mark.requires_db` are skipped locally unless `DATABASE_URL` points to a live Postgres instance. They always run in CI (GitHub Actions provides a Postgres service).

```bash
make lint                      # Ruff linter
make format                    # Ruff auto-format
cd apps/web && npm run lint    # ESLint (0 warnings policy)
cd apps/web && npm run build   # TypeScript + Vite build check
```

Full reference: docs/API_SPEC.md
Authentication options:
- API Key: `X-API-Key: <key>` header (server-to-server)
- JWT Bearer: `Authorization: Bearer <token>` (user sessions)
Key endpoints:
```
GET    /api/v1/clusters              List clusters (sort, filter, paginate)
GET    /api/v1/clusters/{id}         Cluster detail with evidence
GET    /api/v1/ideas                 List ideas (search, filter)
GET    /api/v1/analytics/summary     Aggregated platform metrics
POST   /api/v1/auth/register         Create account
POST   /api/v1/auth/login            Exchange credentials for JWT
GET    /api/v1/auth/me               Current user info
GET    /api/v1/bookmarks             User's saved bookmarks
POST   /api/v1/bookmarks             Bookmark a cluster or idea
DELETE /api/v1/bookmarks/{id}        Remove bookmark
GET    /api/v1/saved-searches        User's saved searches
POST   /api/v1/saved-searches        Create saved search with alert options
DELETE /api/v1/saved-searches/{id}   Delete saved search
POST   /api/v1/jobs/ingest           Trigger ingestion
POST   /api/v1/jobs/cluster          Trigger clustering
GET    /health                       Health check
GET    /metrics                      Prometheus metrics
```
Posts are fetched from RSS feeds on a configurable schedule. Each post is deduplicated using a SHA-256 URL hash and fuzzy title matching before being stored.
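A rough sketch of that two-stage dedupe (exact URL hash, then fuzzy title matching); the function names and the 0.9 similarity threshold are illustrative assumptions, not the project's actual values:

```python
import hashlib
from difflib import SequenceMatcher

def url_hash(url: str) -> str:
    """Exact-match key: SHA-256 of the normalized URL."""
    return hashlib.sha256(url.strip().lower().encode()).hexdigest()

def is_duplicate(title: str, seen_titles: list[str],
                 threshold: float = 0.9) -> bool:
    """Fuzzy title check: near-identical titles count as duplicates."""
    t = title.strip().lower()
    return any(
        SequenceMatcher(None, t, s.strip().lower()).ratio() >= threshold
        for s in seen_titles
    )
```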
Each post is run through:
- Idea extraction: pattern matching for "I wish there was an appβ¦" phrases
- Sentiment analysis: VADER scores (compound, positive, negative, neutral)
- Domain tagging: productivity, health, finance, etc.
- Quality scoring: specificity × actionability → 0-1 float
Ideas are grouped using:
- TF-IDF vectorization (500 features, 1-3 grams, L2-normalized)
- HDBSCAN (min_cluster_size=2, euclidean distance), which auto-detects the cluster count and handles noise
- Keyword extraction: top-10 TF-IDF terms per cluster
- Quality scoring: silhouette score + average sentiment + source diversity
See docs/CLUSTERING.md for the full algorithm breakdown.
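To make the vectorization step concrete, here is a toy pure-Python TF-IDF with L2 normalization. The real pipeline uses scikit-learn's `TfidfVectorizer` (500 features, 1-3 grams) with HDBSCAN on top; both are omitted here so the sketch stays dependency-free:

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """L2-normalized TF-IDF over unigrams (toy version of TfidfVectorizer)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency per term, then smoothed IDF as in scikit-learn.
    df = Counter(term for doc in tokenized for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})  # L2-normalize
    return vectors
```

Rare terms get high IDF weight, so two posts sharing a distinctive word (e.g. "plants") end up close in vector space, which is what lets HDBSCAN group them.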
```bash
# Backend
make test              # All tests (DB tests skip without live Postgres)
make test-coverage     # HTML report at htmlcov/index.html

# Frontend
cd apps/web
npm test               # Vitest unit tests
npm run test:coverage  # With coverage
npm run test:e2e       # Playwright smoke + flow tests
```

CI runs both suites on every push. The backend job provides a Postgres 16 service container so all tests (including `requires_db`) execute in CI.
See docs/DEPLOYMENT.md for full instructions.
The project ships with:
- `railway.toml` for Railway deployment
- `vercel.json` for Vercel (frontend)
- an `api/` directory with a serverless-ready entrypoint
| File | Contents |
|---|---|
| `docs/ARCHITECTURE.md` | System design decisions |
| `docs/API_SPEC.md` | Full API reference (21+ endpoints) |
| `docs/SCHEMA.md` | Database schema and relationships |
| `docs/CLUSTERING.md` | HDBSCAN algorithm deep dive |
| `docs/DEPLOYMENT.md` | Production deployment guide |
| `docs/TESTING.md` | Testing strategy and patterns |
| `docs/MONITORING.md` | Metrics, alerting, and observability |
- Fork the repo and create a feature branch
- Run `make dev` to start the stack
- Write tests for new behavior
- Ensure `make lint` and `make test` pass
- Open a pull request