Full-stack analytics platform: scrapes, classifies and visualizes social + growth data about Claude AI.
📖 API documentation → API_DOCS.md
## Quick Start

```bash
# 1. Install Python dependencies
pip install -r requirements.txt

# 2. Configure environment
cp .env.example .env
# Edit .env — add GEMINI_API_KEY and YOUTUBE_API_KEY

# 3. Start backend + frontend (Windows)
run_app.bat

# Or manually:
uvicorn app:app --reload --port 8000        # backend
cd frontend && npm install && npm run dev   # frontend → http://localhost:5173
```

Swagger UI → http://localhost:8000/docs
## Project Structure

```
HackNU/
├── app.py                      # FastAPI backend (entry point)
├── models.py                   # Pydantic schemas — source of truth for API contract
├── pipeline.py                 # Scrape → Merge → Classify pipeline
│
├── scrapers/
│   ├── reddit_client.py        # Reddit via MCP (requires Node.js)
│   ├── hn_client.py            # Hacker News via Algolia free API
│   ├── bluesky_client.py       # Bluesky AT Protocol API
│   ├── youtube_scraper.py      # YouTube Data API v3
│   └── producthunt_client.py   # ProductHunt GraphQL API
│
├── analysis/
│   ├── growth_metrics.py       # NPM / PyPI / GitHub / Wikipedia / Trends time-series
│   ├── insights.py             # Viral analysis, sentiment, competitor positioning, seeding detection
│   └── official_sources.py     # Anthropic blog / release timeline via Jina reader
│
├── frontend/                   # React + Vite + Chart.js dashboard
│   └── src/
│       ├── components/         # Charts, FeedTable, RangeModal, GrowthTrends
│       └── services/           # API client
│
└── data/                       # Generated — gitignored
    ├── dataset.json            # Unified post dataset (4800+ posts, all platforms)
    ├── growth_data.json        # NPM, PyPI, GitHub, Wikipedia, Trends time-series
    ├── insights.json           # Pre-computed analytics output
    └── ...                     # Raw per-source CSVs
```
## API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | /api/posts | All posts, paginated + filtered by platform/sentiment/date |
| GET | /api/posts/stats | Aggregated stats by platform, sentiment, content type |
| GET | /api/growth-metrics | NPM / PyPI / GitHub / Wikipedia / Trends data |
| GET | /api/insights | Viral analysis, competitor positioning, seeding detection |
| GET | /api/correlation | Weekly unified signal table (npm + wiki + social + trends) |
| GET | /api/signals | Chronological timeline of all growth signals |
| GET | /api/features | Top Claude features mentioned with weekly breakdown |
| POST | /api/pipeline/run | Trigger background scrape + classify |
| GET | /api/pipeline/status | Pipeline running state + post count |
| GET | /api/pipeline/progress | Live progress (stage, source, posts collected) |
| GET | /health | Health check |
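As a rough sketch, the filtered posts endpoint can be called by building a query string. The parameter names below (`platform`, `sentiment`, `since`, `page`, `page_size`) are assumptions for illustration; the real query schema is defined by models.py and visible in Swagger UI at /docs.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8000"

def posts_url(platform=None, sentiment=None, since=None, page=1, page_size=50):
    """Build a filtered /api/posts URL.

    Parameter names are illustrative assumptions; check /docs for the
    actual query schema exposed by the FastAPI backend.
    """
    params = {"page": page, "page_size": page_size}
    if platform:
        params["platform"] = platform
    if sentiment:
        params["sentiment"] = sentiment
    if since:
        params["since"] = since
    return f"{BASE}/api/posts?{urlencode(params)}"

print(posts_url(platform="reddit", sentiment="positive"))
# http://localhost:8000/api/posts?page=1&page_size=50&platform=reddit&sentiment=positive
```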
## Data Sources

| Platform | Method | Auth required |
|---|---|---|
| Reddit | MCP client (Node.js subprocess) | No |
| Hacker News | Algolia Search API (free) | No |
| Bluesky | AT Protocol API | BSKY_HANDLE + BSKY_APP_PASSWORD in .env |
| YouTube | YouTube Data API v3 | YOUTUBE_API_KEY in .env |
| ProductHunt | GraphQL API | PRODUCTHUNT_API_KEY + PRODUCTHUNT_API_SECRET in .env |
| NPM / PyPI | Public registry APIs | No |
| GitHub | REST API | No (rate-limited without token) |
| Wikipedia | Wikimedia Metrics API | No |
| Google Trends | pytrends | No |
| App Store / Play Store | iTunes lookup + google-play-scraper | No |
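Of the sources above, Hacker News is the simplest to try by hand, since the public Algolia search API needs no auth. A minimal sketch, independent of the repo's hn_client.py (whose interface may differ):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def hn_search_url(query, tags="story", hits_per_page=50):
    """Build a query against the public Algolia Hacker News Search API."""
    params = {"query": query, "tags": tags, "hitsPerPage": hits_per_page}
    return f"https://hn.algolia.com/api/v1/search?{urlencode(params)}"

def fetch_hn_posts(query):
    """Fetch matching stories (hits carry title, url, points, created_at).

    Note: performs a live network request.
    """
    with urlopen(hn_search_url(query)) as resp:
        return json.load(resp)["hits"]

print(hn_search_url("claude"))
# https://hn.algolia.com/api/v1/search?query=claude&tags=story&hitsPerPage=50
```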
## Pipeline

```bash
# Full run: scrape all platforms → merge → classify with Gemini
python pipeline.py --scrape --merge --classify

# Individual steps
python pipeline.py --scrape       # Scrape Reddit + HN + Bluesky + PH
python pipeline.py --youtube      # YouTube only
python pipeline.py --merge       # Merge all sources into dataset.json
python pipeline.py --classify    # Classify unclassified rows with Gemini
python pipeline.py --historical  # Reddit top/year + top/all (slower, more data)
python pipeline.py --since 2026-01-01  # Only posts after this date
```
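The flags above suggest an argparse-style CLI; a hypothetical sketch of how such an interface could be wired (the actual pipeline.py may differ):

```python
import argparse

def build_parser():
    """Hypothetical sketch of pipeline.py's CLI surface, using the flags documented above."""
    p = argparse.ArgumentParser(description="Scrape -> Merge -> Classify pipeline")
    p.add_argument("--scrape", action="store_true", help="Scrape Reddit + HN + Bluesky + PH")
    p.add_argument("--youtube", action="store_true", help="YouTube only")
    p.add_argument("--merge", action="store_true", help="Merge all sources into dataset.json")
    p.add_argument("--classify", action="store_true", help="Classify unclassified rows with Gemini")
    p.add_argument("--historical", action="store_true", help="Reddit top/year + top/all")
    p.add_argument("--since", metavar="YYYY-MM-DD", help="Only posts after this date")
    return p

args = build_parser().parse_args(["--scrape", "--merge", "--since", "2026-01-01"])
print(args.scrape, args.merge, args.classify, args.since)
# True True False 2026-01-01
```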
```bash
# Regenerate analytics (after new posts collected)
python analysis/growth_metrics.py
python analysis/insights.py
```

Or trigger via HTTP (runs in background, returns immediately):
```bash
curl -X POST http://localhost:8000/api/pipeline/run
```

## Environment Variables

Copy .env.example to .env and fill in:
| Variable | Required | Description |
|---|---|---|
| GEMINI_API_KEY | Yes | Google Gemini API — post classification + query generation |
| YOUTUBE_API_KEY | Yes, for YouTube | YouTube Data API v3 |
| BSKY_HANDLE | Yes, for Bluesky | Bluesky account handle (e.g. you.bsky.social) |
| BSKY_APP_PASSWORD | Yes, for Bluesky | Bluesky app password |
| PRODUCTHUNT_API_KEY | Yes, for ProductHunt | PH developer API key |
| PRODUCTHUNT_API_SECRET | Yes, for ProductHunt | PH developer API secret |
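Since most keys are only required for specific sources, a startup check can report exactly what is missing before a run. A minimal sketch (the helper and its source names are illustrative, not part of the repo):

```python
# Hypothetical startup check; variable names match the table above.
REQUIRED_ALWAYS = ["GEMINI_API_KEY"]
REQUIRED_PER_SOURCE = {
    "youtube": ["YOUTUBE_API_KEY"],
    "bluesky": ["BSKY_HANDLE", "BSKY_APP_PASSWORD"],
    "producthunt": ["PRODUCTHUNT_API_KEY", "PRODUCTHUNT_API_SECRET"],
}

def missing_vars(env, sources=()):
    """Return the required env vars that are absent or empty for the given sources.

    Pass os.environ as `env` in real use.
    """
    needed = list(REQUIRED_ALWAYS)
    for source in sources:
        needed += REQUIRED_PER_SOURCE.get(source, [])
    return [name for name in needed if not env.get(name)]

print(missing_vars({"GEMINI_API_KEY": "abc"}, sources=["bluesky"]))
# ['BSKY_HANDLE', 'BSKY_APP_PASSWORD']
```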