Varenik-vkusny/HackNU

Repository files navigation

Claude AI Analytics Dashboard

A full-stack analytics platform that scrapes, classifies, and visualizes social and growth data about Claude AI.

📖 API documentation: API_DOCS.md


Quick Start

# 1. Install Python dependencies
pip install -r requirements.txt

# 2. Configure environment
cp .env.example .env
# Edit .env — add GEMINI_API_KEY and YOUTUBE_API_KEY

# 3. Start backend + frontend (Windows)
run_app.bat

# Or manually:
uvicorn app:app --reload --port 8000   # backend
cd frontend && npm install && npm run dev  # frontend → http://localhost:5173

Swagger UI → http://localhost:8000/docs


Architecture

HackNU/
├── app.py              # FastAPI backend (entry point)
├── models.py           # Pydantic schemas — source of truth for API contract
├── pipeline.py         # Scrape → Merge → Classify pipeline
│
├── scrapers/
│   ├── reddit_client.py     # Reddit via MCP (requires Node.js)
│   ├── hn_client.py         # Hacker News via Algolia free API
│   ├── bluesky_client.py    # Bluesky AT Protocol API
│   ├── youtube_scraper.py   # YouTube Data API v3
│   └── producthunt_client.py # ProductHunt GraphQL API
│
├── analysis/
│   ├── growth_metrics.py    # NPM / PyPI / GitHub / Wikipedia / Trends time-series
│   ├── insights.py          # Viral analysis, sentiment, competitor positioning, seeding detection
│   └── official_sources.py  # Anthropic blog / release timeline via Jina reader
│
├── frontend/            # React + Vite + Chart.js dashboard
│   └── src/
│       ├── components/  # Charts, FeedTable, RangeModal, GrowthTrends
│       └── services/    # API client
│
└── data/               # Generated — gitignored
    ├── dataset.json         # Unified post dataset (4800+ posts, all platforms)
    ├── growth_data.json     # NPM, PyPI, GitHub, Wikipedia, Trends time-series
    ├── insights.json        # Pre-computed analytics output
    └── ...                  # Raw per-source CSVs
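Once a pipeline run has produced `data/dataset.json`, the unified posts can be loaded and aggregated directly. A minimal sketch, assuming each post carries a `platform` field (the exact schema lives in `models.py`, so treat the field name as an assumption):

```python
import json
from collections import Counter

def platform_counts(posts):
    """Count posts per platform in the unified dataset."""
    return Counter(p.get("platform", "unknown") for p in posts)

# Inline records mirroring the assumed shape:
sample = [
    {"platform": "reddit", "sentiment": "positive"},
    {"platform": "hn", "sentiment": "neutral"},
    {"platform": "reddit", "sentiment": "negative"},
]
print(platform_counts(sample))  # Counter({'reddit': 2, 'hn': 1})

# With the real file (after a pipeline run):
# with open("data/dataset.json") as f:
#     posts = json.load(f)
```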

API Endpoints

Method  Path                     Description
GET     /api/posts               All posts, paginated; filterable by platform/sentiment/date
GET     /api/posts/stats         Aggregated stats by platform, sentiment, content type
GET     /api/growth-metrics      NPM / PyPI / GitHub / Wikipedia / Trends data
GET     /api/insights            Viral analysis, competitor positioning, seeding detection
GET     /api/correlation         Weekly unified signal table (npm + wiki + social + trends)
GET     /api/signals             Chronological timeline of all growth signals
GET     /api/features            Top Claude features mentioned, with weekly breakdown
POST    /api/pipeline/run        Trigger background scrape + classify
GET     /api/pipeline/status     Pipeline running state + post count
GET     /api/pipeline/progress   Live progress (stage, source, posts collected)
GET     /health                  Health check
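For example, a filtered `/api/posts` request can be built like this. The query parameter names (`platform`, `sentiment`, `page`) are assumptions based on the table above; the Swagger UI at `/docs` is the authoritative reference:

```python
from urllib.parse import urlencode

BASE = "http://localhost:8000"

def posts_url(platform=None, sentiment=None, page=1):
    """Build a filtered /api/posts URL (param names assumed; check /docs)."""
    params = {"page": page}
    if platform:
        params["platform"] = platform
    if sentiment:
        params["sentiment"] = sentiment
    return f"{BASE}/api/posts?{urlencode(params)}"

print(posts_url(platform="reddit", sentiment="positive"))
# http://localhost:8000/api/posts?page=1&platform=reddit&sentiment=positive
```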

Data Sources

Platform                Method                               Auth required
Reddit                  MCP client (Node.js subprocess)      No
Hacker News             Algolia Search API (free)            No
Bluesky                 AT Protocol API                      BSKY_HANDLE + BSKY_APP_PASSWORD in .env
YouTube                 YouTube Data API v3                  YOUTUBE_API_KEY in .env
ProductHunt             GraphQL API                          PRODUCTHUNT_API_KEY + PRODUCTHUNT_API_SECRET in .env
NPM / PyPI              Public registry APIs                 No
GitHub                  REST API                             No (rate-limited without a token)
Wikipedia               Wikimedia Metrics API                No
Google Trends           pytrends                             No
App Store / Play Store  iTunes lookup + google-play-scraper  No
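The Hacker News source is the easiest to try standalone, since the Algolia Search API is free and unauthenticated. A sketch of the request URL and a normalizer mapping an Algolia hit into a unified-post shape (the target field names on the right are assumptions; the actual shape used by `hn_client.py` is defined in `models.py`):

```python
from urllib.parse import urlencode

def hn_search_url(query, tags="story"):
    """Algolia HN Search API endpoint (free, no auth)."""
    return "https://hn.algolia.com/api/v1/search?" + urlencode(
        {"query": query, "tags": tags}
    )

def normalize_hit(hit):
    """Map an Algolia hit onto an assumed unified-post shape."""
    return {
        "platform": "hn",
        "title": hit["title"],
        "url": hit.get("url"),
        "score": hit["points"],
        "created_at": hit["created_at"],
    }

print(hn_search_url("claude"))
# https://hn.algolia.com/api/v1/search?query=claude&tags=story
```

Fetching the hits is then one call, e.g. `requests.get(hn_search_url("claude")).json()["hits"]`.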

Running the Pipeline Manually

# Full run: scrape all platforms → merge → classify with Gemini
python pipeline.py --scrape --merge --classify

# Individual steps
python pipeline.py --scrape      # Scrape Reddit + HN + Bluesky + PH
python pipeline.py --youtube     # YouTube only
python pipeline.py --merge       # Merge all sources into dataset.json
python pipeline.py --classify    # Classify unclassified rows with Gemini
python pipeline.py --historical  # Reddit top/year + top/all (slower, more data)
python pipeline.py --since 2026-01-01  # Only posts after this date

# Regenerate analytics (after new posts collected)
python analysis/growth_metrics.py
python analysis/insights.py
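The `--merge` step combines the per-source files into `dataset.json`. A plausible core of that step is cross-source deduplication; this sketch dedupes by URL (falling back to an `id` field), which is a guess at the actual logic in `pipeline.py`:

```python
def merge_posts(*sources):
    """Merge per-source post lists, deduplicating by URL then id (assumed keys)."""
    seen, merged = set(), []
    for posts in sources:
        for p in posts:
            key = p.get("url") or p.get("id")
            if key in seen:
                continue
            seen.add(key)
            merged.append(p)
    return merged

reddit = [{"url": "a"}, {"url": "b"}]
hn = [{"url": "b"}, {"url": "c"}]
print(len(merge_posts(reddit, hn)))  # 3
```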

Or trigger via HTTP (runs in background, returns immediately):

curl -X POST http://localhost:8000/api/pipeline/run
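Because the trigger returns immediately, a caller typically polls `/api/pipeline/status` until the run finishes. A minimal stdlib sketch; the `running` field name is an assumption taken from the endpoint table above:

```python
import json
import time
import urllib.request

def fetch_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def pipeline_done(status):
    """True once the status payload reports the pipeline stopped ('running' assumed)."""
    return not status.get("running")

def wait_for_pipeline(base="http://localhost:8000", poll=5):
    """Poll /api/pipeline/status until the run completes, then return the final status."""
    while True:
        status = fetch_json(f"{base}/api/pipeline/status")
        if pipeline_done(status):
            return status
        time.sleep(poll)
```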

Environment Variables

Copy .env.example to .env and fill in:

Variable                 Required         Description
GEMINI_API_KEY           Yes              Google Gemini API — post classification + query generation
YOUTUBE_API_KEY          For YouTube      YouTube Data API v3
BSKY_HANDLE              For Bluesky      Bluesky account handle (e.g. you.bsky.social)
BSKY_APP_PASSWORD        For Bluesky      Bluesky app password
PRODUCTHUNT_API_KEY      For ProductHunt  ProductHunt developer API key
PRODUCTHUNT_API_SECRET   For ProductHunt  ProductHunt developer API secret
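Since most keys are only needed for specific scrapers, a startup check can report exactly what is missing for the sources you enable. A sketch built from the table above (the grouping is illustrative, not code from the repo):

```python
import os

# Which variables each scraper needs, per the table above.
REQUIRED = {
    "core": ["GEMINI_API_KEY"],
    "youtube": ["YOUTUBE_API_KEY"],
    "bluesky": ["BSKY_HANDLE", "BSKY_APP_PASSWORD"],
    "producthunt": ["PRODUCTHUNT_API_KEY", "PRODUCTHUNT_API_SECRET"],
}

def missing_vars(enabled, env=os.environ):
    """Return the env vars still unset for the enabled scraper groups."""
    needed = [v for group in enabled for v in REQUIRED[group]]
    return [v for v in needed if not env.get(v)]

print(missing_vars(["core", "youtube"], env={"GEMINI_API_KEY": "x"}))
# ['YOUTUBE_API_KEY']
```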
