Skip to content

matthewjhunter/ghtraffic

Repository files navigation

ghtraffic

Collects GitHub traffic data (views, clones, referrers, popular paths) for all repositories the authenticated user has push access to, including organization repos. Outputs newline-delimited JSON to stdout, one record per repo per day. Designed to run hourly via cron.

Tools

ghtraffic

Fetches traffic data from the GitHub API and writes NDJSON records to stdout.

ghtraffic [-owner OWNER] [-seen FILE]
Flag Description
-owner Filter repos to this owner/org (optional)
-seen FILE Existing JSONL file for deduplication; today's records are always re-fetched

Authentication uses GITHUB_TOKEN, falling back to gh auth token.

ghpush

Reads ghtraffic NDJSON from stdin and pushes the records to an Umami instance as pageview events using the /api/send endpoint with historical timestamps.

Requires Umami v2.17 or later.

ghpush [-pushed FILE | -pg DSN] [-dry-run] [-init] [-import-json FILE] [-migrate-sqlite FILE]
Flag Description
-pushed FILE SQLite state file; tracks what has been pushed to avoid re-sending on re-run
-pg DSN Postgres DSN for the state store (alternative to -pushed; or set GHPUSH_DATABASE_URL)
-dry-run Print events as JSON to stdout without sending
-init Bootstrap from scratch: ignore push state and push all historical data
-import-json FILE Import a legacy JSON state file into the state store and exit
-migrate-sqlite FILE Copy an existing SQLite state file into the Postgres store (-pg) and exit
-url URL Umami base URL (overrides UMAMI_URL)
-website UUID Umami website UUID (overrides UMAMI_WEBSITE_ID)

Environment variables: UMAMI_URL, UMAMI_WEBSITE_ID, GHPUSH_DATABASE_URL

Usage

# Collect traffic (skip already-seen repo+date pairs, always re-fetch today)
ghtraffic -seen ~/.local/share/ghtraffic/traffic.jsonl \
  >> ~/.local/share/ghtraffic/traffic.jsonl

# Push deltas to Umami
ghpush -pushed ~/.local/share/ghtraffic/pushed.db \
  < ~/.local/share/ghtraffic/traffic.jsonl

Crontab example

0  * * * * GITHUB_TOKEN=... ghtraffic -seen ~/traffic.jsonl >> ~/traffic.jsonl
5  * * * * UMAMI_URL=https://umami.example.com UMAMI_WEBSITE_ID=... ghpush -pushed ~/pushed.db < ~/traffic.jsonl

Container deployment

For a long-running deployment, the scheduler binary runs one collect+push cycle on start and then every INTERVAL_SECONDS, execing ghtraffic (appending to the data file) then ghpush. It is the entrypoint of the published image ghcr.io/matthewjhunter/ghtraffic, built FROM gcr.io/distroless/static (no shell, runs as non-root). Posting to Umami over an internal network address avoids any reverse-proxy IP filtering that would otherwise drop server-originated events.

Variable Default Description
INTERVAL_SECONDS 3600 Cycle period
GHTRAFFIC_OWNERS (unset) Comma-separated owners to collect, e.g. matthewjhunter,infodancer
GHTRAFFIC_TOKEN_<OWNER> Per-owner PAT; <OWNER> is the owner uppercased with non-alphanumerics replaced by _ (e.g. old-school-gamers -> GHTRAFFIC_TOKEN_OLD_SCHOOL_GAMERS)
DATA_FILE /data/traffic.jsonl NDJSON history file (mount a volume here)
BIN_DIR / Directory holding the ghtraffic and ghpush binaries

Each cycle collects every listed owner with its own token (a fine-grained PAT is single-owner), then pushes once. One owner's failure is logged and does not stop the others or the push. If GHTRAFFIC_OWNERS is unset, the scheduler falls back to single-owner mode using GHTRAFFIC_OWNER + GITHUB_TOKEN.

The container also reads ghpush's own env: UMAMI_URL, UMAMI_WEBSITE_ID, and GHPUSH_DATABASE_URL (Postgres push-state).

Umami dashboard notes

Event mapping:

GitHub metric Umami representation
Page views Pageviews to /<owner>/<repo>
Clones Pageviews to /clone/<owner>/<repo>
Referrers Pageviews with Referrer field set
Popular paths Pageviews to the actual GitHub subpath

Unique visitors: Umami deduplicates visitors by IP address. Since all events are pushed from a single server, Umami shows a small fixed visitor count regardless of actual traffic volume. Ignore the visitor metric. Use the pageview count for views and the Pages breakdown filtered to /clone/ for clones — those counts are exact, derived directly from the GitHub traffic API.

About

GitHub traffic data collector for long-term analytics

Resources

License

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages