osv-compare

A Go CLI for surfacing notable, specific upstream changes in the OSV vulnerability database. It pulls the last N days of per-ecosystem snapshots from the public gs://osv-vulnerabilities bucket via GCS object versioning, then reports schema drift, JSON path-set diffs, volume changes, and record churn between days.

It exists to make upstream changes (new ecosystems, new fields, schema-version bumps, volume jumps, churn spikes) easy to spot and attribute to a specific date and ecosystem.

How it works

OSV publishes per-ecosystem zips at gs://osv-vulnerabilities/<Ecosystem>/all.zip. The bucket has object versioning enabled, and OSV republishes each zip roughly every 15 minutes. That means the noncurrent-version history is a fine-grained time-series of the feed.

osv-compare walks that history:

Discover ecosystems: list top-level prefixes in the bucket (~105 ecosystems including PyPI, npm, Debian:12, Alpine:v3.20, ...) or use --ecosystem to restrict.
List versions: for each ecosystem's all.zip, list noncurrent versions and pick the latest version per UTC day in the window.
Download: pull each picked version (pinned by generation) into cache/<ecosystem>/<YYYY-MM-DD>.zip. Cached zips are reused on re-run unless --no-cache is passed. Downloads run with bounded concurrency (--concurrency, default 8).
Analyse: stream every JSON record out of each zip (no decompression to disk) and accumulate, per ecosystem-day:
- Record count: how big this snapshot is.
- schema_version distribution: coarse OSV spec drift (e.g. 1.6.0=42, 1.7.0=18).
- JSON path set: every distinct path observed across all records, e.g. affected[].ranges[].events[].introduced. Captures field shape, not values.
- Per-record content hash (canonical-JSON SHA-256, keyed by id): for churn detection.
Diff: for each pair of consecutive days, compute which paths appeared or disappeared, which records were added, removed, or changed, and the overall churn %.
Report: write out/report.md (markdown with a "Top suspects" section + per-ecosystem tables) and out/diffs/<Ecosystem>.json (full machine-readable diffs, ID lists capped at 1000 each).
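
The Analyse step above can be sketched roughly as follows. This is a minimal illustration, not the code in internal/analyze or internal/snapshot: the names walkPaths and recordHash mirror the test list further down, but their signatures here are assumptions. It also assumes that only leaf paths are recorded and that "canonical JSON" means re-encoding with sorted object keys, which Go's encoding/json does for maps.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// walkPaths adds every leaf path found in v to set. All elements of an
// array share a single "[]" segment, so the set captures field shape,
// not array sizes or values. (Leaf-only recording is an assumption.)
func walkPaths(prefix string, v any, set map[string]struct{}) {
	switch t := v.(type) {
	case map[string]any:
		for k, child := range t {
			p := k
			if prefix != "" {
				p = prefix + "." + k
			}
			walkPaths(p, child, set)
		}
	case []any:
		for _, child := range t {
			walkPaths(prefix+"[]", child, set)
		}
	default:
		if prefix != "" {
			set[prefix] = struct{}{}
		}
	}
}

// recordHash returns a hex SHA-256 over a canonical re-encoding of raw.
// json.Marshal writes map keys in sorted order, so two records that
// differ only in key order hash identically.
func recordHash(raw []byte) (string, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber() // keep numeric literals as written instead of float64
	var v any
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	canon, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canon)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Two encodings of the same made-up record, differing only in key order.
	a := []byte(`{"id":"X-1","affected":[{"ranges":[{"events":[{"introduced":"0"},{"fixed":"1.2"}]}]}]}`)
	b := []byte(`{"affected":[{"ranges":[{"events":[{"introduced":"0"},{"fixed":"1.2"}]}]}],"id":"X-1"}`)

	var v any
	if err := json.Unmarshal(a, &v); err != nil {
		panic(err)
	}
	set := map[string]struct{}{}
	walkPaths("", v, set)

	paths := make([]string, 0, len(set))
	for p := range set {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Println(p)
	}

	ha, _ := recordHash(a)
	hb, _ := recordHash(b)
	fmt.Println("hashes equal:", ha == hb) // prints "hashes equal: true"
}
```

Keying the hashes by record id is what makes the later Diff step cheap: two days can be compared by looking up each id in both maps without re-reading the zips.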

What "Top suspects" flags

The markdown report leads with ecosystems that triggered any of:

Volume jump: record count changed >20% day-over-day.
New paths: a JSON path appeared that wasn't there the previous day.
New schema_version: a new value in the schema_version field.
High churn: >5% of records added/removed/changed in a single day.
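
As a rough sketch, the four checks above could be expressed like this. The Summary type, its field names, and the suspects function are hypothetical and not taken from the repository. The sketch also assumes both percentages are measured against the previous day's record count; the source does not state the denominator.

```go
package main

import (
	"fmt"
	"math"
)

// Summary is a hypothetical per-ecosystem-day summary; names are illustrative.
type Summary struct {
	Hashes   map[string]string   // record ID -> content hash
	Paths    map[string]struct{} // JSON paths observed that day
	Versions map[string]int      // schema_version -> record count
}

// hasNewKey reports whether cur contains a key that prev lacks.
func hasNewKey[V any](prev, cur map[string]V) bool {
	for k := range cur {
		if _, ok := prev[k]; !ok {
			return true
		}
	}
	return false
}

// churnPct returns the percentage of records added, removed, or changed,
// relative to the previous day's record count (an assumed denominator).
func churnPct(prev, cur map[string]string) float64 {
	if len(prev) == 0 {
		return 0
	}
	touched := 0
	for id, h := range cur {
		if old, ok := prev[id]; !ok || old != h {
			touched++ // added or changed
		}
	}
	for id := range prev {
		if _, ok := cur[id]; !ok {
			touched++ // removed
		}
	}
	return 100 * float64(touched) / float64(len(prev))
}

// suspects applies the four thresholds from the list above.
func suspects(prev, cur Summary) []string {
	var out []string
	if n := len(prev.Hashes); n > 0 {
		change := 100 * float64(len(cur.Hashes)-n) / float64(n)
		if math.Abs(change) > 20 {
			out = append(out, "volume jump")
		}
	}
	if hasNewKey(prev.Paths, cur.Paths) {
		out = append(out, "new paths")
	}
	if hasNewKey(prev.Versions, cur.Versions) {
		out = append(out, "new schema_version")
	}
	if churnPct(prev.Hashes, cur.Hashes) > 5 {
		out = append(out, "high churn")
	}
	return out
}

func main() {
	prev := Summary{
		Hashes:   map[string]string{"A": "h1", "B": "h2", "C": "h3", "D": "h4"},
		Paths:    map[string]struct{}{"id": {}},
		Versions: map[string]int{"1.6.0": 4},
	}
	cur := Summary{
		// B changed and E is new: 2 of 4 touched (50% churn), count 4 -> 5 (+25%).
		Hashes:   map[string]string{"A": "h1", "B": "h2x", "C": "h3", "D": "h4", "E": "h5"},
		Paths:    map[string]struct{}{"id": {}, "severity[].type": {}},
		Versions: map[string]int{"1.6.0": 4, "1.7.0": 1},
	}
	fmt.Println(suspects(prev, cur))
	// prints: [volume jump new paths new schema_version high churn]
}
```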

Layout

cmd/osv-compare/main.go        cobra CLI, orchestration
internal/gcs/                  versioned listing, daily-version picker, downloads
internal/snapshot/             zip → JSON record iterator, canonical hashing
internal/analyze/              DaySummary accumulator + Diff
internal/report/               markdown + per-ecosystem JSON writers

Setup

Prerequisites

Go 1.22+ (built/tested with 1.26)
gcloud CLI: install via brew install --cask google-cloud-sdk or by following the official install docs.
A Google account that can authenticate. Any account works: the bucket is public, but listing versioned objects requires an authenticated identity.

One-time auth

gcloud auth application-default login

This opens a browser, signs you in, and writes Application Default Credentials to ~/.config/gcloud/application_default_credentials.json. The CLI picks them up automatically.

Build

git clone <this-repo> && cd osv-data-compare
go build -o osv-compare ./cmd/osv-compare

(Or run directly: go run ./cmd/osv-compare ...)

Usage

Default, last 7 days, all ecosystems

./osv-compare

This downloads ~7 × 105 ≈ 735 zips (most are small per-ecosystem subsets, a few hundred MB total) and writes the report to ./out/report.md. First run takes a few minutes; subsequent runs reuse the cache and complete in seconds.

Restrict to specific ecosystems

./osv-compare --ecosystem PyPI --ecosystem npm --days 14

All flags

| Flag | Default | Purpose |
| --- | --- | --- |
| `--days N` | `7` | How many UTC days back to pull. Must be ≥ 2 to compute diffs. |
| `--ecosystem E` | (all) | Restrict to one ecosystem. Repeatable. |
| `--concurrency N` | `8` | Parallel downloads. |
| `--cache-dir PATH` | `./cache` | Where downloaded zips live. Safe to delete; will re-download. |
| `--out-dir PATH` | `./out` | Where `report.md` and `diffs/` are written. |
| `--no-cache` | `false` | Force re-download even if cached zips exist. |

Output

After a successful run:

out/
  report.md                    # human-facing triage report
  diffs/
    PyPI.json                  # full per-day path-sets and ID-level churn
    npm.json
    ...
cache/
  PyPI/
    2026-04-23.zip             # reused on subsequent runs
    2026-04-24.zip
    ...

Open out/report.md first. The "Top suspects" section at the top is the triage starting point. Drill into per-ecosystem tables and JSON diffs from there.

Tests

go test ./...

Covers:

gcs.PickDailyVersions: daily-version picker with gap days, out-of-window versions, multiple-per-day.
snapshot.Iterate: record streaming + ignoring non-JSON entries.
snapshot.RecordHash: stability under key reordering.
analyze.walkPaths: path emission for nested objects, arrays, leaves.
analyze.Diff: added/removed/changed IDs, added/removed paths, churn %.
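
To make the first case concrete, here is a hedged sketch of a daily-version picker exercising gap days, out-of-window versions, and multiple versions per day. It is not the code in internal/gcs: the Version type, pickDaily, and the at helper are hypothetical, and the window is assumed to be the N UTC days ending on the current UTC day.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Version is a hypothetical stand-in for one listed object version.
type Version struct {
	Generation int64
	Updated    time.Time
}

// at parses an RFC 3339 timestamp and panics on malformed input (demo helper).
func at(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// pickDaily keeps the latest version per UTC day, keyed by YYYY-MM-DD,
// for the `days` UTC days ending on now's UTC day (assumed window).
// Days with no version are simply absent from the result.
func pickDaily(versions []Version, now time.Time, days int) map[string]Version {
	n := now.UTC()
	// Exclusive upper bound: midnight UTC at the start of tomorrow.
	end := time.Date(n.Year(), n.Month(), n.Day()+1, 0, 0, 0, 0, time.UTC)
	start := end.AddDate(0, 0, -days)

	picked := map[string]Version{}
	for _, v := range versions {
		t := v.Updated.UTC()
		if t.Before(start) || !t.Before(end) {
			continue // outside the window
		}
		day := t.Format("2006-01-02")
		if cur, ok := picked[day]; !ok || t.After(cur.Updated) {
			picked[day] = v
		}
	}
	return picked
}

func main() {
	versions := []Version{
		{Generation: 1, Updated: at("2026-04-20T10:00:00Z")}, // out of window
		{Generation: 2, Updated: at("2026-04-22T09:00:00Z")},
		{Generation: 3, Updated: at("2026-04-22T23:45:00Z")}, // latest on 04-22
		{Generation: 4, Updated: at("2026-04-24T00:15:00Z")}, // 04-23 is a gap day
	}
	picked := pickDaily(versions, at("2026-04-24T12:00:00Z"), 3)

	days := make([]string, 0, len(picked))
	for d := range picked {
		days = append(days, d)
	}
	sort.Strings(days)
	for _, d := range days {
		fmt.Println(d, "generation", picked[d].Generation)
	}
	// prints:
	// 2026-04-22 generation 3
	// 2026-04-24 generation 4
}
```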

End-to-end smoke test (real GCS, free reads):

./osv-compare --days 2 --ecosystem PyPI --ecosystem npm

Then check that out/report.md shows two date rows for both ecosystems with non-zero record counts, and that re-running the same command logs 0 new, N from cache.

Limitations / out of scope

Does not compare against the OSV schema spec itself. Drift is inferred from observed fields in published records.
Not a monitoring system; re-run on demand. (See --days for backfill.)
Anonymous GCS auth is not supported by default; if you need to run unattended in CI, use a service account JSON via GOOGLE_APPLICATION_CREDENTIALS.
