Add packages inventory collector #15

Merged
sergeyfast merged 1 commit into master from feat/packages-collector
May 14, 2026
@sergeyfast
Contributor

Summary

  • New internal/topsrv/packages collector reads the dpkg, rpm, and apk databases in pure Go (no CGo, no shell-out) and exposes package counts as topsrv_packages_* Prometheus metrics
  • Pushes full inventory snapshots (Vulners-ready schema: NEVRA, GPG keyid, modularityLabel, vendor, sourceName, sigDigest, licenses, autoInstalled, repoOrigin) to a new /v1/inventory endpoint, alongside the existing /v1/meta endpoint used for postgres
  • Refactors internal/topsrv/push.go: extracts the postJSON and spoolFile helpers, generalizes endpoint derivation, and folds trimming into spooling, so metrics and inventory transport share a single code path
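The generalized endpoint derivation could look roughly like the sketch below. It is a minimal illustration, not the code in push.go: the `deriveEndpoint` signature, the `/v1/metrics` push path, and the `gate.example.com` host are assumptions made for the example. The idea is that one configured push URL yields sibling endpoints on the same host by swapping only the path.

```go
package main

import (
	"fmt"
	"net/url"
)

// deriveEndpoint replaces the path of a configured push URL with a sibling
// endpoint path on the same scheme and host. Hypothetical signature: the real
// helper in push.go is not shown in this PR.
func deriveEndpoint(pushURL, path string) (string, error) {
	u, err := url.Parse(pushURL)
	if err != nil {
		return "", fmt.Errorf("parse push URL: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("push URL %q has no scheme or host", pushURL)
	}
	// Keep scheme, host, and credentials; drop everything path-specific.
	u.Path = path
	u.RawPath = ""
	u.RawQuery = ""
	u.Fragment = ""
	return u.String(), nil
}

func main() {
	// The base URL below is an assumed example value.
	for _, p := range []string{"/v1/meta", "/v1/inventory"} {
		ep, err := deriveEndpoint("https://gate.example.com/v1/metrics", p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ep)
	}
}
```

Deriving both endpoints from one URL keeps the configuration surface small: operators set a single push address and the collector-specific paths follow from it.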

Design

  • Concrete-type Manager pattern (*Dpkg / *Rpm / *Apk with methods, no interface — per project philosophy)
  • Detection by marker files (/var/lib/dpkg/status, /var/lib/rpm/*, /lib/apk/db/installed); auto-fallback through rpm sqlite → ndb → bdb
  • Lock-safe DB reads: BDB copy-then-parse (Trivy/Syft pattern), SQLite ?mode=ro&immutable=1 (Trivy pattern)
  • Background scan on a 6h default interval with ±10% jitter, plus cached metrics; a scrape never blocks on an FS walk
  • One HTTP POST per kind on /v1/inventory (gatesrv routes by discriminator)
  • Failure path: spool to SpoolDir/inventory/<kind>-<unixms>.json, trim to pushMaxSpoolSize=100
  • Build-time deps: github.com/anchore/go-rpmdb v0.1.0 + modernc.org/sqlite v1.50.1 (pure-Go)
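Marker-file detection with the rpm fallback order can be sketched as follows. This is an illustration under stated assumptions, not the PR's `detectManagers()`: the `detected` type and the injected `exists` callback are hypothetical, and the three rpm file names are the conventional locations of the SQLite, NDB, and BDB databases under /var/lib/rpm.

```go
package main

import (
	"fmt"
	"os"
)

// detected describes one package manager found on the host (hypothetical type).
type detected struct {
	Manager string // "dpkg", "rpm", or "apk"
	Format  string // rpm only: "sqlite", "ndb", or "bdb"
	Path    string // marker file that triggered detection
}

// detect probes marker files through the injected exists callback, so the
// logic can be exercised without touching the real filesystem.
func detect(exists func(string) bool) []detected {
	var out []detected
	if p := "/var/lib/dpkg/status"; exists(p) {
		out = append(out, detected{Manager: "dpkg", Path: p})
	}
	// Fallback order from the PR: sqlite, then ndb, then bdb. First hit wins.
	rpmDBs := []struct{ format, path string }{
		{"sqlite", "/var/lib/rpm/rpmdb.sqlite"},
		{"ndb", "/var/lib/rpm/Packages.db"},
		{"bdb", "/var/lib/rpm/Packages"},
	}
	for _, c := range rpmDBs {
		if exists(c.path) {
			out = append(out, detected{Manager: "rpm", Format: c.format, Path: c.path})
			break
		}
	}
	if p := "/lib/apk/db/installed"; exists(p) {
		out = append(out, detected{Manager: "apk", Path: p})
	}
	return out
}

// fileExists reports whether p is a regular file (symlinks are followed).
func fileExists(p string) bool {
	fi, err := os.Stat(p)
	return err == nil && fi.Mode().IsRegular()
}

func main() {
	for _, d := range detect(fileExists) {
		fmt.Printf("manager=%s format=%q marker=%s\n", d.Manager, d.Format, d.Path)
	}
}
```

Returning a slice rather than a single value lets a host with more than one database (for example a Debian system with rpm installed) report each manager separately.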

Docs

  • docs/packages-collector-research.md: industry comparison (Syft / Trivy / osquery / Wazuh) and the reasoning for a separate /v1/inventory endpoint rather than an extended /v1/meta
  • docs/packages-collector-implementation.md — full contract, security data model (MUST / SHOULD / NICE per Vulners audit), concurrent-safety analysis, overhead measurements, build-tag plan for slim builds

Test plan

  • make fmt lint test — 0 issues, all green
  • make test-integration — postgres / nginx / angie / smart / botlog integration tests green (no regressions in transport refactor)
  • 6 unit tests for parsers (Rpm.parseSrcRpm, Dpkg.parseSource, Dpkg.parseRFC822, Apk.setChecksum, Rpm.extractKeyID) under //go:build linux, run via OrbStack
  • OrbStack smoke: ubuntu:22.04 (101 dpkg) / rockylinux:9 (142 rpm) / alpine:3 (16 apk) — all detect their manager, scan < 60 ms, metrics on :9100, payload accepted by local gatesrv → /v1/inventory returns 204
  • ClickHouse-side verification on gatesrv: 404 rows ingested, MUST Vulners fields (name/version/arch/status, host.osId/osVersionId) at 100% coverage
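To illustrate the kind of small parser these unit tests exercise, here is a sketch of splitting a dpkg `Source` field into a source name and an optional version. It is not the PR's `Dpkg.parseSource`; the function below is a hypothetical stand-in written from the documented field format, where the value is either `name` or `name (version)`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSource splits the value of a dpkg "Source" field into the source
// package name and an optional version. An empty version means the field
// carried no parenthesized version. Hypothetical helper, not the PR's code.
func parseSource(field string) (string, string) {
	field = strings.TrimSpace(field)
	name, rest, found := strings.Cut(field, " ")
	if !found {
		return name, ""
	}
	rest = strings.TrimSpace(rest)
	if strings.HasPrefix(rest, "(") && strings.HasSuffix(rest, ")") {
		return name, strings.TrimSpace(rest[1 : len(rest)-1])
	}
	// Anything else after the name is treated as malformed and ignored.
	return name, ""
}

func main() {
	for _, v := range []string{"glibc (2.35-0ubuntu3.1)", "bash"} {
		name, version := parseSource(v)
		fmt.Printf("source=%q version=%q\n", name, version)
	}
}
```

Callers would typically fall back to the binary package's own name and version when the field or its version part is absent.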

Out of scope (Phase 4 / future PRs)

  • kind="repos" snapshot (apt sources.list, yum.repos.d, apk repositories + GPG keys)
  • kind="packageHistory" snapshot (apt history.log, dnf trans, apk.log)
  • Build tag no_rpm_sqlite for slim binary (drops sqlite runtime, -6 MB / -14 MB RSS, loses RHEL 9+ sqlite rpmdb support)
  • Integration tests with on-disk fixtures (testdata/packages/)
  • apt-side repoOrigin reconstruction — analysis recorded in implementation doc; not done by Syft/Trivy/osquery/Wazuh for cost reasons, vendor + kind="repos" cover the security signal
  • topsrv_packages_upgradable metric (depends on CheckUpgrades=true reading local apt/dnf cache — field reserved but not wired)

@sergeyfast force-pushed the feat/packages-collector branch from 8fd35ed to 69fe330 on May 14, 2026 21:14
- Add internal/topsrv/packages: Dpkg/Apk/Rpm managers as concrete
  structs with methods (no interface — concrete types are clearer
  per the project's philosophy). detectManagers() returns dispatch
  entries by probing marker files
- Parse dpkg status, apk installed DB, rpm BDB/NDB/SQLite via
  anchore/go-rpmdb v0.1.0; share lock-safe read pattern with Trivy
  (copy-then-parse for BDB, immutable=1 DSN for SQLite)
- Enrich with apt extended_states (autoInstalled), dnf history.sqlite
  (autoInstalled + repoOrigin), /etc/apk/world (autoInstalled);
  cover NEVRA, GPG keyid, modularityLabel, vendor, sigDigest,
  licenses for Vulners CVE matching on gatesrv
- Add /v1/inventory endpoint with kind discriminator; new
  InventoryProvider + InventoryAckReceiver interfaces alongside
  QueryMetaProvider; spool to SpoolDir/inventory/<kind>-<ts>.json
- Expose topsrv_packages_{installed,held,scan_duration_seconds,
  scan_errors_total,last_scan_timestamp_seconds,
  last_push_timestamp_seconds,manager_info}
- Refactor push.go: extract postJSON + spoolFile helpers, replace
  deriveMetaEndpoint with generic deriveEndpoint, fold trimSpool
  inline so all spool subdirs share the cap
- Wire registerPackages in app.go and [Packages] block in
  local.toml.dist; opt-out via Disabled/DisablePush
- Add unit tests for Rpm.parseSrcRpm, Dpkg.parseSource,
  Dpkg.parseRFC822, Apk.setChecksum, Rpm.extractKeyID
- Depend on github.com/anchore/go-rpmdb v0.1.0 and
  modernc.org/sqlite v1.50.1 (pure-Go, no CGo)
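The spool naming and cap described above can be sketched as a pure selection step that decides which spooled files to delete. This is an assumption-laden illustration: `spoolName`, `spoolTime`, and `toTrim` are hypothetical names, the `packages` kind value is assumed, and the sketch treats the cap as a file count applied to one directory listing, which the PR text does not spell out.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"time"
)

const maxSpool = 100 // mirrors pushMaxSpoolSize=100 from the PR

// spoolName builds a file name in the <kind>-<unixms>.json form.
func spoolName(kind string, t time.Time) string {
	return fmt.Sprintf("%s-%d.json", kind, t.UnixMilli())
}

// spoolTime extracts the millisecond timestamp from a spool file name.
// It returns false for names that do not match <kind>-<unixms>.json.
func spoolTime(name string) (int64, bool) {
	if !strings.HasSuffix(name, ".json") {
		return 0, false
	}
	base := strings.TrimSuffix(name, ".json")
	i := strings.LastIndex(base, "-")
	if i < 0 {
		return 0, false
	}
	ms, err := strconv.ParseInt(base[i+1:], 10, 64)
	return ms, err == nil
}

// toTrim returns the oldest spool files that exceed limit, oldest first.
// Names that are not spool files are left alone.
func toTrim(names []string, limit int) []string {
	type entry struct {
		name string
		ms   int64
	}
	var es []entry
	for _, n := range names {
		if ms, ok := spoolTime(n); ok {
			es = append(es, entry{n, ms})
		}
	}
	if len(es) <= limit {
		return nil
	}
	sort.SliceStable(es, func(i, j int) bool { return es[i].ms < es[j].ms })
	out := make([]string, 0, len(es)-limit)
	for _, e := range es[:len(es)-limit] {
		out = append(out, e.name)
	}
	return out
}

func main() {
	start := time.Now()
	var names []string
	for i := 0; i < maxSpool+2; i++ {
		// "packages" is an assumed kind value for illustration.
		names = append(names, spoolName("packages", start.Add(time.Duration(i)*time.Second)))
	}
	fmt.Println("files to delete:", toTrim(names, maxSpool))
}
```

Sorting by the parsed timestamp rather than by file name matters once several kinds share a directory, because lexical order would group files by kind instead of by age.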
@sergeyfast force-pushed the feat/packages-collector branch from 69fe330 to 4fd9ad5 on May 14, 2026 21:18
@sergeyfast merged commit f30a0ea into master on May 14, 2026
2 checks passed
@sergeyfast deleted the feat/packages-collector branch on May 14, 2026 21:22