Expose listening ports and remote-peer scope in netstat #12

Merged: sergeyfast merged 2 commits into master from feat/netstat-listening-ports on May 13, 2026

@sergeyfast
Contributor

Summary

Two commits, each focused on a single concern and independently revertable.

1. f3154a9 Add netstat listening_ports with scope/process

A new gauge topsrv_netstat_listening_ports{port, family, scope, process} emits one series per active TCP LISTEN socket — the operator-level answer to "what's actually exposed on this host?".

  • classifyAddr buckets the bind address into loopback (127.0.0.0/8 or ::1), private (RFC1918, RFC4193 ULA, RFC6598 CGNAT, link-local), or public (routable address or 0.0.0.0/:: wildcard — worst case). Wildcards default to public so a missing public interface doesn't hide an exposed listener.
  • resolveProcName attaches the owning binary name via a per-scrape PID cache. The process label is empty when the kernel denies access to the owning process (run the agent as root for full visibility). PID=0 short-circuits without calling psproc.NewProcess; missing PIDs are negatively cached.
  • Cap at 256 series per scrape so a compromised host with thousands of listeners cannot blow Prometheus cardinality budgets — a single truncation warning fires when hit.
  • Fixes a small leftover from "Collector hardening: aliases, rollback, PG18, leak" (#11, fb9bf8d): the postgres integration test still indexed an appNames sub-map as bool after the map[string]time.Time switch.
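The bucketing rules in the classifyAddr bullet can be sketched with the standard net/netip package. This is a minimal illustration, not the code from the commit: the string-in/string-out signature, the `cgnat` variable, and the "unknown" result for unparseable input are assumptions, since the PR does not state how garbage addresses are labelled.

```go
package main

import (
	"fmt"
	"net/netip"
)

// cgnat is the RFC 6598 shared address space. net/netip has no built-in
// predicate for it, so it is checked as an explicit prefix.
var cgnat = netip.MustParsePrefix("100.64.0.0/10")

// classifyAddr is a hypothetical reimplementation of the bucketing described
// in the PR; the real function's signature and garbage handling may differ.
func classifyAddr(s string) string {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return "unknown" // assumption: not specified in the PR
	}
	addr = addr.Unmap() // treat ::ffff:a.b.c.d like plain IPv4
	switch {
	case addr.IsUnspecified():
		// 0.0.0.0 and :: bind every interface, so assume the worst case.
		return "public"
	case addr.IsLoopback():
		return "loopback" // 127.0.0.0/8 or ::1
	case addr.IsPrivate(), addr.IsLinkLocalUnicast(), cgnat.Contains(addr):
		// RFC 1918, RFC 4193 ULA, link-local, or RFC 6598 CGNAT.
		return "private"
	default:
		return "public"
	}
}

func main() {
	samples := []string{
		"127.0.0.1", "::1", "10.0.0.1", "172.32.0.1",
		"100.64.0.1", "100.128.0.0", "fe80::1", "fd00::1",
		"0.0.0.0", "::", "8.8.8.8", "not-an-ip",
	}
	for _, s := range samples {
		fmt.Printf("%-12s -> %s\n", s, classifyAddr(s))
	}
}
```

The sample list includes just-outside boundaries: 100.64.0.1 falls inside the CGNAT range and is private, while 100.128.0.0 and 172.32.0.1 sit one step past the CGNAT and RFC 1918 ranges and are public.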

2. 51ef29b Add UDP listeners and TCP remote_scope label

Extends the same security story to UDP and outbound traffic.

  • listening_ports gains a proto label (tcp / udp). A UDP socket counts as "listening" when it's bound but has no connected peer (empty Raddr) — catches DNS, mDNS, WireGuard, NTP, and unexpected UDP backdoors next to TCP coverage. UDP failures are non-fatal: a log line is emitted and TCP results still ship.
  • topsrv_netstat_tcp_connections gains a remote_scope label that classifies the peer address with the same loopback/private/public taxonomy. LISTEN sockets carry remote_scope=none. This lets dashboards alert on:
    • public inbound (scan exposure)
    • private→public outbound (exfil signal)
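The two rules in this commit (a UDP socket "listens" when it is bound with no peer, and LISTEN sockets get remote_scope=none) could be expressed roughly as follows. The `endpoint` and `conn` types are hypothetical stand-ins for the socket records the collector iterates over, and "empty Raddr" is assumed here to mean the zero value; the classifier is passed in as a function so the sketch stays self-contained.

```go
package main

import "fmt"

// endpoint and conn are hypothetical stand-ins for the collector's socket
// records; the real field names and types may differ.
type endpoint struct {
	IP   string
	Port uint32
}

type conn struct {
	Proto  string // "tcp" or "udp"
	Status string // e.g. "LISTEN", "ESTABLISHED"; ignored for UDP here
	Laddr  endpoint
	Raddr  endpoint
}

// isListener reports whether c should produce a listening_ports series.
func isListener(c conn) bool {
	switch c.Proto {
	case "tcp":
		return c.Status == "LISTEN"
	case "udp":
		// Bound locally and no connected peer (assumed: zero-value Raddr).
		return c.Laddr.Port != 0 && c.Raddr == (endpoint{})
	default:
		return false
	}
}

// remoteScope returns the remote_scope label for a TCP connection.
// classify is whatever maps an IP string to loopback/private/public.
func remoteScope(c conn, classify func(ip string) string) string {
	if c.Proto == "tcp" && c.Status == "LISTEN" {
		return "none" // a listening socket has no peer to classify
	}
	return classify(c.Raddr.IP)
}

func main() {
	// Crude stand-in classifier, for demonstration only.
	classify := func(ip string) string {
		if ip == "127.0.0.1" {
			return "loopback"
		}
		return "public"
	}

	dns := conn{Proto: "udp", Laddr: endpoint{IP: "0.0.0.0", Port: 53}}
	ssh := conn{Proto: "tcp", Status: "LISTEN", Laddr: endpoint{IP: "0.0.0.0", Port: 22}}
	out := conn{Proto: "tcp", Status: "ESTABLISHED",
		Laddr: endpoint{IP: "10.0.0.5", Port: 40000},
		Raddr: endpoint{IP: "8.8.8.8", Port: 443}}

	fmt.Println("dns is listener:", isListener(dns))
	fmt.Println("ssh remote_scope:", remoteScope(ssh, classify))
	fmt.Println("out remote_scope:", remoteScope(out, classify))
}
```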

Measured overhead

Bench on Linux (orbstack docker, arm64):

| Op | Cost |
| --- | --- |
| Full Collect() | 251 µs |
| psnet.Connections("tcp") alone | 200 µs |
| Net new code | ~50 µs |
| resolveProcName cold | 15 µs per PID |
| resolveProcName cached | 2.77 ns |

UDP adds one more Connections call (~200 µs); remote_scope adds ~200 ns per non-LISTEN connection. On a host with 333 TCP conns the parse overhead is ~66 µs; on 10K conns ~2 ms — still <0.01% of a 30 s scrape budget.
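The gap between the cold and cached resolveProcName figures comes from the per-scrape PID cache. A minimal sketch of such a cache, with the PID=0 short-circuit and negative caching, might look like this. The `procNameCache` type and its injected `lookup` function are hypothetical; in the real collector the lookup would go through psproc.NewProcess, which is stubbed out here.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoProc is a placeholder error for a PID that no longer exists.
var errNoProc = errors.New("no such process")

// procNameCache is a hypothetical per-scrape cache: create one per Collect()
// call and discard it afterwards so stale PIDs never outlive a scrape.
type procNameCache struct {
	names  map[int32]string
	lookup func(pid int32) (string, error) // stand-in for the real process lookup
}

func newProcNameCache(lookup func(pid int32) (string, error)) *procNameCache {
	return &procNameCache{names: make(map[int32]string), lookup: lookup}
}

// resolve returns the process name for pid, or "" when it is unknown.
func (c *procNameCache) resolve(pid int32) string {
	if pid == 0 {
		return "" // no owning process visible: skip the lookup entirely
	}
	if name, ok := c.names[pid]; ok {
		return name // hit, including cached failures stored as ""
	}
	name, err := c.lookup(pid)
	if err != nil {
		name = "" // negative cache: remember the miss for this scrape
	}
	c.names[pid] = name
	return name
}

func main() {
	calls := 0
	cache := newProcNameCache(func(pid int32) (string, error) {
		calls++
		if pid == 42 {
			return "nginx", nil
		}
		return "", errNoProc
	})
	for _, pid := range []int32{0, 42, 42, 999, 999} {
		fmt.Printf("pid=%d name=%q\n", pid, cache.resolve(pid))
	}
	fmt.Println("lookups:", calls) // prints "lookups: 2"
}
```

Five resolve calls trigger only two lookups: PID 0 is skipped, and the repeated PIDs 42 and 999 are served from the map, the latter as a cached miss.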

Test plan

  • make fmt lint test — 0 lint issues, all packages green
  • make test-integration — full docker-compose stack passes (TestIntegrationPostgres runs against PG17)
  • Unit tests: TestClassifyAddr (27 cases: loopback × 3, private including RFC1918/CGNAT/ULA/link-local × 13, public × 8, garbage × 2, with just-outside boundaries), TestResolveProcName{PIDZero, MissingPID} (short-circuits and negative caching), TestNetstatCollectorListenPorts (live label/value contract)
  • Benchmarks: BenchmarkNetstatCollect, BenchmarkNetstatConnectionsOnly, BenchmarkResolveProcName{cold, cached}
  • Live manual verification on macOS — UDP nc listener on 35353 detected as proto="udp", process="nc", scope="public" alongside real system UDP (rapportd, WireGuard, mDNS); remote_scope splits TCP into four buckets (none, loopback, private, public)
  • Live manual verification on Linux (orbstack ubuntu under --privileged --pid=host --network=host) — TCP listeners show correct scope, orbstack-agent resolved as process, IPv6 link-local correctly classified as private
  • Smoke-check on production host after deploy: confirm topsrv_netstat_listening_ports{scope="public"} matches ss -tlnp / ss -ulnp ground truth

Out of scope / follow-ups

  • Root-owned process inventory as a security audit signal: same process_* collector, with a gauge of processes running as uid=0.
  • eBPF-based connection tracking for high-fidelity exfil detection.
  • maxListenSeries=256 is hardcoded — if a real host genuinely exceeds it (think container hosts running many sidecars) we'll learn from the truncation warning and revisit.
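For reference, the cap-and-warn behaviour around maxListenSeries can be sketched as below. Only the constant name and its value come from the PR; the `listener` type, the `capSeries` helper, and the choice to keep the first entries in input order are assumptions made for illustration.

```go
package main

import "fmt"

// maxListenSeries mirrors the hardcoded cap named in the PR.
const maxListenSeries = 256

// listener is a hypothetical record for one listening socket.
type listener struct {
	Proto string
	Port  uint32
}

// capSeries truncates in to at most limit entries and emits exactly one
// warning per call when truncation happens. Which entries survive (here:
// the first ones in input order) is an assumption of this sketch.
func capSeries(in []listener, limit int, warnf func(format string, args ...any)) []listener {
	if len(in) <= limit {
		return in
	}
	warnf("netstat: %d listeners exceed the %d-series cap; truncating", len(in), limit)
	return in[:limit]
}

func main() {
	listeners := make([]listener, 0, 1000)
	for p := uint32(1); p <= 1000; p++ {
		listeners = append(listeners, listener{Proto: "tcp", Port: p})
	}
	warnf := func(format string, args ...any) {
		fmt.Printf("WARN "+format+"\n", args...)
	}
	kept := capSeries(listeners, maxListenSeries, warnf)
	fmt.Println("emitted series:", len(kept))
}
```

Taking the limit as a parameter rather than reading the constant directly would make a later switch to a configurable cap a small change.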

f3154a9 Add netstat listening_ports with scope/process

- New gauge topsrv_netstat_listening_ports{port, family, scope,
  process} emits one series per active TCP LISTEN socket so
  operators can answer "what is actually exposed?" at a glance
- classifyAddr buckets the bind address into loopback, private
  (RFC1918, RFC4193 ULA, RFC6598 CGNAT, link-local), or public
  (routable address or 0.0.0.0/:: wildcard — worst case)
- resolveProcName attaches the owning binary name via a per-scrape
  PID cache; empty under kernel ACL (run agent as root for full
  visibility), with negative caching for missing PIDs and PID=0
- Cap emission at 256 series per scrape so a compromised host
  with thousands of listeners cannot blow Prometheus cardinality
  budgets; log a single truncation warning when hit
- Cover scope buckets, RFC1918/CGNAT just-outside boundaries,
  PID=0 short-circuit, and missing-PID negative caching with unit
  tests; add three benchmarks (full collect / Connections-only /
  proc-name cold+cached) measuring 251us total on Linux of which
  Connections=200us and the new code adds ~50us
- Document the metric, scope semantics, and PromQL recipes (drift
  alert, public-listener inventory) in docs/metrics.md; mention
  listening_ports in the README Features row
- Fix postgres integration test that still indexed an appNames
  sub-map as a bool after the recent map[string]time.Time switch

51ef29b Add UDP listeners and TCP remote_scope label

- listening_ports now carries a proto label (tcp|udp) and emits one
  series per UDP listener too. A UDP socket counts as "listening"
  when it is bound but has no connected peer (empty Raddr) — this
  catches DNS, mDNS, WireGuard, NTP, and unexpected UDP backdoors
  next to the existing TCP coverage
- UDP failures are non-fatal: a log line is emitted and TCP results
  still ship, so a kernel quirk on one protocol can't disable both
- topsrv_netstat_tcp_connections gains a remote_scope label that
  classifies the peer address with the same loopback/private/public
  taxonomy. LISTEN sockets carry remote_scope=none. Dashboards can
  now alert on public-inbound (scan exposure) or private->public
  outbound (exfil signal) without per-IP series cardinality
- Update help-strings, README Features row, and PromQL recipe
  block in docs/metrics.md with UDP and remote_scope examples
- Cover proto label and remote_scope=none-on-LISTEN invariants in
  the existing live-host tests; classification reuses classifyAddr
  so the 27-case unit table already validates the bucket logic
@sergeyfast sergeyfast merged commit 6392a36 into master May 13, 2026
2 checks passed