Skip to content

refactor(sflow): re-load or re-register IfCounterCycler on future rescenario swap #147

@indigo423

Description

@indigo423

Context

After #146, MetricsCycler.ifCounters is an atomic.Pointer[IfCounterCycler]. SNMP readers (findResponse, findNextOID, overrideIfHC) each Load() the pointer at the top of the call and operate on the snapshot — so a future runtime swap (e.g. "reset counters", "re-scenario") is safe on the SNMP side.

The sFlow counter-sample path is not yet swap-safe. flow_exporter.registerSFlowCounterSources captures the cycler pointer once at device attach time:

if device.metricsCycler != nil {
    if ic := device.metricsCycler.ifCounters.Load(); ic != nil {
        if src := NewInterfaceCounterSource(ic); src != nil {
            sources = append(sources, src)
        }
    }
}

InterfaceCounterSource then holds the specific *IfCounterCycler for the lifetime of the sFlow exporter. If a future reset endpoint replaces the device's cycler via Store, SNMP will see cycler B (via fresh Loads on every request) but the sFlow counter_sample body path will keep emitting from cycler A — violating the "SNMP and sFlow agree byte-for-byte at the same instant" invariant documented in CLAUDE.md.

Proposal

Two reasonable shapes. Pick when the reset endpoint lands:

Option A — hold the atomic.Pointer in the source

Change InterfaceCounterSource to hold the *MetricsCycler (or the *atomic.Pointer[IfCounterCycler]) instead of the dereferenced *IfCounterCycler. Snapshot() calls Load() once at the top of each tick. After a swap, the next tick naturally picks up cycler B.

Trade-off: one extra atomic Load per tick (cheap). No new hook from the reset endpoint.

Option B — re-register on swap

Keep InterfaceCounterSource holding the concrete cycler. Have the reset endpoint walk every device and call registerSFlowCounterSources(device) again after Store. Old source stays alive until the next Snapshot() call is dropped.

Trade-off: explicit hook from the reset path; easier to reason about but more coupling.

Option A is lighter and keeps the atomic-pointer contract consistent across the codebase. Recommend Option A once #144 / #146's underlying motivation (a runtime reset / rescenario endpoint) is actually implemented.

Acceptance

  • Whichever option: after calling Store with a new *IfCounterCycler, the next FlowExporter.Tick must emit counter-sample values that match concurrent SNMP GETs on the same device (same byte-for-byte agreement invariant).
  • Add a targeted test that simulates a Store mid-run and asserts the invariant.

Out of scope

  • Adding the reset endpoint itself (still deferred — this issue is about keeping the sFlow plumbing consistent with the SNMP plumbing when that endpoint lands).

Discovered during code review of #146.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions