marilib: log handovers at the cloud + clock-offset tool #167
Draft
geonnave wants to merge 2 commits into
SNTP-style measurement of the wall-clock offset between the edge/cloud machines, so cross-machine event timing is trustworthy. Run it at the start and end of a run and record both - that offset record is exactly what was missing when a two-clock per-edge log could not be reconciled. AI-assisted: Claude Opus 4.8
The cloud sees every node's uplinks from every gateway on one clock; when a node's serving gateway changes, log a HANDOVER with the downtime (gap since its last uplink on the old gateway). Single-clock and timeout-free, unlike LEFT->JOIN, so it sidesteps both the NODE_LEFT lag and cross-machine skew. Reuses the existing log_events schema (details in event_tag), no new columns. AI-assisted: Claude Opus 4.8
Measuring cross-gateway handover downtime from the per-edge logs is unreliable for
two reasons: each edge stamps its own machine wall clock (the offset between the two
computers is not recoverable from the logs), and NODE_LEFT is a gateway timeout - it
fires ~5 slotframes after the node actually stops uplinking, not at the real
departure. So LEFT->JOIN mixes real handover time with detection lag and an unknown
cross-machine clock skew.
The fix here is host-side only (no firmware, no schema change). The cloud logs
handovers straight from the uplink stream: MarilibCloud already receives every
node's uplinks from every gateway on one clock, so it now tracks each node's serving
gateway and, when it changes, writes a HANDOVER row to log_events.csv with the
downtime (gap since the node's last uplink on the old gateway). That number is
single-clock and timeout-free, so it sidesteps both confounds, and it reuses the
existing event columns (from-gateway and downtime packed into event_tag), so no new
columns are needed. Alongside it, a small mari-clock-offset tool measures the wall-clock offset
between the machines SNTP-style (offset = ((t2-t1)+(t3-t4))/2); run it at the start
and end of a run and record both, to bound skew/drift when correlating the per-edge
logs or as a cross-check on the cloud timeline.
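
For reviewers unfamiliar with the SNTP-style formula, here is a minimal sketch of the
arithmetic only. It is not the code of mari-clock-offset: the function name and the
sample timestamps are made up for illustration, and it assumes the usual convention
(t1 = request sent on the local clock, t2 = request received on the remote clock,
t3 = reply sent on the remote clock, t4 = reply received on the local clock).

```python
def sntp_offset(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Return (offset, round_trip_delay) in the same unit as the inputs.

    offset > 0 means the remote clock is ahead of the local clock.
    Hypothetical helper for illustration, not part of marilib.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    # Round trip on the wire, excluding the remote processing time (t3 - t2).
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Made-up example: remote clock 0.5 s ahead, 10 ms each way, 2 ms processing.
t1 = 100.000            # local clock: request sent
t2 = 100.510            # remote clock: request received
t3 = 100.512            # remote clock: reply sent
t4 = 100.022            # local clock: reply received
offset, delay = sntp_offset(t1, t2, t3, t4)
```

With these numbers the offset comes out at about 0.5 s and the round-trip delay at
about 20 ms. The formula assumes the two network legs are symmetric; any asymmetry
shows up as an error of at most half the round-trip delay, which is one reason to
record the measurement at both ends of a run.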
Validated with unit tests: offset ~0 on localhost, and the handover tracker fires
only on real cross-gateway transitions (not same-gateway or first uplink) with
correct from-gateway and downtimes. Draft: not yet validated on the two-gateway
testbed.
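
The transition rule exercised by those tests can be sketched as follows. This is an
illustration under assumptions, not the MarilibCloud implementation: the class, method
and field names are hypothetical, timestamps are taken from a single cloud-side clock,
and each uplink is assumed to arrive from exactly one gateway.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Handover:
    node: int
    from_gateway: int
    to_gateway: int
    downtime_s: float  # gap since the node's last uplink on the old gateway


class HandoverTracker:
    """Hypothetical tracker: remembers each node's serving gateway and last uplink time."""

    def __init__(self) -> None:
        self._last: Dict[int, Tuple[int, float]] = {}  # node -> (gateway, timestamp)

    def on_uplink(self, node: int, gateway: int, now: float) -> Optional[Handover]:
        prev = self._last.get(node)
        self._last[node] = (gateway, now)
        if prev is None:
            return None  # first uplink from this node: nothing to compare against
        prev_gateway, prev_time = prev
        if prev_gateway == gateway:
            return None  # same gateway: not a handover
        return Handover(node, prev_gateway, gateway, now - prev_time)


tracker = HandoverTracker()
tracker.on_uplink(node=1, gateway=0xA, now=10.0)      # first uplink, no event
tracker.on_uplink(node=1, gateway=0xA, now=10.5)      # same gateway, no event
event = tracker.on_uplink(node=1, gateway=0xB, now=12.0)  # gateway changed
```

Here `event` reports a handover from gateway 0xA to 0xB with a downtime of 1.5 s,
computed entirely from cloud-side timestamps. How the real code packs from-gateway
and downtime into event_tag is not shown here.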