marilib: log handovers at the cloud + clock-offset tool #167

Draft
geonnave wants to merge 2 commits into DotBots:develop from geonnave:cloud-handover-logging

Conversation

geonnave commented Jul 3, 2026

Contributor

Measuring cross-gateway handover downtime from the per-edge logs is unreliable for
two reasons: each edge stamps its own machine wall clock (the offset between the two
computers is not recoverable from the logs), and NODE_LEFT is a gateway timeout - it
fires ~5 slotframes after the node actually stops uplinking, not at the real
departure. So the LEFT->JOIN interval mixes real handover time with detection lag
and an unknown cross-machine clock skew.

The fix here is host-side only (no firmware, no schema change). The cloud logs
handovers straight from the uplink stream: MarilibCloud already receives every
node's uplinks from every gateway on one clock, so it now tracks each node's serving
gateway and, when it changes, writes a HANDOVER row to log_events.csv with the
downtime (gap since the node's last uplink on the old gateway). That number is
single-clock and timeout-free, so it sidesteps both confounds, and it reuses the
existing event columns (from-gateway + downtime packed into event_tag), so no new
columns. Alongside it, a small mari-clock-offset tool measures the wall-clock offset
between the machines SNTP-style (offset = ((t2-t1)+(t3-t4))/2); run it at the start
and end of a run and record both, to bound skew/drift when correlating the per-edge
logs or as a cross-check on the cloud timeline.
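
To make the tracking rule concrete, here is a minimal sketch of the idea described above, not the code in this PR. The names `HandoverTracker`, `Handover`, and `on_uplink` are hypothetical, and the sketch assumes each uplink reaches the cloud tagged with one gateway id and is stamped with the cloud's own clock. How the result is packed into event_tag is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Handover:
    """One detected cross-gateway transition (hypothetical record type)."""
    node: int
    from_gateway: int
    to_gateway: int
    downtime_s: float  # gap since the node's last uplink on the old gateway


class HandoverTracker:
    """Tracks each node's serving gateway from the uplink stream (illustrative only)."""

    def __init__(self) -> None:
        # node id -> (serving gateway id, cloud timestamp of its last uplink)
        self._last: Dict[int, Tuple[int, float]] = {}

    def on_uplink(self, node: int, gateway: int, now: float) -> Optional[Handover]:
        """Record an uplink; return a Handover only when the serving gateway changed."""
        prev = self._last.get(node)
        self._last[node] = (gateway, now)
        if prev is None or prev[0] == gateway:
            return None  # first uplink or same gateway: nothing to log
        prev_gateway, prev_time = prev
        return Handover(node, prev_gateway, gateway, now - prev_time)


tracker = HandoverTracker()
tracker.on_uplink(node=1, gateway=0xA, now=10.0)   # first uplink: no event
tracker.on_uplink(node=1, gateway=0xA, now=10.5)   # same gateway: no event
event = tracker.on_uplink(node=1, gateway=0xB, now=11.25)
print(event)  # handover from 0xA to 0xB with 0.75 s of downtime
```

Because both timestamps in the subtraction come from the cloud's clock, neither the NODE_LEFT timeout nor the offset between the edge machines enters the result.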

Validated with unit tests: offset ~0 on localhost, and the handover tracker fires
only on real cross-gateway transitions (not same-gateway or first uplink) with
correct from-gateway and downtimes. Draft: not yet validated on the two-gateway
testbed.
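
For reference, a worked sketch of the SNTP-style offset formula quoted above. It assumes the usual SNTP timestamp convention (t1 = request sent on the local clock, t2 = request received on the remote clock, t3 = reply sent on the remote clock, t4 = reply received on the local clock); the function names are hypothetical and this is not the mari-clock-offset implementation. The round-trip delay is standard SNTP bookkeeping rather than something stated in this PR.

```python
def clock_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Estimated remote-minus-local clock offset, assuming symmetric path delay."""
    return ((t2 - t1) + (t3 - t4)) / 2.0


def round_trip_delay(t1: float, t2: float, t3: float, t4: float) -> float:
    """Network round-trip time with the remote processing time removed."""
    return (t4 - t1) - (t3 - t2)


# Worked example: remote clock is 2.0 s ahead, one-way delay is 0.25 s each way,
# and the remote side takes 0.5 s to reply.
t1 = 100.0    # local clock: request sent
t2 = 102.25   # remote clock: request received (100.25 local + 2.0 offset)
t3 = 102.75   # remote clock: reply sent
t4 = 101.0    # local clock: reply received
print(clock_offset(t1, t2, t3, t4))      # 2.0
print(round_trip_delay(t1, t2, t3, t4))  # 0.5

# Same machine (one shared clock): the two terms cancel and the offset is 0.
print(clock_offset(5.0, 5.25, 5.5, 5.75))  # 0.0
```

If the two directions have unequal delay, the estimate is off by at most half the round-trip delay, so recording the delay next to each offset gives an error bound for the run.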

geonnave added 2 commits July 3, 2026 19:47
SNTP-style measurement of the wall-clock offset between the edge/cloud
machines, so cross-machine event timing is trustworthy. Run it at the start
and end of a run and record both - that offset record is exactly what was
missing when a two-clock per-edge log could not be reconciled.

AI-assisted: Claude Opus 4.8

The cloud sees every node's uplinks from every gateway on one clock; when a
node's serving gateway changes, log a HANDOVER with the downtime (gap since
its last uplink on the old gateway). Single-clock and timeout-free, unlike
LEFT->JOIN, so it sidesteps both the NODE_LEFT lag and cross-machine skew.
Reuses the existing log_events schema (details in event_tag), no new columns.

AI-assisted: Claude Opus 4.8