association: give up joining after repeated max-backoff failures#166

Merged
geonnave merged 2 commits into
DotBots:develop from
geonnave:join-backoff-give-up
Jul 3, 2026

@geonnave geonnave commented Jul 3, 2026

Nodes that failed to join (shared-uplink collisions, or a link/fade problem)
kept retrying the same gateway until the 5 s wall-clock timeout, which left a
long formation tail: a few stragglers taking 5-6 s while the bulk joined under 3 s.

This adds an attempt-based give-up. The node counts consecutive join failures
while already pinned at the max backoff window, and after MARI_BACKOFF_MAX_STREAK
(3) of them it gives up and rescans instead of waiting out the timeout. The 5 s
timeout stays as a wall-clock backstop. Because the limit is counted in attempts, it
adapts to slotframe size automatically (unlike the fixed 5 s); worst case is
about 2 s on the huge schedule. The earlier rescan lets a straggler re-sync on a
fresh gateway/channel rather than hammering a possibly-faded link, which pairs
with the beacon/scan channel hopping already on develop.
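
For readers skimming the description, the give-up rule can be sketched as a small counter. This is a minimal illustration and not the merged code: only `MARI_BACKOFF_MAX_STREAK` and its value 3 come from this PR, while the struct, the function names, and the exponent bounds `BACKOFF_N_MIN` / `BACKOFF_N_MAX` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MARI_BACKOFF_MAX_STREAK 3  // value from the PR description
#define BACKOFF_N_MIN 1            // hypothetical: smallest backoff exponent
#define BACKOFF_N_MAX 5            // hypothetical: exponent of the max window

// Hypothetical per-node join state (names are illustrative only).
typedef struct {
    uint8_t backoff_n;   // current backoff exponent; the window grows with n
    uint8_t max_streak;  // consecutive failures while already at the max window
} join_backoff_t;

// Call on a successful join or when starting a fresh scan.
static void join_backoff_reset(join_backoff_t *b) {
    b->backoff_n = BACKOFF_N_MIN;
    b->max_streak = 0;
}

// Call on each failed join attempt.
// Returns true when the node should give up on this gateway and rescan.
static bool join_backoff_on_failure(join_backoff_t *b) {
    if (b->backoff_n < BACKOFF_N_MAX) {
        // Still growing the window: this failure does not count toward the streak.
        b->backoff_n++;
        return false;
    }
    // Already pinned at the max window: count this failure.
    b->max_streak++;
    if (b->max_streak >= MARI_BACKOFF_MAX_STREAK) {
        join_backoff_reset(b);
        return true;
    }
    return false;
}
```

With these assumed bounds, four failures grow the window to its maximum, and the third further failure at the maximum triggers the rescan. The 5 s wall-clock guard would still run independently of this counter.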

Validated on a 100-node join storm: the tail tightens noticeably. p100 drops to
~3.8 s (max 4.8 s) from ~5-6 s, with p95 ~3.0 s and a much smaller spread. A
small p95 cost buys a far more predictable worst case.
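
For anyone reproducing the p95/p100 figures from raw per-node join times, one common definition is the nearest-rank percentile, sketched below. The helper name `percentile_ms` is hypothetical, and whether the numbers above were computed this way is an assumption. Under this definition, p100 is the largest sample in a given run.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Three-way comparison for qsort over uint32_t values.
static int cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

// Nearest-rank percentile of n join times in milliseconds.
// Sorts the array in place. Requires n > 0 and 1 <= p <= 100.
static uint32_t percentile_ms(uint32_t *times_ms, size_t n, unsigned p) {
    qsort(times_ms, n, sizeof times_ms[0], cmp_u32);
    size_t rank = ((size_t)p * n + 99) / 100;  // ceil(p * n / 100)
    return times_ms[rank - 1];
}
```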

MARI_BACKOFF_MAX_STREAK is the tuning knob: set too low it risks premature
rescans under normal join-storm congestion; 3 held up well in testing.
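
To reason about the knob, a rough upper bound on the time spent at the max window before giving up can be written as below. This is a sketch under a stated assumption, not the firmware's actual timing: it assumes each failed attempt at the max window waits at most `max_window_slotframes` slotframes, and the function and parameter names are hypothetical. It shows why an attempt-based limit scales with slotframe duration while a fixed wall-clock timeout does not.

```c
#include <stdint.h>

// Rough upper bound (ms) on time spent at the max backoff window before give-up.
// Assumption: each failed attempt waits at most max_window_slotframes slotframes.
static uint32_t giveup_bound_ms(uint32_t streak,
                                uint32_t max_window_slotframes,
                                uint32_t slotframe_ms) {
    return streak * max_window_slotframes * slotframe_ms;
}
```

Doubling the slotframe duration doubles the bound, and raising the streak by one adds one more max-window wait.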

Also included: a testbed LED-color mapping so the current nRF5340-DK gateway
lights the node green (same known-gateway convention already in board.c).

geonnave added 2 commits July 3, 2026 12:01
After MARI_BACKOFF_MAX_STREAK (3) consecutive join failures while already
pinned at the max backoff window, give up and rescan instead of waiting out
the 5 s wall-clock guard (kept as a backstop). Measured in attempts so it
adapts to slotframe size; ~2 s worst case on the huge schedule.

AI-assisted: Claude Opus 4.8

geonnave commented Jul 3, 2026

Tested with 100 nodes and the results are amazing! Mean join time for 100 nodes, 10 meters from the gateway, is 3.8 seconds 😎

Screenshot 2026-07-03 at 12 26 15

@geonnave geonnave merged commit e146a41 into DotBots:develop Jul 3, 2026
2 checks passed