carrier: recover faster after local network outages #137

Closed
poulcarlsen53 wants to merge 1 commit into Kianmhz:main from poulcarlsen53:local-network-recovery

@poulcarlsen53
Contributor

Summary

  • classify local offline dial failures from http.Client.Do separately from relay/server failures
  • apply a short 15s local-offline endpoint backoff instead of escalating into the normal 30m/1h tiers
  • add a quota-free TCP recovery probe to google_host:443 that clears only local-offline transient blacklists when the network returns
  • preserve fail-fast drained-batch behavior so apps reconnect with fresh TCP sessions after outages
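
The first two bullets can be sketched roughly as follows. This is an illustration only, not the code from this PR: `looksLocallyOffline`, `backoffFor`, `sampleErrors`, the chosen errnos, and the substring list are all assumptions. Only the 15s and 30m durations, and the idea of an errno set with a message-substring fallback, come from the discussion here.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
	"syscall"
	"time"
)

const (
	localOfflineBackoff = 15 * time.Second // short tier described in the summary
	serverBackoff       = 30 * time.Minute // stands in for the normal 30m/1h tiers
)

// looksLocallyOffline is a hypothetical stand-in for isLocalNetworkOffline.
// It checks a small portable errno set first, then falls back to substrings.
func looksLocallyOffline(err error) bool {
	if err == nil {
		return false
	}
	// Assumed errno set; these constants exist on Linux, macOS, and Windows.
	for _, target := range []syscall.Errno{syscall.ENETUNREACH, syscall.ENETDOWN} {
		if errors.Is(err, target) {
			return true
		}
	}
	// Substring fallback covers errnos that are not portable, such as ENONET.
	msg := strings.ToLower(err.Error())
	for _, s := range []string{"network is unreachable", "network is down", "machine is not on the network"} {
		if strings.Contains(msg, s) {
			return true
		}
	}
	return false
}

// backoffFor picks the penalty duration for a failed request.
func backoffFor(err error) time.Duration {
	if looksLocallyOffline(err) {
		return localOfflineBackoff
	}
	return serverBackoff
}

// sampleErrors builds one wrapped dial failure and one server-side failure.
func sampleErrors() (offline, server error) {
	offline = fmt.Errorf("Post \"https://example.invalid\": %w",
		&net.OpError{Op: "dial", Net: "tcp", Err: syscall.ENETUNREACH})
	server = errors.New("relay returned HTTP 500")
	return offline, server
}

func main() {
	offline, server := sampleErrors()
	fmt.Println(backoffFor(offline)) // 15s
	fmt.Println(backoffFor(server))  // 30m0s
}
```

`errors.Is` walks the `Unwrap` chain, so the errno is found even when the dial failure is wrapped by `*net.OpError` and again by the HTTP client.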

Why

For 24/7 mobile clients, airplane mode or a local blackout should not leave all Apps Script endpoints stuck in a long penalty box after connectivity returns. This keeps quota/server errors on the existing path while making local network recovery much faster.
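
A minimal sketch of the recovery idea, assuming a blacklist that records why each endpoint was penalized. All type and function names here (`blacklist`, `probeTCP`, `demo`, the endpoint labels) are hypothetical and not taken from the carrier package. The demo dials a loopback listener in place of google_host:443 so it runs without network access.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

type reason int

const (
	reasonServer reason = iota
	reasonLocalOffline
)

type penalty struct {
	until time.Time
	why   reason
}

// blacklist is a hypothetical per-endpoint penalty table.
type blacklist struct {
	mu      sync.Mutex
	entries map[string]penalty
}

func newBlacklist() *blacklist { return &blacklist{entries: map[string]penalty{}} }

func (b *blacklist) mark(endpoint string, why reason, d time.Duration) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.entries[endpoint] = penalty{until: time.Now().Add(d), why: why}
}

func (b *blacklist) blocked(endpoint string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	p, ok := b.entries[endpoint]
	return ok && time.Now().Before(p.until)
}

// clearLocalOffline drops only local-offline penalties and reports how many.
func (b *blacklist) clearLocalOffline() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	cleared := 0
	for ep, p := range b.entries {
		if p.why == reasonLocalOffline {
			delete(b.entries, ep)
			cleared++
		}
	}
	return cleared
}

// probeTCP opens and closes a TCP connection; no request is sent, so no quota is used.
func probeTCP(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	_ = conn.Close()
	return true
}

// demo probes a loopback listener and clears transient penalties on success.
func demo() (cleared int, offlineBlocked, serverBlocked bool, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, false, false, err
	}
	defer ln.Close()

	b := newBlacklist()
	b.mark("script-a", reasonLocalOffline, 15*time.Second)
	b.mark("script-b", reasonServer, 30*time.Minute)
	if probeTCP(ln.Addr().String(), 2*time.Second) {
		cleared = b.clearLocalOffline()
	}
	return cleared, b.blocked("script-a"), b.blocked("script-b"), nil
}

func main() {
	cleared, offlineBlocked, serverBlocked, err := demo()
	fmt.Println(cleared, offlineBlocked, serverBlocked, err)
}
```

Tagging each penalty with its reason is what lets a successful probe lift the local-offline entries while quota and server penalties stay in place.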

Verification

  • go test -count=1 ./internal/carrier -run 'LocalNetwork|RecoveryProbe|PollOnceMarksOnlyDoErrors'
  • go test -count=1 ./...
  • go vet ./...

Kianmhz pushed a commit that referenced this pull request May 20, 2026
Classify dial errors from http.Client.Do separately from relay/server
failures and route them through a short 15s local-offline blacklist
instead of the 30m/1h endpoint penalty tiers. A quota-free TCP probe
to google_host:443 clears those transient backoffs when the network
returns, so airplane-mode / captive-portal / brief blackout recovery
happens in seconds rather than minutes.

Drops Linux-only syscall.ENONET from the errno set so the carrier builds
on macOS/Windows; ENONET ("machine is not on the network") is already
covered by the message-substring fallback in isLocalNetworkOffline.

The HTTP 500 mock in TestPollOnceMarksOnlyDoErrors... now sets
ContentLength: -1 so the post-#138 readRelayResponseBody treats the
body as unknown-length and reads it through (rather than honoring
the zero-valued ContentLength as "empty body").

Co-authored-by: poulcarlsen53 <poulcarlsen53@gmail.com>
@Kianmhz
Owner

Kianmhz commented May 20, 2026

Merged manually as 347489f after resolving a textual conflict with #138 (both PRs touched the same alignment line in client.go) and dropping Linux-only syscall.ENONET from the errno set — it was breaking the macOS build. ENONET ("machine is not on the network") is already covered by the message-substring fallback in isLocalNetworkOffline, so dropping it has no behavioral impact on Linux. Also bumped the HTTP 500 test mock's ContentLength to -1 so the post-#138 readRelayResponseBody reads the body through (rather than honoring the zero-valued ContentLength as "empty body"). Thanks!
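
To illustrate why the mock needed `ContentLength: -1`: a hand-built `http.Response` leaves `ContentLength` at its zero value, which a length-aware reader can take to mean "empty body". The sketch below uses a hypothetical `readBody` that behaves that way. It is not the actual readRelayResponseBody, whose implementation is not shown in this thread, and the body text is made up.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// readBody is a hypothetical length-aware reader: it trusts ContentLength == 0
// as "no body" and reads everything else through to EOF.
func readBody(resp *http.Response) ([]byte, error) {
	defer resp.Body.Close()
	if resp.ContentLength == 0 {
		return nil, nil
	}
	return io.ReadAll(resp.Body)
}

// mock500 builds a hand-rolled HTTP 500 response like a test double would.
func mock500(contentLength int64) *http.Response {
	return &http.Response{
		StatusCode:    http.StatusInternalServerError,
		Body:          io.NopCloser(strings.NewReader("relay error")),
		ContentLength: contentLength, // -1 means "unknown length" in net/http
	}
}

func main() {
	zero, _ := readBody(mock500(0))
	unknown, _ := readBody(mock500(-1))
	fmt.Printf("%q %q\n", zero, unknown) // "" "relay error"
}
```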

@Kianmhz closed this May 20, 2026