Skip to content

fix(poster): retry on stale-connection i/o timeouts, not just broken pipes#227

Merged
javi11 merged 1 commit into
mainfrom
claude/optimistic-liskov-382e9a
May 11, 2026
Merged

fix(poster): retry on stale-connection i/o timeouts, not just broken pipes#227
javi11 merged 1 commit into
mainfrom
claude/optimistic-liskov-382e9a

Conversation

@javi11
Copy link
Copy Markdown
Owner

@javi11 javi11 commented May 11, 2026

Summary

  • Broaden the poster's stale-pooled-connection detection to cover read i/o timeout errors in addition to broken-pipe / connection-reset. Wrapped net.Error with Timeout()==true and the literal "i/o timeout" string are now treated as retryable stale-connection signals.
  • Bump the in-function retry from 2 to 3 attempts so a second stale pick (common right after a throttle pause that leaves connections idle) also recovers on a fresh pool connection.
  • Improved retry log: now includes attempt number and the previous error to make production diagnosis easier.

Why

Users on the latest nntppool (v4.11.1) were seeing repeated:

after 5 retries: error posting article: nntp: post failed:
news.newshosting.com:563+...: read tcp ...->...:563: i/o timeout

Investigation showed this isn't an nntppool regression — the v4.11.1 bump only added PostHeaders.Date. The real cause is a pre-existing gap in the poster: when a pooled TCP socket sits idle (workers throttling after a post) and the upstream silently closes it, reuse can produce a read i/o timeout rather than a clean broken pipe. The previous isBrokenPipe helper didn't match that case, so the in-function retry path was skipped and every such post burned the outer 5-retry budget.

The postCtx DeadlineExceeded short-circuit at the call site still handles real envelope expiry, so a net.Error timeout reaching the helper can only come from the underlying socket's read deadline — exactly what we want to retry on.

Test plan

  • go test -race -count=1 ./internal/poster/... — passes.
  • New TestIsStaleConnError table covers: EPIPE, ECONNRESET, wrapped EPIPE, wrapped net.Error timeout, "broken pipe" / "connection reset" / "i/o timeout" strings, plus negatives (io.EOF, generic error, non-timeout net.Error, nil).
  • TestIsStaleConnError_RealDeadline exercises a real net.Conn read deadline to defend against future stdlib wrapping changes.
  • Existing "all attempts return broken pipe" test updated for the new 3-attempt cap.
  • (Optional, post-merge) Watch the rate of "after 5 retries: error posting article" log lines — should drop substantially.

…pipes

The poster's in-function retry only matched broken-pipe / connection-reset
errors. When the upstream NNTP server silently closes an idle pooled socket
faster than the pool's IdleTimeout, the next reuse can fail with a read
"i/o timeout" instead. Those errors bypassed the retry and exhausted the
outer 5-retry budget, surfacing as repeated "after 5 retries" failures
after upgrading nntppool.

Rename isBrokenPipe to isStaleConnError and broaden it to also match
wrapped net.Error timeouts and the "i/o timeout" string. Bump the
in-function retry from 2 to 3 attempts so a second stale pick after a
long throttle pause is also tolerated. The postCtx DeadlineExceeded
short-circuit still guards against retrying on real envelope expiry.
@javi11 javi11 merged commit 01826ea into main May 11, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant