fix(poster): retry on stale-connection i/o timeouts, not just broken pipes #227
Merged
…pipes

The poster's in-function retry only matched broken-pipe / connection-reset errors. When the upstream NNTP server silently closes an idle pooled socket faster than the pool's IdleTimeout, the next reuse can fail with a read "i/o timeout" instead. Those errors bypassed the retry and exhausted the outer 5-retry budget, surfacing as repeated "after 5 retries" failures after upgrading nntppool.

Rename isBrokenPipe to isStaleConnError and broaden it to also match wrapped net.Error timeouts and the "i/o timeout" string. Bump the in-function retry from 2 to 3 attempts so a second stale pick after a long throttle pause is also tolerated.

The postCtx DeadlineExceeded short-circuit still guards against retrying on real envelope expiry.
Summary
- net.Error with Timeout() == true and the literal "i/o timeout" string are now treated as retryable stale-connection signals.

Why
Users on the latest nntppool (v4.11.1) were seeing repeated "after 5 retries: error posting article" failures.
Investigation showed this isn't an nntppool regression; the v4.11.1 bump only added PostHeaders.Date. The real cause is a pre-existing gap in the poster: when a pooled TCP socket sits idle (workers throttling after a post) and the upstream silently closes it, reuse can produce a read i/o timeout rather than a clean broken pipe. The previous isBrokenPipe helper didn't match that case, so the in-function retry path was skipped and every such post burned the outer 5-retry budget.

The postCtx DeadlineExceeded short-circuit at the call site still handles real envelope expiry, so a net.Error timeout reaching the helper can only come from the underlying socket's read deadline, which is exactly what we want to retry on.

Test plan
- go test -race -count=1 ./internal/poster/...: passes.
- TestIsStaleConnError table covers: EPIPE, ECONNRESET, wrapped EPIPE, wrapped net.Error timeout, "broken pipe" / "connection reset" / "i/o timeout" strings, plus negatives (io.EOF, generic error, non-timeout net.Error, nil).
- TestIsStaleConnError_RealDeadline exercises a real net.Conn read deadline to defend against future stdlib wrapping changes.
- "after 5 retries: error posting article" log lines: should drop substantially.