Skip to content

[Claude] [AUTOTEST-466] Log benign setMaxInFlight teardown race at DEBUG#113

Open
sprout-autoagent wants to merge 1 commit into
mainfrom
auto-agent/fix-autotest-466
Open

[Claude] [AUTOTEST-466] Log benign setMaxInFlight teardown race at DEBUG#113
sprout-autoagent wants to merge 1 commit into
mainfrom
auto-agent/fix-autotest-466

Conversation

@sprout-autoagent
Copy link
Copy Markdown

@sprout-autoagent sprout-autoagent commented May 29, 2026

Sentry issue EVERYDAY-POLLER-75Z files a Jira ticket whenever SubConnection.setMaxInFlight logs a SocketException: Socket closed at ERROR. The exception is a benign teardown race: during normal connection churn (nsqd rollout, k8s pod cycling, peer EOF) the dedicated read thread calls Connection.close() — setting isReading = false and closing the socket — while Subscription.distributeMaxInFlight is concurrently calling setMaxInFlightflush() on that same connection. The IOException is already caught, the connection is closed and rebuilt by checkConnections, and no messages are lost (inFlight:0). Only the log level is wrong. This change branches the existing catch block on the isReading liveness flag — the same discriminator the sibling PubConnection.checkConnectionLiveness() already uses — so genuine failures stay at ERROR while the benign race logs at DEBUG and no longer surfaces in Sentry.

Change Summary

  • src/main/java/com/sproutsocial/nsq/SubConnection.java — In the catch (IOException e) block of setMaxInFlight(int, boolean), replaced the unconditional logger.error("setMaxInFlight failed. con:{}", ...) with a branch on isReading: log at ERROR when the connection still believes it is reading (a genuine, actionable failure), and at DEBUG when isReading == false (the read thread already tore the socket down — a benign race). close() is still called unconditionally in both branches, so behaviour is unchanged. No method signatures, control flow, or callers were touched.
  • src/test/java/com/sproutsocial/nsq/SubConnectionTest.java (new) — First unit test in the repo; runs under surefire with no Docker. Builds a SubConnection with mocked Client/Subscription/Subscriber, swaps in an OutputStream whose flush() throws IOException("Socket closed") (reproducing the race), and attaches a Logback ListAppender to the com.sproutsocial.nsq.SubConnection logger. logsErrorWhenFlushFailsWhileStillReading asserts the failure is logged at ERROR when isReading == true and that close() ran; logsDebugWhenFlushFailsAfterConnectionTornDown sets isReading = false and asserts the failure is logged at DEBUG with no ERROR event.

Suggested Assignee

@alex-dylaalex.dyla@sproutsocial.com (Alex Dyla)

Related Links

When the read thread closes the socket concurrently (peer EOF / nsqd
shutdown), SubConnection.setMaxInFlight flush() fails with
"Socket closed". This is already handled and the connection is rebuilt,
but logging it at ERROR causes Sentry to file it as an issue. Branch on
the existing isReading liveness flag: ERROR for genuine failures while
still reading, DEBUG for the benign teardown race.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants