fix: schedule reconnect after parse error during streaming#134
Merged
Conversation
In `ReconnectingRequest::poll_next`, parser errors propagated through `process_bytes(...)?` while in the `Connected` state previously returned the error without scheduling a reconnect. The next poll continued draining the broken response body before finally hitting end-of-body and emitting a spurious `UnexpectedEof` before the reconnect kicked in. Schedule a reconnect on parse errors when reconnect is enabled. The error is still returned to the caller; the only behavioral change is that the next poll cleanly transitions to reconnect rather than draining the broken body. Adds `parser_error_schedules_reconnect_immediately` test verifying that the next stream item after a parse error is the reconnect's `Connected`, not an extra error. Contributes to launchdarkly/rust-server-sdk#116.
c7ae337 to
7917038
Compare
Merged
3 tasks
- Match `Error::InvalidLine(_)` specifically so a regression that produces a different error class is caught. - Replace `tokio::time::timeout(...).ok()` with `.expect(...)` so a flaky timeout produces a clear failure mode rather than a silent "got None" further down. - Bump the timeout to 2 s to absorb slower CI runners. - Drop the decorative `expect_at_least(2)` -- mockito's default `assert_on_drop` is false, so it enforced nothing; the `items[2] == Connected` assertion already requires the reconnect.
6 tasks
The previous docstring told consumers "Do not use the stream after it returned an error!" That instruction contradicts how the stream actually works: most errors are non-terminal, and the reconnect machinery only runs if the consumer keeps polling. Stream termination is signalled by `Poll::Ready(None)`, not by an error. Reword the docstring to describe the real contract: errors are informational, keep polling, the stream ends when poll_next returns None.
keelerm84
approved these changes
May 9, 2026
kinyoklion
added a commit
that referenced
this pull request
May 11, 2026
Three pre-existing reconnect-control gaps surfaced during the multi- agent review of #134 (SDK-2347): 1. `BackoffRetry::change_base_delay` accepted any duration, including `Duration::ZERO`. A server emitting `retry: 0` collapsed the backoff to zero across every reconnect path, producing a tight reconnect loop. Clamp the input to a 1 ms floor. 2. The EOF arm of `ReconnectingRequest::poll_next` unconditionally scheduled a reconnect, even when `reconnect_opts.reconnect` was false. Honor the flag and transition to `StreamClosed` when reconnect is disabled, matching every other error path. 3. The parse-error arm only transitioned state when reconnect was enabled. With reconnect disabled, the parser stayed poisoned and the next poll drained to EOF, where (1) above papered over the bug. Transition to `StreamClosed` so the documented "do not use the stream after error" contract holds. Tests: - `test_change_base_delay_clamps_to_minimum` pins the retry floor. - `parser_error_closes_stream_when_reconnect_disabled` asserts the stream returns `None` after a parse error when reconnect is off. - `eof_closes_stream_when_reconnect_disabled` asserts the stream returns `None` after end-of-body when reconnect is off.
kinyoklion
added a commit
that referenced
this pull request
May 11, 2026
## Summary Stacked on top of #134. Addresses three pre-existing reconnect-control gaps surfaced during the multi-agent review of that PR. 1. **`retry: 0` collapses backoff to zero.** A server emitting `retry: 0` set `BackoffRetry::base_delay` to `Duration::ZERO`, after which every reconnect path (`client.rs:474, 525, 547, 612` and the new parse-error path) computed `next_delay() == 0` and reconnected immediately — a tight loop. Clamps `change_base_delay` to a 1 ms floor. 2. **EOF arm ignored `reconnect_opts.reconnect`.** The body-exhausted branch unconditionally scheduled `WaitingToReconnect` even when reconnect was disabled. Now honors the flag and transitions to `StreamClosed`, matching every other error path. 3. **Parse-error arm with `reconnect=false` left the parser poisoned.** No state transition happened, so the next poll drained the broken body to EOF — where (2) above papered over the bug. Now transitions to `StreamClosed` so the documented \"do not use the stream after error\" contract holds. ## Context Stacked PR — base is the `rl/sdk-2345/parser-error-reconnect-state` branch from #134, **not** main. Will retarget once #134 lands. Tracked in [SDK-2347](https://launchdarkly.atlassian.net/browse/SDK-2347). Predecessor: [SDK-2345](https://launchdarkly.atlassian.net/browse/SDK-2345) / #134. Surfaced from the multi-agent review of #134 (findings 1, 2, and the corresponding suggested follow-ups 4 and 5). All three were pre-existing — #134 only made the parse-error path more visible. ## Test plan - [x] `test_change_base_delay_clamps_to_minimum` (`retry.rs`) — pins the 1 ms floor against `change_base_delay(Duration::ZERO)` and a sub-floor value. - [x] `parser_error_closes_stream_when_reconnect_disabled` (`client.rs`) — asserts the stream emits one `InvalidLine` then `None` when reconnect is off. - [x] `eof_closes_stream_when_reconnect_disabled` (`client.rs`) — asserts the stream emits the event, `Eof`, then `None` when reconnect is off. - [x] Existing `parser_error_schedules_reconnect_immediately` from #134 still passes. - [x] `cargo test` — 63 lib tests + 1 doc test pass. - [x] `cargo fmt --check` clean. [SDK-2347]: https://launchdarkly.atlassian.net/browse/SDK-2347?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
kinyoklion
pushed a commit
that referenced
this pull request
May 11, 2026
🤖 I have created a release *beep* *boop* --- ## [0.17.4](0.17.3...0.17.4) (2026-05-11) ### Bug Fixes * respect reconnect=false and clamp server-supplied retry: 0 ([#135](#135)) ([80c123a](80c123a)) * schedule reconnect after parse error during streaming ([#134](#134)) ([2ac2998](2ac2998)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Low Risk** > Low risk: this PR only updates release metadata (version bump and changelog) and does not modify runtime code paths. > > **Overview** > Bumps `eventsource-client` from `0.17.3` to `0.17.4` in `.release-please-manifest.json` and `eventsource-client/Cargo.toml`. > > Updates `eventsource-client/CHANGELOG.md` with the `0.17.4` release notes (reconnect/retry behavior fixes and reconnect scheduling after parse errors). > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit cab73e7. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
kinyoklion
added a commit
to launchdarkly/rust-server-sdk
that referenced
this pull request
May 11, 2026
## Summary The `StreamingDataSource` fetch loop matched any error other than `Eof` (and recoverable HTTP responses) with a catch-all `_ => break`, dropping the spawned task and leaving the data source silently dysfunctional. Because nothing was polling the eventsource-client's stream anymore, its reconnect logic never ran. The `None` branch also broke without notifying the caller. This PR: - Replaces the catch-all `break` with `continue`, so the eventsource-client gets to reconnect on the next poll. A warning log keeps transient errors visible. - Calls `init_complete(false)` in the `None` branch before breaking, so callers waiting on initialization get a signal when the stream is permanently closed. ## Context Surfaced from [#116](#116). That report has two contributing bugs; this PR addresses the rust-server-sdk side. The matching rust-eventsource-client side is in [launchdarkly/rust-eventsource-client#134](launchdarkly/rust-eventsource-client#134) ([SDK-2345](https://launchdarkly.atlassian.net/browse/SDK-2345)). Together they restore the reconnection contract: the eventsource-client owns reconnection, and the SDK keeps polling and trusts it. Tracked in [SDK-2346](https://launchdarkly.atlassian.net/browse/SDK-2346). ## Test plan - [x] New test `streaming_source_recovers_from_non_eof_stream_error` (`launchdarkly-server-sdk/src/data_source.rs`) — sends invalid UTF-8 that triggers `Error::InvalidLine` from the eventsource-client parser and asserts the mock receives at least two requests, proving the SDK kept polling and the underlying client reconnected. - [x] `cargo test --package launchdarkly-server-sdk --lib` — 300 tests pass; no regressions. - [x] `cargo fmt --check` clean. Closes #116. [SDK-2345]: https://launchdarkly.atlassian.net/browse/SDK-2345?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Touches core streaming update loop behavior; a mistake could cause retry storms or incorrect initialization signaling, though scope is limited and covered by new tests. > > **Overview** > Fixes `StreamingDataSource` to **stay alive on non-EOF SSE errors** by continuing the fetch loop (allowing the underlying eventsource client to reconnect) and downgrading the failure to a warning. > > When the event stream is permanently exhausted (`None`), it now **signals initialization failure** via `init_complete(false)` before exiting. Adds tests to verify reconnect attempts on parser errors and stopping/signaling failure on unrecoverable HTTP statuses (e.g. `401`). > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit f1f3cc4. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Every error path in
ReconnectingRequest::poll_nextduring theConnectedstate sets state toWaitingToReconnectbefore returning the error — except for parser errors propagated throughprocess_bytes(...)?. After a parse error, the state stayed atConnectedand the next poll continued draining the (already-broken) response body before finally hitting end-of-body and emitting a spuriousUnexpectedEofbefore the reconnect actually kicked in.This PR schedules a reconnect on parse errors when reconnect is enabled, matching the pattern used by every other error path. The error is still returned to the caller; the only behavioral change is that the next poll cleanly transitions to reconnect rather than draining the broken body.
Before:
Connected,Err(InvalidLine),Err(UnexpectedEof),Connected, …After:
Connected,Err(InvalidLine),Connected, …Context
Surfaced while investigating launchdarkly/rust-server-sdk#116 (
StreamingDataSourcesilently shuts down on stream errors). That report has two contributing bugs; this PR addresses the rust-eventsource-client side. The rust-server-sdk side will be fixed in a follow-up PR — together they restore the reconnection contract: the eventsource-client owns reconnection, the SDK keeps polling and trusts it.Tracked in SDK-2345.
Test plan
parser_error_schedules_reconnect_immediately(eventsource-client/src/client.rs) — verifies the next stream item after a parse error isConnectedfrom the reconnect, not a spuriousUnexpectedEof.cargo test— 60 lib tests + 2 doc tests pass; no regressions.Note
Medium Risk
Changes streaming error-handling semantics by scheduling reconnects on parser failures, which can affect how downstream consumers observe errors and reconnect timing. Scope is small and covered by a new integration-style test, but touches core stream state transitions.
Overview
Fixes reconnection behavior after SSE parse errors during streaming. When
EventParser::process_bytesfails in theConnectedstate, the client now (whenReconnectOptions::reconnectis enabled) transitions toWaitingToReconnectbefore yielding the parse error, avoiding continued draining of a broken response body and the resulting follow-on errors.Updates
Client::streamdocs to clarify that stream errors are non-terminal and that the stream only ends onPoll::Ready(None), and adds a new test (parser_error_schedules_reconnect_immediately) asserting the sequenceConnected -> Err(InvalidLine) -> Connected.Reviewed by Cursor Bugbot for commit ddea87d. Bugbot is set up for automated code reviews on this repo. Configure here.