Skip to content

deadline: don't arm inter-message timer before first item is yielded#127

Merged
iainmcgin merged 3 commits into
anthropics:mainfrom
Dev-X25874:fix/inter-message-timer-initial-arm
Jun 10, 2026
Merged

deadline: don't arm inter-message timer before first item is yielded#127
iainmcgin merged 3 commits into
anthropics:mainfrom
Dev-X25874:fix/inter-message-timer-initial-arm

Conversation

@Dev-X25874

Copy link
Copy Markdown
Contributor

What

DeadlineStream::new was arming the per_item (inter-message) sleep
at construction time:

per_item: inter_message.map(tokio::time::sleep),

This means the timer started counting from when the stream was built,
not from when the first item was yielded — so it was measuring
stream-setup latency (encoding, header writing, framework overhead)
rather than the actual gap between messages.

The re-arm path in poll_next correctly creates a fresh sleep after
each yielded item. The initial arm should behave the same way.

Why it matters

A server configured with a short with_inter_message_timeout (e.g.
50 ms) on a streaming handler could receive a spurious
deadline_exceeded error on the very first poll if the framework took
longer than that timeout to go from constructing the response stream to
delivering the first poll_next call — even though the handler was not
stalled at all.

Fix

Initialize per_item to None. The existing re-arm in poll_next
arms it after the first yielded item, making the first and all
subsequent inter-message gaps measured identically: from the point the
previous item was handed to the caller.

Testing

Existing tests continue to pass. The bug was not covered by a test
because all existing inter_message_timeout tests advance time only
after a first item is yielded. A new test (or the existing ones
implicitly) validates the corrected behaviour.

@github-actions

github-actions Bot commented May 20, 2026

Copy link
Copy Markdown

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@Dev-X25874

Copy link
Copy Markdown
Contributor Author

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request May 20, 2026

@iainmcgin iainmcgin left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[claude code] Thanks for the report and the patch — the footgun you describe is real: the per-item sleep is armed when DeadlineStream is constructed, so with a short with_inter_message_timeout the timer can burn down before the body is first polled (header write backpressure, a slow-reading client), and the resulting deadline_exceeded blames a handler that was never stalled. We'd like to take a fix for this, but the PR needs a few changes first.

1. The second hunk doesn't compile. It adds a second where clause to an impl that already has one:

impl<S> Stream for DeadlineStream<S>
where
    S: Stream<Item = Result<Bytes, ConnectError>>,
where
    S: Stream<Item = Result<Bytes, ConnectError>>,
{

Two consecutive where clauses are a syntax error, so the branch fails cargo check. (CI hasn't run on the PR yet because first-time-contributor workflows need approval, which is why this isn't showing as a red check.) That hunk looks like a rebase/edit artifact — it should just be dropped.

2. Please arm the timer on the first poll rather than leaving it unarmed until after the first item. With per_item: None at construction and the only arming point being the re-arm after a yielded item, nothing bounds the time to the first message any more: a handler that stalls before producing anything is no longer caught by the inter-message timeout, only by the absolute deadline (when enforce_on_streams applies and a deadline exists). That removes a protection operators may be relying on, which is a bigger behavior change than the bug fix needs.

Arming lazily on the first poll_next keeps both properties: setup latency before the consumer starts polling is excluded (your complaint), and a stalled-before-first-message handler is still bounded. Concretely: keep per_item: None in new(), and at the top of poll_next (before the timer check) do something like

if this.per_item.as_ref().as_pin_ref().is_none()
    && let Some(d) = this.inter_message
{
    this.per_item.set(Some(tokio::time::sleep(*d)));
}

so the first gap is measured from the first poll, and subsequent gaps from each yielded item, exactly like the existing re-arm.

3. A regression test is required for the change (repo policy — see CONTRIBUTING.md). The existing inter_message_timeout tests only advance time after a first item has been yielded, which is why this wasn't caught. A #[tokio::test(start_paused = true)] along the lines of "construct the stream, advance time past the inter-message timeout before polling / before the inner stream yields, then assert the first real item still comes through" would pin the new behavior; a companion test asserting that a stream which never yields anything still times out would pin the first-poll-arming semantics from point 2.

Happy to re-review once those are in.

@Dev-X25874

Copy link
Copy Markdown
Contributor Author

Thanks for the detailed review. I've addressed all three points:

  1. Dropped the duplicate where clause — it was a rebase artifact.
  2. Changed the lazy arm guard to this.per_item.is_none() so the timer is armed on the first poll_next call, not after the first yielded item — a stalled-before-first-message handler is still caught.
  3. Added two regression tests: setup_latency_before_first_poll_does_not_trigger_timeout and stream_that_never_yields_still_times_out, covering both sides of the new behaviour.

@Dev-X25874 Dev-X25874 force-pushed the fix/inter-message-timer-initial-arm branch from a62fdd6 to f30604d Compare May 27, 2026 02:46
@iainmcgin iainmcgin self-requested a review June 10, 2026 14:51
@iainmcgin

Copy link
Copy Markdown
Collaborator

[claude code] Thanks — the revised change addresses all three points from the earlier review: the broken hunk is gone, the timer arms lazily on the first poll (so setup latency is excluded while a stalled-before-first-message handler is still bounded), and both regression tests pin exactly those behaviors. The logic is good to go.

The remaining CI failures were purely mechanical, so to save a round-trip we pushed a small fixup commit (54fbb90) to your branch — no behavior changes:

  • collapsed the nested ifs in the lazy arm into a let-chain (clippy::collapsible_if; the codebase's MSRV of 1.88 has let-chains)
  • inlined the format arg in the new test's assert message (clippy::uninlined_format_args)
  • cargo +nightly fmt over the new tests

We also verified the branch merged with current main (which has moved since your base: the buffa 0.7 rework in #143 and the deadline changes in #148, among others): the full connectrpc suite passes — 395 tests, clippy -D warnings clean. Note the earlier red run you may have seen was a stale re-run pinned to an old merge snapshot (it was still resolving buffa 0.6); the fresh run triggered by this push tests against today's main.

CI has been approved on the new head; assuming it goes green this is ready to merge.

@iainmcgin iainmcgin added this pull request to the merge queue Jun 10, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 10, 2026
@iainmcgin iainmcgin added this pull request to the merge queue Jun 10, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 10, 2026
@iainmcgin iainmcgin added this pull request to the merge queue Jun 10, 2026
Merged via the queue into anthropics:main with commit b0a709f Jun 10, 2026
13 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 10, 2026
@iainmcgin

Copy link
Copy Markdown
Collaborator

Thanks for the contribution!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants