Skip to content

Add notification_blocking_timeout for time-bounded notification waits#89

Open
swlynch99 wants to merge 6 commits intomainfrom
claude/add-notification-timeout-t1k3x
Open

Add notification_blocking_timeout for time-bounded notification waits#89
swlynch99 wants to merge 6 commits intomainfrom
claude/add-notification-timeout-t1k3x

Conversation

@swlynch99
Copy link
Contributor

Summary

Adds support for time-bounded notification waits in the Durable SDK. Tasks can now call wait_with_timeout(duration) to block until a notification arrives or a timeout expires, returning Option<Notification>.

Key Changes

  • New WIT interface: Added notification-blocking-timeout function to durable:core/notify interface (version bumped to 2.7.0)
  • Runtime implementation: Implemented timeout-aware notification polling with proper handling of:
    • User-specified timeout expiration
    • Worker suspend timeout (frees worker slot if user timeout is long)
    • Broadcast channel lag recovery (re-polls database on RecvError::Lagged)
    • Concurrent notification delivery during timeout checks
  • SDK bindings: Generated WASM bindings for the new interface function
  • Public API: Added wait_with_timeout() to both durable-core and durable crates
  • Test coverage: Comprehensive test suite including:
    • Basic timeout expiration and notification delivery
    • Deterministic simulation tests (DST) for suspend/wake cycles
    • Broadcast channel lag recovery scenarios
    • Integration tests verifying timely wake-up behavior

Implementation Details

The timeout implementation uses a tokio::select! loop that races three conditions:

  1. Notification arrival via broadcast channel
  2. User-specified timeout expiration
  3. Worker suspend timeout (to free resources during long waits)

When the user timeout expires, the implementation performs one final database poll before returning None, ensuring notifications that arrived concurrently are not missed. If the suspend timeout fires first, the task is suspended to free the worker slot, and the notification is delivered via the normal resume path.

The broadcast channel lag handling ensures robustness: if the channel's buffer overflows, the task recovers by re-polling the database for pending notifications.

https://claude.ai/code/session_01CxLxPCCymXVh9ob6wL5X16

Adds a `notification-blocking-timeout` WIT function and plumbs it through
the runtime as `wait_with_timeout(Duration) -> Option<Notification>`.
When the timeout expires without a notification, returns None instead of
suspending the task. This allows workflows to implement retry/fallback
logic when notifications aren't reliably delivered.

Bumps WIT package version to 2.7.0.

https://claude.ai/code/session_01CxLxPCCymXVh9ob6wL5X16
…eout

The existing test only verifies timeout expiry and pre-queued notifications.
This new test sends a notification while the task is blocked in
wait_with_timeout(120s) and asserts the task completes within 10s,
confirming the broadcast channel wake-up path works correctly.

https://claude.ai/code/session_01CxLxPCCymXVh9ob6wL5X16
The previous implementation held the worker slot for the entire user
timeout without ever suspending. Now the select loop tracks both the
user-provided deadline and the config suspend_timeout. If the suspend
timeout fires first, the task suspends and frees the worker (matching
notification_blocking behavior). If the user timeout fires first, the
function returns None. A notification arriving wakes the task promptly
in either case.

https://claude.ai/code/session_01CxLxPCCymXVh9ob6wL5X16
…eout

Adds notify_wait_timeout_after_suspend which forces suspension via
suspend_timeout(Duration::ZERO), confirms the task enters the suspended
state, then sends a notification and asserts the task completes promptly
(< 15s) rather than sleeping until the 120s user timeout expires.

https://claude.ai/code/session_01CxLxPCCymXVh9ob6wL5X16
Three DST tests using DstScheduler, DstClock, DstEntropy, and
DstEventSource:

- dst_notify_timeout_suspend_then_wake: forces immediate suspension via
  zero suspend_timeout, confirms suspended state, injects notification
  via DB + event source, asserts timely completion.

- dst_notify_timeout_wake_before_suspend: uses long suspend_timeout so
  suspension never occurs, injects notification event directly through
  DstEventSource while task is actively waiting, asserts prompt wake-up.

- dst_notify_timeout_records_scheduler_events: verifies the DstScheduler
  records the expected TaskClaimed (>= 2 for initial + resume) and
  TaskCompleted (exactly 1) events through a full suspend/wake cycle.

https://claude.ai/code/session_01CxLxPCCymXVh9ob6wL5X16
Two new deterministic simulation tests that exercise the
RecvError::Lagged path in notification_blocking_timeout:

- dst_notify_timeout_recovers_from_lag: inserts the real notification
  into the DB first, then floods the broadcast channel with 200 events
  to cause lag. After recovery the task re-polls the DB, finds its
  notification, and completes promptly.

- dst_notify_timeout_recovers_from_lag_then_notified: floods the
  channel to cause lag *without* a pending notification. After the task
  recovers and re-polls (finding nothing), it loops back into the
  select. The real notification is then delivered normally and the task
  completes promptly.

https://claude.ai/code/session_01CxLxPCCymXVh9ob6wL5X16
let notif = result.expect("expected to receive a notification before timeout");
assert_eq!(notif.event, "wakeup");

print!("ok\n");

Check warning

Code scanning / clippy

using print!() with a format string that ends in a single newline Warning

using print!() with a format string that ends in a single newline
let data: String = notif.json().expect("invalid json data");
assert_eq!(data, "hello timeout");

print!("ok\n");

Check warning

Code scanning / clippy

using print!() with a format string that ends in a single newline Warning

using print!() with a format string that ends in a single newline
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants