Add retry with backoff for initial snapshot fetch failure #708
iamnbutler wants to merge 1 commit into `main`
Conversation
When the server isn't running at startup, the desktop app now retries
with exponential backoff (1s, 2s, 4s... up to 30s) instead of showing
a permanent empty state. The dashboard displays connection status
("Connecting...", "Reconnecting...") with error details and a manual
Retry button when in a failed/reconnecting state.
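The schedule described above (doubling from 1s, capped at 30s) can be sketched standalone. This is an illustrative reconstruction, not the PR's actual code; the constant and function names here are hypothetical:

```rust
/// Illustrative backoff schedule: 1s, 2s, 4s, ... capped at 30s.
/// Names are hypothetical, not taken from the PR diff.
const INITIAL_RETRY_DELAY_MS: u64 = 1_000;
const MAX_RETRY_DELAY_MS: u64 = 30_000;

fn retry_delay_ms(attempt: u32) -> u64 {
    // Shift by at most 5 so the multiplier stays small;
    // saturating_mul guards against overflow, min() applies the 30s cap.
    INITIAL_RETRY_DELAY_MS
        .saturating_mul(1u64 << attempt.min(5))
        .min(MAX_RETRY_DELAY_MS)
}

fn main() {
    for attempt in 0..7 {
        println!("attempt {attempt}: {} ms", retry_delay_ms(attempt));
    }
}
```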
Closes #490
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR adds exponential backoff for the initial snapshot fetch in the desktop app, with a "Connecting/Reconnecting" status display and a manual Retry button. The change is contained to crates/desktop/ (two files, ~120 lines added), the core logic is correct, and the UX improvements are well-targeted.
No important issues. Three suggestions are left inline.
Build notes:
- `cargo test --workspace`: 1 failing test, `server::tests::reject_merge_entry_closed_marks_task_completed`, is pre-existing from the previous commit (d6f2fe1, #694) and not caused by this PR. The test panics on an `unwrap()` in `server.rs:3169`.
- `cargo clippy --workspace -- -D warnings`: 2 pre-existing errors in `crates/models/` (`floor_char_boundary` MSRV and `should_implement_trait`), also not caused by this PR.
- The desktop crate (`crates/desktop/`) is excluded from the workspace test run (GPUI dependencies), so the new retry logic has no automated test coverage; that's a pre-existing gap in the test setup, not specific to this PR.
Summary of inline comments:
- Backoff redundant with polling loop (`state.rs:411`): the 5s polling loop independently calls `update_snapshot` on failure and spawns its own retry timers, so the rate-limiting intent of the backoff doesn't hold in steady state. Suggestion only: the `fetch_in_flight` guard caps the actual HTTP rate regardless.
- Stale timers survive `retry_connection()` (`state.rs:346`): detached timers from old backoff cycles can fire after a manual Retry, briefly flipping status from `Connecting` back to `Reconnecting`. A generation counter would prevent this.
- Getter in wrong section (`state.rs:337`): `snapshot_retry_count()` is placed in the `// --- Setters ---` block; it belongs in `// --- Getters ---`.
```rust
if !had_snapshot {
    self.snapshot_retry_count += 1;
    let delay_ms = INITIAL_RETRY_DELAY_MS
        .saturating_mul(1 << self.snapshot_retry_count.min(5))
```
[SUGGESTION]
Priority: Code Quality
The backoff timers are redundant with the 5s polling loop during the pre-connection phase. When both are active, each polling tick also calls `update_snapshot` and, on failure, spawns its own new retry timer. Over time this accumulates multiple concurrent timers all scheduled at the max 30s interval; each one that fires (and fails) schedules yet another 30s timer. The `fetch_in_flight` guard prevents duplicate HTTP requests, but the effective retry rate in steady state is dominated by the 5s polling, not the 30s backoff cap.
The retry timers do add value for the very first window (2s faster than polling's 5s), but the rate-limiting intent of the backoff doesn't hold once the polling loop joins in.
One cleaner approach: only schedule a retry from `update_snapshot` when `snapshot_retry_count == 1` (i.e., skip the retry timer on polling failures, only use it for the very first failure). Or track a flag like `retry_timer_pending` to skip scheduling a new one when one is already active.
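A minimal, GPUI-free sketch of the flag approach. The struct, field, and method names below are hypothetical, chosen to mirror the suggestion rather than the PR's actual code:

```rust
/// Hypothetical sketch of a `retry_timer_pending` guard that ensures
/// at most one backoff timer is scheduled at a time.
struct AppState {
    retry_timer_pending: bool,
}

impl AppState {
    /// Called on a failed snapshot fetch; returns true only if the
    /// caller should schedule a new backoff timer.
    fn should_schedule_retry(&mut self) -> bool {
        if self.retry_timer_pending {
            return false; // a timer is already in flight
        }
        self.retry_timer_pending = true;
        true
    }

    /// Called from the timer closure, right before retrying the fetch.
    fn on_retry_timer_fired(&mut self) {
        self.retry_timer_pending = false;
    }
}

fn main() {
    let mut state = AppState { retry_timer_pending: false };
    // First failure: schedule a timer.
    assert!(state.should_schedule_retry());
    // Polling failure while the timer is pending: skip scheduling.
    assert!(!state.should_schedule_retry());
    // Timer fired: the next failure may schedule again.
    state.on_retry_timer_fired();
    assert!(state.should_schedule_retry());
}
```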
```rust
/// Manually retry connecting to the server (resets backoff).
pub fn retry_connection(&mut self, cx: &mut Context<Self>) {
    self.snapshot_retry_count = 0;
```
[SUGGESTION]
Priority: Correctness
retry_connection() resets snapshot_retry_count to 0 and sets status to Connecting, but any previously-detached retry timers remain active. If a stale timer fires after this reset (while the manually-triggered fetch is still in-flight or has already completed), update_snapshot on that path will flip the status back to Reconnecting — causing a visible flicker from Connecting → Reconnecting shortly after the user clicks Retry.
Since GPUI tasks are spawned with .detach(), there's no cancellation handle. A simple mitigation is a generation counter:
```rust
retry_generation: u32, // bumped in retry_connection()
```

Pass the current generation into the timer closure, and in the closure, bail out if the generation has changed:

```rust
cx.spawn(async move |this, cx| {
    smol::Timer::after(delay).await;
    this.update(cx, |state, cx| {
        if state.retry_generation == generation {
            state.refresh_snapshot(cx);
        }
    })
    .ok();
})
.detach();
```
```rust
/// Get the current snapshot retry count.
```
[SUGGESTION]
Priority: Code Quality
snapshot_retry_count() is a getter but it's placed in the // --- Setters --- section (between set_selected_project and // --- Actions ---). Move it up to join the other getters under // --- Getters ---.
🤖 Generated with Claude Code