Skip to content

feat(plugins/servo): add poisoning mechanism for repeated render panics #348

@staging-devin-ai-integration

Description

Summary

Add a poisoning mechanism to the Servo plugin's shared renderer thread so that instances which panic repeatedly are automatically disabled rather than retried indefinitely. Currently, catch_unwind recovers from panics and returns the last good frame, but a pathological page (e.g. one that reliably triggers a Servo bug) will panic on every tick() call, flooding logs with backtraces.

Related PRs: #326 (hardening — added catch_unwind recovery)
Parent issue: #167

Motivation

From PR #326 review (Devin Review Comment 10): the current hardening wraps each work item in catch_unwind(AssertUnwindSafe(...)), which is good for resilience but has no circuit-breaker. A node hitting a deterministic Servo bug will:

  1. Panic every render tick (~30 times/sec)
  2. Log a full backtrace each time
  3. Always return the stale cached frame (or blank if no frame was ever cached)
  4. Never recover (since the same URL/state causes the same panic)

Proposed Design

Per-instance panic counter

struct InstanceState {
    // ... existing fields ...
    /// Consecutive panic count (reset on successful render).
    consecutive_panics: u32,
    /// If true, this instance is poisoned and all renders return the
    /// cached frame without attempting Servo calls.
    poisoned: bool,
}

Poisoning threshold

After N consecutive panics (suggested: const POISON_THRESHOLD: u32 = 5), mark the instance as poisoned:

if state.consecutive_panics >= POISON_THRESHOLD {
    state.poisoned = true;
    tracing::error!(
        node_id = %node_id,
        panics = state.consecutive_panics,
        "Instance poisoned after {POISON_THRESHOLD} consecutive panics — \
         renders will return cached frame until URL is changed"
    );
}

Recovery path

A UpdateConfig with a new URL should reset the poison state, since the new page may not trigger the same bug:

fn handle_update_config(...) {
    // ... navigate to new URL ...
    state.consecutive_panics = 0;
    state.poisoned = false;
}

Render behavior when poisoned

fn handle_render(...) {
    if state.poisoned {
        // Return cached frame without touching Servo
        let frame = state.last_good_frame.clone().unwrap_or_default();
        let _ = state.result_tx.send(ServoThreadResult::Frame(frame));
        return;
    }
    // ... normal render path with catch_unwind ...
}

Acceptance Criteria

  • Per-instance consecutive panic counter, reset on successful render
  • Configurable poison threshold (const or config param)
  • Poisoned instances skip Servo calls entirely and return cached frame
  • URL change (UpdateConfig) resets poison state
  • tracing::error! emitted when an instance becomes poisoned
  • tracing::info! emitted when poison state is reset
  • Unit test: verify counter increments and resets correctly
  • Unit test: verify poisoned state skips render

Complexity

Effort: S (half day)
Straightforward state machine addition to existing InstanceState.

Notes

  • The AssertUnwindSafe wrapper is a known risk (Devin Review Comment 8) — poisoning mitigates the worst case (infinite panic loops) but does not solve potential state corruption from unwinding through !UnwindSafe types. A deeper fix would be to restart the WebView on panic, but that is a separate concern.
  • Consider also emitting a metric (servo.instance.poisoned counter) for OTel dashboards.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions