Add health check retry circuit breaker#16
Conversation
|
Warning Review limit reached
More reviews will be available in 39 minutes and 44 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds retry/backoff and circuit-breaker resiliency to ChangesHealth Check Retry, Backoff & Circuit Breaker
Sequence Diagram(s)sequenceDiagram
participant CLI as main() / CLI
participant RHC as run_health_checks
participant CHS as check_http_service
participant CB as CircuitBreaker
participant PHO as probe_http_once
CLI->>RHC: max_retries, backoff_factor, circuit_breakers, sleep_func
RHC->>CB: init (threshold, cooldown) per service
RHC->>CHS: host, port, path, timeout, retries, backoff, circuit_breaker
CHS->>CB: is_open()
alt open
CB-->>CHS: True → CRITICAL "Circuit open"
else closed
loop attempt 0..max_retries
CHS->>PHO: host, port, path, timeout, connection_factory
PHO-->>CHS: status, detail, code
alt success
CHS->>CB: record_success()
CHS-->>RHC: OK/WARNING result
else failure
CHS->>CB: record_failure()
CHS->>CHS: sleep(backoff delay)
end
end
end
RHC->>RHC: summarize_results(results)
RHC-->>CLI: results with ["summary"]
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/health_check.py (1)
105-132:⚠️ Potential issue | 🟠 Major | ⚡ Quick winClose HTTP connections on all probe outcomes
conn.close()only runs on the success path. Ifrequest()/getresponse()/read()raises, the socket can leak across retries/watch loops.Suggested fix
def probe_http_once( @@ ) -> Tuple[str, str, int]: + conn = None try: - conn = connection_factory(host, port, timeout=timeout) + conn = connection_factory(host, port, timeout=timeout) conn.request("GET", path) resp = conn.getresponse() @@ - conn.close() @@ return result, detail, status except Exception as e: return "CRITICAL", str(e), 0 + finally: + if conn is not None: + try: + conn.close() + except Exception: + pass🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/health_check.py` around lines 105 - 132, In the probe_http_once function, the conn.close() call is only executed on the success path, which means if an exception occurs during conn.request(), conn.getresponse(), or resp.read() calls, the socket connection leaks. Refactor the code to use a try-finally block where conn.close() is placed in the finally block to ensure the connection is always closed regardless of whether an exception occurs or the function returns early with a result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/health_check.py`:
- Around line 173-180: The circuit-breaker failure is being recorded for any
non-OK probe result, including WARNING status responses like 4xx errors, which
can incorrectly trigger the circuit to open. Move the breaker.record_failure()
call to only execute when the last_status is CRITICAL, ensuring that only actual
CRITICAL failures increment the failure count. This prevents WARNING responses
from accumulating failures and incorrectly opening the circuit breaker.
In `@tools/test_health_check_retry_circuit.py`:
- Around line 18-19: The mutable class attributes `outcomes` and `calls` are
missing explicit type annotations which triggers Ruff linting rule RUF012 and
risks test-state leakage. Import `ClassVar` from the typing module at the top of
the file, then annotate both `outcomes = []` and `calls = 0` with explicit
`ClassVar` type hints (e.g., `outcomes: ClassVar[list] = []` and `calls:
ClassVar[int] = 0`) to make the shared state intent explicit and prevent future
test coupling issues.
---
Outside diff comments:
In `@tools/health_check.py`:
- Around line 105-132: In the probe_http_once function, the conn.close() call is
only executed on the success path, which means if an exception occurs during
conn.request(), conn.getresponse(), or resp.read() calls, the socket connection
leaks. Refactor the code to use a try-finally block where conn.close() is placed
in the finally block to ensure the connection is always closed regardless of
whether an exception occurs or the function returns early with a result.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e7a267f4-e4c0-4b71-b9a0-4575a7b63361
📒 Files selected for processing (8)
diagnostic/build-9cd9aec9-part001.logddiagnostic/build-9cd9aec9-part002.logddiagnostic/build-9cd9aec9-part003.logddiagnostic/build-9cd9aec9-part004.logddiagnostic/build-9cd9aec9.jsondocs/OPERATIONS.mdtools/health_check.pytools/test_health_check_retry_circuit.py
|
I addressed the actionable review items in commit
Validation rerun:
|
Summary
Fixes #15.
Adds opt-in retry, exponential backoff, circuit breaker protection, and health-check summary aggregation for HTTP service probes in
tools/health_check.py. The default remains the legacy single-attempt behavior, so existing probe usage is unchanged unless operators pass the new flags.Changes
--max-retries,--backoff-factor,--retry-base-delay,--circuit-threshold, and--circuit-cooldownflags.tools/test_health_check_retry_circuit.pywith deterministic unit tests for retry, backoff, circuit breaker behavior, cooldown reset, and summary counts.docs/OPERATIONS.md.Testing
Ran locally:
Results:
py_compile: passedoverall_status: OKin this local environmentgit diff --check: passedpython3 build.py: 10/10 modules passedDiagnostic artifacts committed in this PR:
diagnostic/build-9cd9aec9.jsondiagnostic/build-9cd9aec9-part001.logddiagnostic/build-9cd9aec9-part002.logddiagnostic/build-9cd9aec9-part003.logddiagnostic/build-9cd9aec9-part004.logdDiagnostic password:
Reassemble command from the build output:
Checklist
Summary by CodeRabbit
New Features
Documentation