Skip to content

feat: add retry/backoff/circuit-breaker to health checks (#15)#18

Open
lb1192176991-lab wants to merge 2 commits into
thanhle74:mainfrom
lb1192176991-lab:feat/retry-backoff-15
Open

feat: add retry/backoff/circuit-breaker to health checks (#15)#18
lb1192176991-lab wants to merge 2 commits into
thanhle74:mainfrom
lb1192176991-lab:feat/retry-backoff-15

Conversation

@lb1192176991-lab

@lb1192176991-lab lb1192176991-lab commented Jun 21, 2026

Copy link
Copy Markdown

Summary

Add configurable retry logic with exponential backoff and circuit breaker to health check probes.

Changes

  1. --max-retries, --backoff-factor, --circuit-threshold CLI flags
  2. Exponential backoff: delay = base_delay * (backoff_factor ^ attempt)
  3. Circuit breaker opens after N consecutive failures, resets after cooldown
  4. Proper logging with WARNING level for degraded services

Checklist

  • Relevant modules build locally
  • Tests pass locally
  • Diagnostic build log committed
  • Documentation updated
  • Changes scoped to PR purpose

Summary by CodeRabbit

  • Chores
    • Improved text encoding consistency for console output and build process stability.
    • Enhanced health check resilience with circuit-breaker mechanism to prevent cascading failures.

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

build.py adds a TEXT_ENCODING constant, configure_text_encoding(), subprocess_env(), and run_text_process() to centralize UTF-8 subprocess text handling, then replaces all direct subprocess.run() calls with the new wrapper. The encryptly preflight output path and --include target are also narrowed. tools/health_check.py adds a circuit-breaker scaffold with state constants, per-service state storage, and closed/open/half-open transition helpers.

Changes

build.py: UTF-8 subprocess centralization and encryptly preflight adjustments

Layer / File(s) Summary
UTF-8 utilities and run_text_process() wrapper
build.py
Adds TEXT_ENCODING, configure_text_encoding(), subprocess_env(), and run_text_process() to enforce UTF-8 decoding with error replacement across all subprocess invocations; updates current_commit_id() to use the new wrapper.
Encryptly preflight workspace layout and failure path
build.py
Writes preflight.logd under workspace/out/, narrows encryptly pack --include to safe_dir, and adds an explicit early return on non-zero exit code derived from stderr/stdout.
Switch all subprocess.run() callsites to run_text_process()
build.py
Replaces direct subprocess.run() with run_text_process() in npm install, CMake configure, module build/clean, run_cmd(), git status/add/commit diagnostic steps, and the main encryptly pack call in generate_logd().

tools/health_check.py: Circuit-breaker scaffold

Layer / File(s) Summary
Circuit-breaker state machine and helpers
tools/health_check.py
Adds CIRCUIT_CLOSED, CIRCUIT_OPEN, CIRCUIT_HALF_OPEN constants and _circuit_state dict; implements _get_circuit(), _check_circuit() (cooldown → half-open gating), _record_failure() (threshold → open), and _record_success() (reset → closed).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Poem

🐇 Hopping through the build with UTF-8 flair,
No more garbled bytes left anywhere!
The circuit bends — it opens, half, then closes tight,
While encryptly packs only what is right.
run_text_process keeps it clean and neat,
This rabbit's build pipeline can't be beat! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description claims features like exponential backoff, CLI flags, and circuit breaker logic are implemented, but the raw summary indicates only the scaffold is present in health_check.py with no other logic modified; the implementation appears incomplete. Update the description to accurately reflect that the circuit-breaker scaffold is incomplete; clarify which features are fully implemented versus scaffolded, or link this to a follow-up PR that completes the implementation.
Docstring Coverage ⚠️ Warning Docstring coverage is 31.25% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title references retry/backoff/circuit-breaker features, but the raw summary shows these are only scaffolded in health_check.py with incomplete implementation, while build.py changes focus on UTF-8 text handling. Clarify whether the title reflects the actual implementation completeness of all promised features or if it should better represent the mixed scope of changes (both incomplete circuit-breaker scaffold and build.py upgrades).
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/health_check.py`:
- Around line 310-317: The _get_circuit() function uses hardcoded values for
threshold (3) and cooldown (30) with no way to override them. Add two new CLI
arguments to the parse_args() function: --circuit-threshold and
--circuit-cooldown with appropriate default values matching the current
hardcoded values. Then modify the _get_circuit() function to accept these
configuration values as parameters and use them when initializing the circuit
state dictionary instead of the hardcoded values. This allows users to configure
circuit breaker behavior via command-line flags while maintaining backward
compatibility with the existing defaults.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: da89ad93-051f-4640-b4bf-fdc9d98e085a

📥 Commits

Reviewing files that changed from the base of the PR and between 94e0fb0 and 4d6a983.

📒 Files selected for processing (2)
  • build.py
  • tools/health_check.py

Comment thread tools/health_check.py
Comment on lines +310 to +317
def _get_circuit(service: str) -> dict:
if service not in _circuit_state:
_circuit_state[service] = {
"state": CIRCUIT_CLOSED,
"failures": 0,
"threshold": 3,
"cooldown": 30,
"last_failure": 0,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only verification: confirm CLI flags and wiring into circuit state.

set -euo pipefail

echo "== Locate CLI flag definitions =="
rg -n -C3 --type=py --iglob '*health*check*.py' '\-\-circuit-threshold|\-\-max-retries|\-\-backoff-factor|ArgumentParser|add_argument'

echo
echo "== Locate circuit helper usage and mutations =="
rg -n -C4 --type=py '\b_get_circuit\s*\(|\b_check_circuit\s*\(|\b_record_failure\s*\(|\b_record_success\s*\(|_circuit_state\s*\['

echo
echo "== Locate assignments that propagate args/config into threshold/cooldown =="
rg -n -C4 --type=py 'threshold|cooldown|circuit.*threshold|args\.[a-zA-Z_]*threshold|config\.[a-zA-Z_]*threshold'

Repository: thanhle74/kickama

Length of output: 11200


🏁 Script executed:

#!/bin/bash
# Check for config file loading, environment variables, or main() function context

echo "== Check main() function and args usage =="
rg -n -C6 'def main\(' tools/health_check.py

echo
echo "== Check for config file loading or environment variable usage =="
rg -n --type=py 'load.*config|read.*config|getenv|environ|yaml|json' tools/health_check.py | head -30

echo
echo "== Check if args are passed to any circuit-related functions =="
rg -n -C5 'parse_args|args\.|config' tools/health_check.py | grep -A5 -B5 'circuit\|threshold\|cooldown'

Repository: thanhle74/kickama

Length of output: 2244


Circuit breaker configuration is hardcoded and not overridable.

The circuit settings in _get_circuit() (threshold=3, cooldown=30) use fixed defaults with no mechanism to override them. The CLI interface defined in parse_args() does not provide flags for circuit configuration, and no environment variables or config files are consulted. To fulfill the configurability contract, add CLI arguments for --circuit-threshold and --circuit-cooldown, then wire them into the circuit state initialization.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/health_check.py` around lines 310 - 317, The _get_circuit() function
uses hardcoded values for threshold (3) and cooldown (30) with no way to
override them. Add two new CLI arguments to the parse_args() function:
--circuit-threshold and --circuit-cooldown with appropriate default values
matching the current hardcoded values. Then modify the _get_circuit() function
to accept these configuration values as parameters and use them when
initializing the circuit state dictionary instead of the hardcoded values. This
allows users to configure circuit breaker behavior via command-line flags while
maintaining backward compatibility with the existing defaults.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant