-
Notifications
You must be signed in to change notification settings - Fork 4
Add retry/backoff and circuit breaker to health checks #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| { | ||
| "generated_at": "2026-06-20T16:39:09.565245+00:00", | ||
| "commit": "94e0fb08", | ||
| "diagnostic_logd": "diagnostic/build-94e0fb08.logd", | ||
| "diagnostic_logd_error": null, | ||
| "message_blocker": null, | ||
| "chunked": false, | ||
| "chunk_size_bytes": null, | ||
| "password": "efcd3c4a127e4b6e294d", | ||
| "decrypt_command": "encryptly unpack diagnostic/build-94e0fb08.logd <outdir> --password efcd3c4a127e4b6e294d", | ||
| "total_modules": 10, | ||
| "passed": 4, | ||
| "failed": 6, | ||
| "modules": [ | ||
| { | ||
| "name": "backend", | ||
| "status": "FAIL", | ||
| "elapsed_seconds": 0, | ||
| "artifact": null, | ||
| "output": "Command not found: [Errno 2] No such file or directory: 'cargo'" | ||
| }, | ||
| { | ||
| "name": "frontend", | ||
| "status": "PASS", | ||
| "elapsed_seconds": 6.051, | ||
| "artifact": "/Users/zhangsiqiang/Documents/Codex/2026-06-19/new-chat/outputs/thanhle-kickama-15-health-retry/frontend/dist", | ||
| "output": "> tent-frontend@0.0.0 build\n> tsc -b && vite build\n\nvite v6.4.3 building for production...\ntransforming...\n\u2713 100 modules transformed.\nrendering chunks...\ncomputing gzip size...\ndist/index.html 0.62 kB \u2502 gzip: 0.34 kB\ndist/assets/state-BkjSKDbY.js 8.91 kB \u2502 gzip: 3.55 kB \u2502 map: 57.15 kB\ndist/assets/vendor-CREcWLHI.js 48.93 kB \u2502 gzip: 17.22 kB \u2502 map: 481.27 kB\ndist/assets/index-CyxcoTyU.js 231.32 kB \u2502 gzip: 72.02 kB \u2502 map: 1,044.42 kB\n\u2713 built in 645ms" | ||
|
Comment on lines
+26
to
+27
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: cd /tmp/repository 2>/dev/null || cd . && find . -name "diagnostic" -type d | head -5Repository: thanhle74/kickama Length of output: 74 🏁 Script executed: find . -name "build-94e0fb08.json" -type fRepository: thanhle74/kickama Length of output: 94 🏁 Script executed: cat -n diagnostic/build-94e0fb08.json 2>/dev/null | head -60Repository: thanhle74/kickama Length of output: 4600 🏁 Script executed: grep -r "artifact" --include="*.py" --include="*.js" --include="*.ts" --include="*.json" --include="*.sh" . 2>/dev/null | grep -v "\.git" | grep -v "node_modules" | head -20Repository: thanhle74/kickama Length of output: 1884 🏁 Script executed: grep -r "build-94e0fb08" . 2>/dev/null | grep -v "\.git"Repository: thanhle74/kickama Length of output: 601 🏁 Script executed: ls -la diagnostic/ | head -20Repository: thanhle74/kickama Length of output: 482 🏁 Script executed: grep -B 5 -A 5 '"artifact": binary' build.pyRepository: thanhle74/kickama Length of output: 419 🏁 Script executed: grep -B 10 -A 10 'def.*artifact' build.py | head -50Repository: thanhle74/kickama Length of output: 847 🏁 Script executed: cat build.py | head -100Repository: thanhle74/kickama Length of output: 3042 🏁 Script executed: grep -B 20 "artifact.*binary" build.py | head -60Repository: thanhle74/kickama Length of output: 2406 🏁 Script executed: grep -A 5 "def build_module" build.pyRepository: thanhle74/kickama Length of output: 184 🏁 Script executed: grep -B 5 -A 15 "for name, success, elapsed, output, binary in results" build.pyRepository: thanhle74/kickama Length of output: 2755 🏁 Script executed: grep -B 30 "def build_module" build.py | tail -40Repository: thanhle74/kickama Length of output: 777 🏁 Script executed: grep -A 50 "def build_module" build.py | head -60Repository: thanhle74/kickama Length of output: 1925 🏁 Script executed: rg "binary" build.py -B 2 -A 2Repository: thanhle74/kickama Length of output: 2752 🏁 Script executed: grep -A 15 "def verify_binary" build.pyRepository: thanhle74/kickama Length of output: 540 Convert The absolute paths (e.g., 🤖 Prompt for AI Agents |
||
| }, | ||
| { | ||
| "name": "market", | ||
| "status": "FAIL", | ||
| "elapsed_seconds": 0, | ||
| "artifact": null, | ||
| "output": "Command not found: [Errno 2] No such file or directory: 'go'" | ||
| }, | ||
| { | ||
| "name": "frailbox", | ||
| "status": "FAIL", | ||
| "elapsed_seconds": 0.471, | ||
| "artifact": null, | ||
| "output": "gcc -Wall -Wextra -Wpedantic -std=c2x -O2 -g -D_FORTIFY_SOURCE=3 -fstack-protector-strong -fPIE -Iinclude -MMD -MP -c src/arena.c -o build/src/arena.o\nsrc/arena.c:17:23: error: use of undeclared identifier 'MAP_HUGETLB'\n 17 | mmap_flags |= MAP_HUGETLB;\n | ^~~~~~~~~~~\nsrc/arena.c:179:17: warning: comparison of distinct pointer types ('const void *' and 'char *') [-Wcompare-distinct-pointer-types]\n 179 | ptr < (char *)region->start + region->size) {\n | ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n1 warning and 1 error generated.\nmake: *** [build/src/arena.o] Error 1" | ||
| }, | ||
| { | ||
| "name": "engine", | ||
| "status": "PASS", | ||
| "elapsed_seconds": 12.815, | ||
| "artifact": "/Users/zhangsiqiang/Documents/Codex/2026-06-19/new-chat/outputs/thanhle-kickama-15-health-retry/frailbox/engine/build/trial-engine", | ||
| "output": "[ 11%] Building CXX object CMakeFiles/trial-engine.dir/main.cpp.o\n[ 22%] Building CXX object CMakeFiles/trial-engine.dir/core/math.cpp.o\n[ 33%] Building CXX object CMakeFiles/trial-engine.dir/core/ecs.cpp.o\n[ 44%] Building CXX object CMakeFiles/trial-engine.dir/dynamics/rigidbody.cpp.o\n[ 55%] Building CXX object CMakeFiles/trial-engine.dir/dynamics/constraint.cpp.o\n[ 66%] Building CXX object CMakeFiles/trial-engine.dir/collision/collision.cpp.o\n[ 77%] Building CXX object CMakeFiles/trial-engine.dir/Users/zhangsiqiang/Documents/Codex/2026-06-19/new-chat/outputs/thanhle-kickama-15-health-retry/frailbox/wat.cpp.o\n[ 88%] Building CXX object CMakeFiles/trial-engine.dir/Users/zhangsiqiang/Documents/Codex/2026-06-19/new-chat/outputs/thanhle-kickama-15-health-retry/frailbox/engine.cpp.o\n[100%] Linking CXX executable trial-engine\n[100%] Built target trial-engine" | ||
| }, | ||
| { | ||
| "name": "compliance", | ||
| "status": "PASS", | ||
| "elapsed_seconds": 2.613, | ||
| "artifact": "/Users/zhangsiqiang/Documents/Codex/2026-06-19/new-chat/outputs/thanhle-kickama-15-health-retry/compliance/build", | ||
| "output": "\u6ce8: ComplianceAuditor.java\u4f7f\u7528\u6216\u8986\u76d6\u4e86\u5df2\u8fc7\u65f6\u7684 API\u3002\n\u6ce8: \u6709\u5173\u8be6\u7ec6\u4fe1\u606f, \u8bf7\u4f7f\u7528 -Xlint:deprecation \u91cd\u65b0\u7f16\u8bd1\u3002" | ||
| }, | ||
| { | ||
| "name": "v2-market-stream", | ||
| "status": "PASS", | ||
| "elapsed_seconds": 0.16, | ||
| "artifact": null, | ||
| "output": "Syntax OK" | ||
| }, | ||
| { | ||
| "name": "nfc-scanner", | ||
| "status": "FAIL", | ||
| "elapsed_seconds": 0, | ||
| "artifact": null, | ||
| "output": "Command not found: [Errno 2] No such file or directory: 'luac'" | ||
| }, | ||
| { | ||
| "name": "openapi-haskell", | ||
| "status": "FAIL", | ||
| "elapsed_seconds": 0, | ||
| "artifact": null, | ||
| "output": "Command not found: [Errno 2] No such file or directory: 'ghc'" | ||
| }, | ||
| { | ||
| "name": "openapi-tools", | ||
| "status": "FAIL", | ||
| "elapsed_seconds": 0, | ||
| "artifact": null, | ||
| "output": "Command not found: [Errno 2] No such file or directory: 'luac'" | ||
| } | ||
| ], | ||
| "pr_note": "Include the encrypted diagnostic logd artifact(s): diagnostic/build-94e0fb08.logd. The encrypted .logd is the required diagnostic content for PR review; this JSON file is metadata. Maintainers may ask you to remove these diagnostic artifacts before merging." | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,124 @@ | ||
| import sys | ||
| import unittest | ||
| from pathlib import Path | ||
| from typing import List | ||
|
|
||
|
|
||
| ROOT = Path(__file__).resolve().parents[1] | ||
| sys.path.insert(0, str(ROOT)) | ||
|
|
||
| from tools import health_check # noqa: E402 | ||
|
|
||
|
|
||
| class HealthCheckRetryCircuitTest(unittest.TestCase): | ||
| def test_http_probe_retries_until_success(self) -> None: | ||
|
Comment on lines
+13
to
+14
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Isolate tests from global circuit-breaker state.
Proposed fix class HealthCheckRetryCircuitTest(unittest.TestCase):
+ def setUp(self) -> None:
+ health_check.HTTP_CIRCUIT_BREAKERS.clear()
+
def test_http_probe_retries_until_success(self) -> None:🤖 Prompt for AI Agents |
||
| attempts = 0 | ||
|
|
||
| def request_once(host: str, port: int, path: str, timeout: int) -> tuple[int, str]: | ||
| nonlocal attempts | ||
| attempts += 1 | ||
| if attempts == 1: | ||
| raise ConnectionRefusedError("not ready") | ||
| return 200, "ok" | ||
|
|
||
| status, detail, code = health_check.check_http_service( | ||
| "localhost", | ||
| 8080, | ||
| "/health", | ||
| 5, | ||
| max_retries=2, | ||
| backoff_factor=0, | ||
| request_once=request_once, | ||
| sleep=lambda seconds: None, | ||
| ) | ||
|
|
||
| self.assertEqual(status, "OK") | ||
| self.assertEqual(code, 200) | ||
| self.assertEqual(attempts, 2) | ||
| self.assertIn("after 2 attempts", detail) | ||
|
|
||
| def test_retry_backoff_uses_exponential_schedule(self) -> None: | ||
| sleeps: List[float] = [] | ||
|
|
||
| def request_once(host: str, port: int, path: str, timeout: int) -> tuple[int, str]: | ||
| raise TimeoutError("timeout") | ||
|
|
||
| status, detail, code = health_check.check_http_service( | ||
| "localhost", | ||
| 8080, | ||
| "/health", | ||
| 5, | ||
| max_retries=2, | ||
| backoff_factor=2, | ||
| backoff_base_delay=0.5, | ||
| request_once=request_once, | ||
| sleep=sleeps.append, | ||
| ) | ||
|
|
||
| self.assertEqual(status, "CRITICAL") | ||
| self.assertEqual(code, 0) | ||
| self.assertIn("timeout", detail) | ||
| self.assertEqual(sleeps, [0.5, 1.0]) | ||
|
|
||
| def test_circuit_breaker_opens_after_threshold(self) -> None: | ||
| breaker = health_check.CircuitBreaker(threshold=2, cooldown_seconds=30, clock=lambda: 100.0) | ||
|
|
||
| breaker.record_failure() | ||
| self.assertTrue(breaker.allow_request()) | ||
|
|
||
| breaker.record_failure() | ||
| self.assertFalse(breaker.allow_request()) | ||
|
|
||
| def test_circuit_breaker_resets_after_cooldown(self) -> None: | ||
| now = 100.0 | ||
|
|
||
| def clock() -> float: | ||
| return now | ||
|
|
||
| breaker = health_check.CircuitBreaker(threshold=1, cooldown_seconds=30, clock=clock) | ||
| breaker.record_failure() | ||
| self.assertFalse(breaker.allow_request()) | ||
|
|
||
| now = 131.0 | ||
| self.assertTrue(breaker.allow_request()) | ||
|
|
||
| def test_health_summary_counts_statuses_and_circuit_breakers(self) -> None: | ||
| results = { | ||
| "services": { | ||
| "backend": {"status": "OK"}, | ||
| "market": {"status": "WARNING", "circuit": "closed"}, | ||
| "frailbox": {"status": "CRITICAL", "circuit": "open"}, | ||
| }, | ||
| "infrastructure": {"redis": {"status": "CRITICAL"}}, | ||
| "system": {"disk": {"status": "OK"}, "memory": {"status": "WARNING"}}, | ||
| } | ||
|
|
||
| self.assertEqual( | ||
| health_check.summarize_results(results), | ||
| { | ||
| "ok": 2, | ||
| "warning": 2, | ||
| "critical": 2, | ||
| "circuit_open": 1, | ||
| }, | ||
| ) | ||
|
|
||
| def test_retry_flags_are_parsed(self) -> None: | ||
| args = health_check.parse_args( | ||
| [ | ||
| "--max-retries", | ||
| "3", | ||
| "--backoff-factor", | ||
| "1.5", | ||
| "--circuit-threshold", | ||
| "4", | ||
| ] | ||
| ) | ||
|
|
||
| self.assertEqual(args.max_retries, 3) | ||
| self.assertEqual(args.backoff_factor, 1.5) | ||
| self.assertEqual(args.circuit_threshold, 4) | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| unittest.main() | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
Repository: thanhle74/kickama
Length of output: 267
🏁 Script executed:
Repository: thanhle74/kickama
Length of output: 482
🏁 Script executed:
Repository: thanhle74/kickama
Length of output: 185
🏁 Script executed:
# Read the diagnostic JSON file to verify the password presence cat diagnostic/build-94e0fb08.jsonRepository: thanhle74/kickama
Length of output: 5120
🏁 Script executed:
Repository: thanhle74/kickama
Length of output: 3742
🏁 Script executed:
# Check the CI workflow configuration cat .github/workflows/diagnostic-build-log.ymlRepository: thanhle74/kickama
Length of output: 10279
🏁 Script executed:
Repository: thanhle74/kickama
Length of output: 6347
🏁 Script executed:
Repository: thanhle74/kickama
Length of output: 1438
🏁 Script executed:
Repository: thanhle74/kickama
Length of output: 212
Remove plaintext password from committed diagnostic metadata before merging.
The password and decrypt_command embed a bearer secret in the PR that will be visible to all reviewers and persist in git history until artifacts are removed. Although the PR note correctly states these diagnostic artifacts are temporary, the password is required by the CI validation workflow and cannot be omitted from the JSON without redesigning the CI contract to pass secrets differently (e.g., via GitHub encrypted secrets or environment variables). Verify with maintainers whether the CI workflow can be modified to handle decryption credentials outside of committed metadata, or ensure these artifacts are removed immediately after merge.
Additionally, artifact paths expose the builder's absolute filesystem layout (e.g.,
/Users/zhangsiqiang/Documents/Codex/...). Consider using relative paths or truncating paths to remove usernames and workspace details.🧰 Tools
🪛 Betterleaks (1.5.0)
[high] 9-9: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🤖 Prompt for AI Agents
Source: Linters/SAST tools