Stop rerunning CI until somebody knows why it failed.
Part of the Practical Agent Skills collection.
flaky-ci-forensics helps engineering teams turn intermittent CI/test failures into a concrete triage decision. It combines a domain workflow, failure taxonomy, local parser, and report template so the result is more stable than asking a model to "debug this flaky test" from raw logs.
Teams often waste CI minutes and developer attention rerunning failures that later pass, while real regressions can also be mislabeled as flakes. This skill forces the agent to preserve evidence, classify the failure mode, estimate cost, and recommend a bounded next action.
- Parses JUnit XML, CI logs, and optional history CSV locally.
- Separates timeouts, selector sync, external-service failures, state leaks, runner issues, and true regressions.
- Estimates wasted CI minutes/day when cost inputs exist.
- Prevents the agent from hiding real product regressions behind "probably flaky" language.
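
As a rough illustration of the classification step above, here is a minimal sketch of a keyword-based classifier. The rule names and regular expressions are hypothetical and are not taken from scripts/flaky_ci_forensics.py; the actual taxonomy and decision rules live in references/flaky-test-rules.md.

```python
import re

# Hypothetical keyword rules, for illustration only.
# Each entry maps a failure-mode label to a pattern searched in the failure message.
RULES = [
    ("timeout", re.compile(r"timed? ?out|deadline exceeded", re.I)),
    ("selector_sync", re.compile(r"element not found|stale element|not visible", re.I)),
    ("external_service", re.compile(r"connection refused|\b50[234]\b|\bdns\b", re.I)),
    ("state_leak", re.compile(r"already exists|duplicate key|port .* in use", re.I)),
    ("runner_issue", re.compile(r"no space left|out of memory|runner .* lost", re.I)),
]


def classify(message: str) -> str:
    """Return the first matching failure-mode label for a failure message."""
    for label, pattern in RULES:
        if pattern.search(message):
            return label
    # Nothing matched: do not call it a flake. Treat it as a possible
    # regression so a real product bug is not hidden behind a flaky label.
    return "possible_regression"
```

The fallback is the design choice that matters here: an unrecognized failure is escalated as a possible regression rather than defaulting to "flaky".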
- SKILL.md: trigger conditions, workflow, output format, and safety guardrails.
- agents/openai.yaml: Codex/OpenAI-style metadata.
- references/flaky-test-rules.md: failure taxonomy, decision rules, and anti-patterns.
- templates/triage-report.md: report skeleton.
- scripts/flaky_ci_forensics.py: local JUnit/log/history analyzer.
- scripts/fixtures/: smoke-test JUnit, CI log, and history CSV.
- .claude/skills/flaky-ci-forensics/SKILL.md: Claude Code mirror.
- openclaw/README.md and hermes/README.md: runtime installation notes.
python3 scripts/flaky_ci_forensics.py \
--junit scripts/fixtures/junit.xml \
--log scripts/fixtures/ci.log \
--history scripts/fixtures/history.csv \
--avg-job-minutes 14 \
--runs-per-day 60

- JUnit XML from CI or local test runs.
- CI logs with failure excerpts, retry information, runner metadata, or browser logs.
- Optional history CSV with test-level run/failure/rerun-pass counts.
- Optional cost values for average job minutes and runs per day.
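
To make the JUnit input concrete, the sketch below extracts failed test cases using Python's standard xml.etree.ElementTree. It assumes the common JUnit layout (testcase elements carrying classname, name, and time attributes, with a nested failure or error element); the function name failed_cases is hypothetical and does not describe how scripts/flaky_ci_forensics.py is implemented.

```python
import xml.etree.ElementTree as ET


def failed_cases(junit_xml: str) -> list[dict]:
    """Collect failed or errored test cases from a JUnit XML string."""
    root = ET.fromstring(junit_xml)
    failures = []
    # iter() finds <testcase> at any depth, so both a <testsuites> root
    # and a bare <testsuite> root are handled.
    for case in root.iter("testcase"):
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is None:
            continue  # passed or skipped
        failures.append({
            "test": f"{case.get('classname', '')}::{case.get('name', '')}",
            "seconds": float(case.get("time") or 0),
            # Prefer the message attribute; fall back to the element text.
            "message": problem.get("message") or (problem.text or "").strip(),
        })
    return failures
```

Keeping the raw message next to the test name preserves the evidence needed for later classification.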
The script prints a Markdown report with:
- CI decision.
- Failure clusters.
- Cost and frequency estimate.
- Root-cause hypotheses.
- Minimal fix plan.
- Instrumentation and guardrails.
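
For the cost estimate, one plausible calculation is sketched below. It assumes every job that only passed after a rerun wasted one full job's worth of minutes. The function and parameter names are hypothetical, and the formula the script actually uses may differ. The values 14 and 60 mirror the example command; the observed counts are made up for illustration.

```python
def wasted_minutes_per_day(
    jobs_observed: int,
    jobs_rerun_to_green: int,
    avg_job_minutes: float,
    runs_per_day: float,
) -> float:
    """Estimate CI minutes per day spent on reruns that later passed.

    Assumption: each rerun-to-green job costs one extra full job.
    """
    if jobs_observed <= 0:
        raise ValueError("jobs_observed must be positive")
    # Share of jobs that needed a rerun, scaled to daily volume and job length.
    return jobs_rerun_to_green * runs_per_day * avg_job_minutes / jobs_observed


# 10 of 200 observed jobs needed a rerun; 60 runs/day at 14 minutes each.
print(wasted_minutes_per_day(200, 10, 14, 60))  # 42.0
```

Without history data there is no rerun rate to scale, which is why the estimate only appears when cost inputs exist.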
| Runtime | Status |
|---|---|
| Codex/OpenAI-style | Supported with SKILL.md and agents/openai.yaml. |
| Claude Code | Supported through the .claude/skills/flaky-ci-forensics/SKILL.md mirror or by copying this directory. |
| OpenClaw | CLI present, but this local skill is not installed or published to ClawHub, so runtime visibility is not verified. |
| Hermes | CLI present, but hermes skills inspect does not accept this local directory; install requires a supported registry identifier or direct URL. |