Flaky CI Forensics

Stop rerunning CI until somebody knows why it failed.

Part of the Practical Agent Skills collection.

flaky-ci-forensics helps engineering teams turn intermittent CI/test failures into a concrete triage decision. It combines a domain workflow, failure taxonomy, local parser, and report template so the result is more repeatable than asking a model to "debug this flaky test" from raw logs.

What It Solves

Teams often waste CI minutes and developer attention rerunning failures that later pass, while real regressions get mislabeled as flakes. This skill forces the agent to preserve evidence, classify the failure mode, estimate cost, and recommend a bounded next action.

Why Use This Instead Of A Prompt

Parses JUnit XML, CI logs, and optional history CSV locally.
Separates timeouts, selector sync, external-service failures, state leaks, runner issues, and true regressions.
Estimates wasted CI minutes/day when cost inputs exist.
Prevents the agent from hiding real product regressions behind "probably flaky" language.
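
To make the taxonomy above concrete, here is a minimal sketch of keyword-based classification. The patterns, labels, and the `classify` helper are illustrative assumptions; the actual decision rules live in references/flaky-test-rules.md and may differ. Note that an unmatched failure falls through to a possible regression rather than to "flaky", which mirrors the last point in the list.

```python
import re

# Hypothetical keyword rules, checked in order. These are assumptions for
# illustration, not the rules shipped in references/flaky-test-rules.md.
RULES = [
    ("timeout", re.compile(r"timed? ?out|deadline exceeded", re.I)),
    ("selector_sync", re.compile(r"element not found|stale element|not (?:visible|clickable)", re.I)),
    ("external_service", re.compile(r"connection refused|ECONNRESET|\b50[234]\b", re.I)),
    ("state_leak", re.compile(r"already exists|duplicate key|port .* in use", re.I)),
    ("runner_issue", re.compile(r"no space left on device|out of memory|runner lost", re.I)),
]


def classify(message: str) -> str:
    """Return the first matching failure mode for a failure message."""
    for label, pattern in RULES:
        if pattern.search(message):
            return label
    # No infrastructure or timing signal found: treat it as a potential
    # product regression instead of assuming the test is flaky.
    return "possible_regression"


if __name__ == "__main__":
    for msg in (
        "TimeoutError: page.click timed out after 30000ms",
        "502 Bad Gateway from payments API",
        "AssertionError: expected 3 got 4",
    ):
        print(classify(msg), "<-", msg)
```

Defaulting to the regression bucket is a deliberate design choice in this sketch: a failure has to show positive evidence of flakiness before it can be labeled that way.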

Included Assets

SKILL.md: trigger conditions, workflow, output format, and safety guardrails.
agents/openai.yaml: Codex/OpenAI-style metadata.
references/flaky-test-rules.md: failure taxonomy, decision rules, and anti-patterns.
templates/triage-report.md: report skeleton.
scripts/flaky_ci_forensics.py: local JUnit/log/history analyzer.
scripts/fixtures/: smoke-test JUnit, CI log, and history CSV.
.claude/skills/flaky-ci-forensics/SKILL.md: Claude Code mirror.
openclaw/README.md and hermes/README.md: runtime installation notes.

Quick Start

python3 scripts/flaky_ci_forensics.py \
  --junit scripts/fixtures/junit.xml \
  --log scripts/fixtures/ci.log \
  --history scripts/fixtures/history.csv \
  --avg-job-minutes 14 \
  --runs-per-day 60

Inputs

JUnit XML from CI or local test runs.
CI logs with failure excerpts, retry information, runner metadata, or browser logs.
Optional history CSV with test-level run/failure/rerun-pass counts.
Optional cost values for average job minutes and runs per day.
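
As a rough illustration of how these inputs could be combined, the sketch below pulls failed test cases out of JUnit XML and estimates wasted CI minutes per day. The function names, the sample XML, and the cost formula (runs per day times the share of runs that passed on rerun times average job minutes) are assumptions made for this example; the script's own calculation and the history CSV column names are not documented here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the common JUnit XML shape.
SAMPLE = """<testsuite name="e2e" tests="2" failures="1">
  <testcase classname="checkout.CartTest" name="test_add_item" time="30.1">
    <failure message="Timed out waiting for #add-to-cart">trace</failure>
  </testcase>
  <testcase classname="checkout.CartTest" name="test_total" time="0.2"/>
</testsuite>"""


def failed_cases(junit_xml: str) -> list[dict]:
    """Collect test cases that contain a <failure> or <error> child."""
    root = ET.fromstring(junit_xml)
    failures = []
    # iter() walks the whole tree, so this works for both a single
    # <testsuite> root and a <testsuites> wrapper.
    for case in root.iter("testcase"):
        for kind in ("failure", "error"):
            node = case.find(kind)
            if node is not None:
                failures.append({
                    "test": f"{case.get('classname', '')}.{case.get('name', '')}",
                    "kind": kind,
                    "message": node.get("message") or (node.text or "").strip(),
                })
                break
    return failures


def wasted_minutes_per_day(runs: int, rerun_passes: int,
                           avg_job_minutes: float, runs_per_day: float) -> float:
    """Assumed model: every rerun that later passed cost one full job."""
    if runs <= 0:
        return 0.0
    return runs_per_day * avg_job_minutes * rerun_passes / runs


if __name__ == "__main__":
    print(failed_cases(SAMPLE))
    # 10 rerun-passes in 200 recorded runs, with the Quick Start cost inputs.
    print(wasted_minutes_per_day(200, 10, avg_job_minutes=14, runs_per_day=60))  # 42.0
```

Under this assumed model, a 5% rerun-pass rate at 60 runs per day and 14 minutes per job works out to 42 wasted CI minutes per day.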

Output

The script prints a Markdown report with:

CI decision.
Failure clusters.
Cost and frequency estimate.
Root-cause hypotheses.
Minimal fix plan.
Instrumentation and guardrails.
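
The report layout can be pictured with a small rendering sketch. The section titles come from the list above; the `render_report` function, the heading levels, and the placeholder text are assumptions, since the actual skeleton in templates/triage-report.md is not reproduced here.

```python
# Section titles taken from the list above, in the same order.
SECTIONS = [
    "CI decision",
    "Failure clusters",
    "Cost and frequency estimate",
    "Root-cause hypotheses",
    "Minimal fix plan",
    "Instrumentation and guardrails",
]


def render_report(findings: dict[str, list[str]]) -> str:
    """Render findings as Markdown, one heading per section (hypothetical layout)."""
    lines = ["# Flaky CI triage report", ""]
    for title in SECTIONS:
        lines.append(f"## {title}")
        # Keep empty sections visible so missing evidence is explicit.
        items = findings.get(title) or ["No evidence collected."]
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_report({
        "CI decision": ["Block merge until the timeout cluster is explained."],
        "Failure clusters": ["timeout: checkout.CartTest.test_add_item"],
    }))
```

Emitting every section even when it is empty keeps gaps in the evidence visible instead of silently shortening the report.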

Runtime Status

Runtime	Status
Codex/OpenAI-style	Supported with `SKILL.md` and `agents/openai.yaml`.
Claude Code	Supported through `.claude/skills/flaky-ci-forensics/SKILL.md` mirror or by copying this directory.
OpenClaw	CLI present, but this local skill is not installed or published to ClawHub, so runtime visibility is not verified.
Hermes	CLI present, but `hermes skills inspect` does not accept this local directory; installation requires a supported registry identifier or direct URL.
