Explain Compiler (explncc) parses Clang/LLVM `.opt.yaml` optimization remark streams, normalizes them into a stable schema, and drives `summary`, `stats`, `diff`, `export`, `check`, and `explain`, plus Chapter 11-style training exports, Chapter 12-style CI reports (Markdown, JSON, HTML, PR comments, policy gates), and `digest` / `doctor` for cache keys and masked config (Chapter 13 themes).
Companion tooling for Decode the Compiler: AI-Guided Explanations of C/C++ Optimization Logs for Real-World Performance.
The compiler already decided what to optimize, what to skip, and often why. Those decisions are recorded as YAML streams with tags such as !Missed, !Passed, and !Analysis. Treating that output as data — rather than scrolling thousands of lines by hand — is what makes performance work reproducible and teachable.
Clang can emit a machine-oriented record of optimization events tied to source locations. explncc:

- parses YAML document streams (not a single mapping),
- preserves the remark kind from YAML tags,
- normalizes inconsistent `Args` into `message`, `cost`, `threshold`, and related fields without inventing data,
- supports directory inputs (all `*.opt.yaml` files, recursively).
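For intuition about what "YAML document streams with tags" means, here is a minimal sketch of the parsing problem, assuming PyYAML. This is not explncc's actual parser: it just shows that each remark is its own `---` document and that the `!Missed` / `!Passed` / `!Analysis` tags need custom constructors before `safe_load`-style loading will accept them.

```python
import yaml

def _remark(kind):
    # Build a constructor that keeps the mapping and records the tag as "kind".
    def construct(loader, node):
        data = loader.construct_mapping(node, deep=True)
        data["kind"] = kind
        return data
    return construct

class RemarkLoader(yaml.SafeLoader):
    pass

# Without these, PyYAML rejects the LLVM-specific tags.
for tag in ("Missed", "Passed", "Analysis"):
    RemarkLoader.add_constructor("!" + tag, _remark(tag))

stream = """\
--- !Missed
Pass: inline
Name: NoDefinition
Function: main
--- !Passed
Pass: loop-vectorize
Name: Vectorized
Function: main
"""

# load_all, not load: an .opt.yaml file is a stream of documents.
remarks = list(yaml.load_all(stream, Loader=RemarkLoader))
print([r["kind"] for r in remarks])  # ['Missed', 'Passed']
```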
```sh
python3.12 -m venv .venv
source .venv/bin/activate
make install-dev
make examples
```

```sh
python -m explncc summary build/examples/ --limit 20
python -m explncc stats build/examples/vectorize_aliasing_fail/ --json
python -m explncc diff \
  build/examples/inline_too_costly/before/before.opt.yaml \
  build/examples/inline_too_costly/after/after.opt.yaml
python -m explncc explain build/examples/inline_miss_no_definition/main.opt.yaml --backend rule
python -m explncc export build/examples/ --format jsonl -o /tmp/out.jsonl
python -m explncc check build/examples/ --max-missed-inline 200
```

These commands are deterministic: they do not train or call a model unless you plug the output into your own tooling.
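The `--max-missed-inline` gate above is simple counting over normalized rows. A hypothetical sketch (field names `kind` and `pass_name` are illustrative, not the tool's exact schema):

```python
# Count missed inlining remarks and map the result to a CI exit code,
# the same shape of gate that `check --max-missed-inline N` applies.
def missed_inline_count(records):
    return sum(
        1 for r in records
        if r.get("kind") == "Missed" and r.get("pass_name") == "inline"
    )

records = [
    {"kind": "Missed", "pass_name": "inline"},
    {"kind": "Passed", "pass_name": "loop-vectorize"},
    {"kind": "Missed", "pass_name": "inline"},
]

limit = 200
count = missed_inline_count(records)
exit_code = 0 if count <= limit else 1
print(count, exit_code)  # 2 0
```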
```sh
# Heuristic slice: vectorization-related remarks (pass names, keywords, vector width field)
python -m explncc alignment build/examples/vectorize_success/ --limit 20
python -m explncc alignment build/examples/ --json | head -c 600

# JSONL for fine-tuning / instruction tuning (OpenAI-style chat messages + optional metadata)
python -m explncc dataset build/examples/vectorize_aliasing_fail/ \
  -o /tmp/ch11_train.jsonl \
  --focus alignment \
  --template guided \
  --format explncc-record

# Same remarks × multiple prompt shapes (for benchmark sweeps)
python -m explncc bench-prompts build/examples/vectorize_success/vectorize_success.opt.yaml \
  --focus alignment \
  --templates minimal,guided,rubric \
  -o /tmp/ch11_bench.jsonl
```

See docs/chapter-11-notes.md for how this maps to the chapter outline and where IR must be joined in separately.
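The "OpenAI-style chat messages" shape is just one JSON object per line with a `messages` list. A hedged sketch of one training row per remark; the real prompt templates live in `explncc/prompt_templates.py`, and the prompt wording here is invented for illustration:

```python
import json

def to_chat_row(remark):
    # Illustrative user prompt; the actual templates (minimal, guided,
    # rubric) are the tool's, not this sketch's.
    user = (
        f"Pass `{remark['pass']}` reported `{remark['name']}` "
        f"in `{remark['function']}`. Explain the likely cause."
    )
    return {
        "messages": [
            {"role": "system", "content": "You explain compiler optimization remarks."},
            {"role": "user", "content": user},
            {"role": "assistant", "content": remark["explanation"]},
        ]
    }

remark = {
    "pass": "loop-vectorize",
    "name": "MissedDetails",
    "function": "saxpy",
    "explanation": "Possible aliasing between input and output arrays.",
}
line = json.dumps(to_chat_row(remark))  # one JSONL row
print(line[:60])
```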
explncc report turns the same normalized remarks into one artifact for pipelines: no separate “CI edition” of the parser.
```sh
# GitHub Actions job summary (also: scripts/ci_github_step_summary.sh)
python -m explncc report build/app.opt.yaml --format markdown --no-explain --title "Build remarks" >> "$GITHUB_STEP_SUMMARY"

# Collapsible Markdown for pull-request bots (`gh pr comment --body-file`, etc.)
python -m explncc report build/app.opt.yaml --format github --no-explain -o pr-comment.md

# Machine-readable bundle for dashboards or custom gates
python -m explncc report build/app.opt.yaml --format json --no-explain -o report.json

# Self-contained HTML (browser / attachment friendly)
python -m explncc report build/app.opt.yaml --format html --no-explain -o report.html

# Same thresholds as `check`: exit 1 when limits are exceeded (after writing `-o`)
python -m explncc report build/app.opt.yaml -o triage.md --fail-on-check --max-missed-inline 80

# Optional model layer (use sparingly in CI: cost, latency, secrets)
python -m explncc report build/app.opt.yaml --format markdown --explain-backend rule

# Stable digests over collected .opt.yaml (CI cache keys) and masked backend env
python -m explncc digest build/
python -m explncc doctor
```

Copy-ready samples live under examples/ci/. Author notes: docs/chapter-12-notes.md, docs/chapter-13-notes.md.
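A cache-key digest of this kind needs to be independent of directory-walk order. A sketch, assuming the obvious construction (hash each file, then hash the sorted per-file digests); the exact output format of `explncc digest` may differ:

```python
import hashlib
import tempfile
from pathlib import Path

def aggregate_digest(root):
    # Per-file SHA-256, collected in sorted path order so the aggregate
    # key is stable across filesystems and walk orders.
    per_file = []
    for path in sorted(Path(root).rglob("*.opt.yaml")):
        h = hashlib.sha256(path.read_bytes()).hexdigest()
        per_file.append(f"{path.name}:{h}")
    agg = hashlib.sha256("\n".join(per_file).encode()).hexdigest()
    return per_file, agg

# Demo on a throwaway tree (hypothetical content).
tmp = Path(tempfile.mkdtemp())
(tmp / "a.opt.yaml").write_text("--- !Missed\nPass: inline\n")
files, agg = aggregate_digest(tmp)
print(len(files), len(agg))  # 1 64
```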
explncc viz emits Mermaid diagrams, HTML with Mermaid.js, or JSON for your own graph UI — all from the same normalized remarks as the rest of the tool (not from LLVM IR bitcode).
```sh
python -m explncc viz build/examples/ --style pass-summary --format mermaid --top 12 -o remarks.mmd
python -m explncc viz build/app.opt.yaml --style pass-remark --format json -o viz.json
python -m explncc viz build/app.opt.yaml --style missed-top --format html --explain-backend rule -o viz.html
```

Author notes: docs/chapter-14-notes.md. Demo: `make chapter14-demo`.
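To make the Mermaid output concrete, here is a minimal sketch of a `pass-summary`-style diagram built from remark rows. The node layout and field names are illustrative, not what `explncc/viz.py` actually emits:

```python
from collections import Counter

def pass_summary_mermaid(records, top=3):
    # One edge per (pass, kind) pair, weighted by remark count.
    counts = Counter((r["pass"], r["kind"]) for r in records)
    lines = ["flowchart LR"]
    for i, ((pass_name, kind), n) in enumerate(counts.most_common(top)):
        lines.append(f'  p{i}["{pass_name}"] --> k{i}["{kind}: {n}"]')
    return "\n".join(lines)

records = [
    {"pass": "loop-vectorize", "kind": "Passed"},
    {"pass": "loop-vectorize", "kind": "Passed"},
    {"pass": "inline", "kind": "Missed"},
]
print(pass_summary_mermaid(records))
```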
Rich tables list kind, pass, remark, function, location, and a truncated message. Use `--json` or `--jsonl` for stable downstream tooling.
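Consuming the `--jsonl` output downstream is one `json.loads` per line. A small sketch, with `io.StringIO` standing in for a real export file and illustrative field names:

```python
import io
import json

# Two fake exported rows; a real file would come from `export --format jsonl`.
jsonl = io.StringIO(
    '{"kind": "Missed", "pass": "inline", "function": "main"}\n'
    '{"kind": "Passed", "pass": "loop-vectorize", "function": "saxpy"}\n'
)

rows = [json.loads(line) for line in jsonl if line.strip()]
missed = [r for r in rows if r["kind"] == "Missed"]
print(len(missed))  # 1
```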
| Module | Role |
|---|---|
| `explncc/parser.py` | YAML stream loader with `!Missed` / `!Passed` / `!Analysis` |
| `explncc/normalizer.py` | Raw document → `OptimizationRecord` |
| `explncc/models.py` | Pydantic schema |
| `explncc/summary.py` / `stats.py` | Filtering and aggregates |
| `explncc/diffing.py` | Build-vs-build missed deltas and counters |
| `explncc/exporters.py` | `json`, `jsonl`, `csv` |
| `explncc/checks.py` | CI thresholds |
| `explncc/explain/` | Rule text + optional HTTP backends |
| `explncc/alignment.py` | Heuristic SIMD / alignment-related remark slice |
| `explncc/prompt_templates.py` | Named Chapter 11 user prompts (minimal, guided, rubric) |
| `explncc/dataset_llm.py` | JSONL builders for training / bench rows |
| `explncc/ci_report.py` | Markdown / JSON / HTML / GitHub-flavored CI reports |
| `explncc/digest.py` | Per-file and aggregate SHA-256 over `.opt.yaml` inputs |
| `explncc/config.py` | Backend env + `doctor` payload |
| `explncc/viz.py` | Mermaid / HTML / JSON visualization bundles (`viz` command) |
| `explncc/cli.py` | Typer commands |
Subpackages stay small so a book chapter can point to one file at a time.
- Clang/LLVM with `-fsave-optimization-record` / `-foptimization-record-file=…` output (`.opt.yaml`)
- One file or a directory tree; only `*.opt.yaml` files are read
- Heuristics depend on Clang’s YAML shape; newer LLVM versions may add fields (handled conservatively).
- The `alignment` slice is keyword/pass-based, not semantic analysis; validate on your corpus before publishing benchmark numbers.
- Diff compares fingerprints of normalized rows; identical logical events with different wording may look distinct.
- AI backends augment text only; they never replace normalized records.
- `dataset` / `bench-prompts` emit structure for training; they do not guarantee your fine-tuning provider’s latest JSONL schema, so verify against current API docs.
- `report` with explanation enabled can call remote model APIs; prefer `--no-explain` on high-frequency CI unless you control keys, quotas, and data-retention policy.
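The diff caveat follows directly from how fingerprinting over normalized fields behaves. A sketch of the idea, with illustrative field choices (not explncc's actual fingerprint function): because the message text is part of the key, the same logical event with reworded output hashes differently.

```python
import hashlib

def fingerprint(record):
    # Join the normalized fields into a stable key, then hash it.
    key = "|".join(
        str(record.get(k, ""))
        for k in ("kind", "pass", "name", "function", "message")
    )
    return hashlib.sha256(key.encode()).hexdigest()[:16]

a = {"kind": "Missed", "pass": "inline", "name": "NoDefinition",
     "function": "main", "message": "foo will not be inlined"}
b = dict(a, message="foo can not be inlined")  # same event, new wording

print(fingerprint(a) == fingerprint(b))  # False
```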
- Deeper remark-specific extractors (more structured fields from `Args`)
- Optional SARIF or LSP-adjacent bridges
- Tighter CI recipes (`explncc check` presets)
Use the bundled examples/ to emit real .opt.yaml on your machine, then run explncc to connect source patterns to compiler vocabulary. See docs/chapter-10-notes.md for a suggested teaching order, docs/chapter-11-notes.md for alignment / LLM dataset workflows, and docs/chapter-12-notes.md for CI and PR integration.
You can, and you should at least once, if only to see the raw stream. explncc exists so you can filter, count, diff across builds, and export the same information reliably for notes, CI, and (optionally) model-assisted prose.
- Deterministic core first: every command works without network access.
- No invented fields: missing data stays absent; `args_raw` preserves the source.
- AI as augmentation: rule text is always available; HTTP backends only enrich.
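The "no invented fields" principle is easiest to see in the schema. The repo models records with Pydantic (`explncc/models.py`); this dataclass is only a sketch of the rule: optional fields stay `None` when the remark did not supply them, and `args_raw` keeps the untouched source `Args`. Field names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class OptimizationRecordSketch:
    kind: str
    pass_name: str
    name: str
    # Absent data stays absent: no defaults are invented.
    function: Optional[str] = None
    cost: Optional[int] = None
    threshold: Optional[int] = None
    # The original Args list survives normalization verbatim.
    args_raw: list[Any] = field(default_factory=list)

rec = OptimizationRecordSketch(
    kind="Missed", pass_name="inline", name="TooCostly",
    args_raw=[{"Cost": "25"}, {"Threshold": "15"}],
)
print(rec.cost)  # None: nothing was normalized, nothing was invented
```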
- Ollama (local): set `OLLAMA_HOST`, `OLLAMA_MODEL` (default `qwen2.5-coder:7b-instruct`).
- OpenAI: set `OPENAI_API_KEY`; optional `OPENAI_MODEL` (default `gpt-4o-mini`).
```sh
brew install llvm   # macOS
make examples       # writes under build/examples/<name>/
```

Details: docs/getting-started.md and docs/examples.md.
- `make check`: ruff, format check, mypy, pytest
- `make docs-check`: required doc files present
- Prefer focused changes with tests beside `tests/fixtures/*.opt.yaml`
```sh
make install-dev
make check
make demo                                              # needs `make examples` first
make chapter11-demo PYTHON="$(pwd)/.venv/bin/python3"  # alignment + bench-prompts sample
make chapter12-demo PYTHON="$(pwd)/.venv/bin/python3"  # CI-style github report (fixture)
```

```sh
make check
python -m explncc alignment tests/fixtures/simd_vectorized.opt.yaml --json
python -m explncc dataset tests/fixtures/simd_vectorized.opt.yaml -o /tmp/t.jsonl --focus all --format openai-messages --template minimal
python -m explncc bench-prompts tests/fixtures/simd_vectorized.opt.yaml --focus all --templates minimal
python -m explncc report tests/fixtures/inline_miss_no_definition.opt.yaml --format markdown --no-explain
python -m explncc report tests/fixtures/inline_miss_no_definition.opt.yaml --format github --no-explain | head -n 20
python -m pytest -q tests/test_ci_report.py tests/test_report_cli.py
```

MIT