explncc

Explain Compiler: parse Clang/LLVM .opt.yaml optimization remark streams, normalize them into a stable schema, and drive summary, stats, diff, export, check, and explain, plus Chapter 11-style training exports, Chapter 12-style CI reports (Markdown, JSON, HTML, PR comments, policy gates), and digest and doctor for cache keys and masked config (Chapter 13 themes).

Companion tooling for Decode the Compiler: AI-Guided Explanations of C/C++ Optimization Logs for Real-World Performance.

Why optimization logs matter

The compiler already decided what to optimize, what to skip, and often why. Those decisions are recorded as YAML streams with tags such as !Missed, !Passed, and !Analysis. Treating that output as data — rather than scrolling thousands of lines by hand — is what makes performance work reproducible and teachable.

Why .opt.yaml

Clang can emit a machine-oriented record of optimization events tied to source locations. explncc:

  • parses YAML document streams (not a single mapping),
  • preserves remark kind from YAML tags,
  • normalizes inconsistent Args into message, cost, threshold, and related fields without inventing data,
  • supports directory inputs (all *.opt.yaml recursively).
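
The document-stream shape is easy to see with plain PyYAML (a minimal sketch, not explncc's parser; registering a catch-all tag constructor is one reasonable way to keep the remark kind, and is an assumption about implementation):

```python
import yaml

# Clang emits one YAML document per remark; the tag carries the remark kind.
SAMPLE = """\
--- !Missed
Pass: inline
Name: NoDefinition
Function: main
Args:
  - Callee: helper
--- !Passed
Pass: loop-vectorize
Name: Vectorized
Function: main
"""

class RemarkLoader(yaml.SafeLoader):
    pass

def keep_kind(loader, tag_suffix, node):
    """Build a dict from any !Tag mapping and record the kind instead of failing."""
    data = loader.construct_mapping(node, deep=True)
    data["Kind"] = tag_suffix  # "Missed", "Passed", "Analysis", ...
    return data

yaml.add_multi_constructor("!", keep_kind, Loader=RemarkLoader)

remarks = list(yaml.load_all(SAMPLE, Loader=RemarkLoader))
print([(r["Kind"], r["Pass"]) for r in remarks])
# [('Missed', 'inline'), ('Passed', 'loop-vectorize')]
```

Note the `---` separators: a .opt.yaml file is a stream of documents, which is why a single `yaml.safe_load` call is not enough.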

Install

python3.12 -m venv .venv
source .venv/bin/activate
make install-dev

Quick start

make examples
python -m explncc summary build/examples/ --limit 20
python -m explncc stats build/examples/vectorize_aliasing_fail/ --json
python -m explncc diff \
  build/examples/inline_too_costly/before/before.opt.yaml \
  build/examples/inline_too_costly/after/after.opt.yaml
python -m explncc explain build/examples/inline_miss_no_definition/main.opt.yaml --backend rule
python -m explncc export build/examples/ --format jsonl -o /tmp/out.jsonl
python -m explncc check build/examples/ --max-missed-inline 200

Chapter 11 (SIMD / alignment + LLM datasets)

These commands are deterministic: they do not train or call a model unless you plug the output into your own tooling.

# Heuristic slice: vectorization-related remarks (pass names, keywords, vector width field)
python -m explncc alignment build/examples/vectorize_success/ --limit 20
python -m explncc alignment build/examples/ --json | head -c 600

# JSONL for fine-tuning / instruction tuning (OpenAI-style chat messages + optional metadata)
python -m explncc dataset build/examples/vectorize_aliasing_fail/ \
  -o /tmp/ch11_train.jsonl \
  --focus alignment \
  --template guided \
  --format explncc-record

# Same remarks × multiple prompt shapes (for benchmark sweeps)
python -m explncc bench-prompts build/examples/vectorize_success/vectorize_success.opt.yaml \
  --focus alignment \
  --templates minimal,guided,rubric \
  -o /tmp/ch11_bench.jsonl
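
A JSONL row in OpenAI-style chat form can be assembled like this (a sketch: only the standard `messages` shape is fixed by the chat format; the prompt wording and any extra metadata keys explncc writes are assumptions):

```python
import json

remark = {
    "kind": "Missed",
    "pass": "loop-vectorize",
    "message": "loop not vectorized: cannot prove pointers do not alias",
}

row = {
    "messages": [
        {"role": "system", "content": "You explain compiler optimization remarks."},
        {"role": "user", "content": f"Explain this {remark['kind']} remark from "
                                    f"pass {remark['pass']}: {remark['message']}"},
        {"role": "assistant", "content": "The vectorizer could not rule out aliasing; "
                                         "consider restrict-qualified pointers or "
                                         "runtime overlap checks."},
    ]
}

# One JSON object per line is the JSONL convention fine-tuning endpoints expect.
line = json.dumps(row)
```

The `--format explncc-record` variant keeps the normalized record instead of chat turns; the chat shape above is what `openai-messages` targets.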

See docs/chapter-11-notes.md for how this maps to the chapter outline and where IR must be joined in separately.

Chapter 12 (CI, job summaries, PR comments, triage)

explncc report turns the same normalized remarks into one artifact for pipelines: no separate “CI edition” of the parser.

# GitHub Actions job summary (also: scripts/ci_github_step_summary.sh)
python -m explncc report build/app.opt.yaml --format markdown --no-explain --title "Build remarks" >> "$GITHUB_STEP_SUMMARY"

# Collapsible Markdown for pull-request bots (`gh pr comment --body-file`, etc.)
python -m explncc report build/app.opt.yaml --format github --no-explain -o pr-comment.md

# Machine-readable bundle for dashboards or custom gates
python -m explncc report build/app.opt.yaml --format json --no-explain -o report.json

# Self-contained HTML (browser / attachment friendly)
python -m explncc report build/app.opt.yaml --format html --no-explain -o report.html

# Same thresholds as `check`: exit 1 when limits are exceeded (after writing `-o`)
python -m explncc report build/app.opt.yaml -o triage.md --fail-on-check --max-missed-inline 80

# Optional model layer (use sparingly in CI: cost, latency, secrets)
python -m explncc report build/app.opt.yaml --format markdown --explain-backend rule

# Stable digests over collected .opt.yaml (CI cache keys) and masked backend env
python -m explncc digest build/
python -m explncc doctor

Copy-ready samples live under examples/ci/. Author notes: docs/chapter-12-notes.md, docs/chapter-13-notes.md.

Chapter 14 (diagrams + merged explanations)

explncc viz emits Mermaid diagrams, HTML with Mermaid.js, or JSON for your own graph UI — all from the same normalized remarks as the rest of the tool (not from LLVM IR bitcode).

python -m explncc viz build/examples/ --style pass-summary --format mermaid --top 12 -o remarks.mmd
python -m explncc viz build/app.opt.yaml --style pass-remark --format json -o viz.json
python -m explncc viz build/app.opt.yaml --style missed-top --format html --explain-backend rule -o viz.html

Author notes: docs/chapter-14-notes.md. Demo: make chapter14-demo.

Example output (summary)

Rich tables list kind, pass, remark, function, location, and a truncated message. Use --json or --jsonl for stable downstream tooling.

Architecture

| Module | Role |
| --- | --- |
| explncc/parser.py | YAML stream loader with !Missed / !Passed / !Analysis |
| explncc/normalizer.py | Raw document → OptimizationRecord |
| explncc/models.py | Pydantic schema |
| explncc/summary.py, explncc/stats.py | Filtering and aggregates |
| explncc/diffing.py | Build-vs-build missed deltas and counters |
| explncc/exporters.py | json, jsonl, csv |
| explncc/checks.py | CI thresholds |
| explncc/explain/ | Rule text + optional HTTP backends |
| explncc/alignment.py | Heuristic SIMD / alignment-related remark slice |
| explncc/prompt_templates.py | Named Chapter 11 user prompts (minimal, guided, rubric) |
| explncc/dataset_llm.py | JSONL builders for training / bench rows |
| explncc/ci_report.py | Markdown / JSON / HTML / GitHub-flavored CI reports |
| explncc/digest.py | Per-file and aggregate SHA-256 over .opt.yaml inputs |
| explncc/config.py | Backend env + doctor payload |
| explncc/viz.py | Mermaid / HTML / JSON visualization bundles (viz command) |
| explncc/cli.py | Typer commands |

Subpackages stay small so a book chapter can point to one file at a time.

Supported inputs

  • Clang/LLVM -fsave-optimization-record / -foptimization-record-file=… output (.opt.yaml)
  • One file or a directory tree; only *.opt.yaml files are read

Limitations

  • Heuristics depend on Clang’s YAML shape; newer LLVM versions may add fields (handled conservatively).
  • alignment slice is keyword/pass-based, not semantic analysis; validate on your corpus before publishing benchmark numbers.
  • Diff compares fingerprints of normalized rows; identical logical events with different wording may look distinct.
  • AI backends augment text only; they never replace normalized records.
  • dataset / bench-prompts emit structure for training; they do not guarantee your fine-tuning provider’s latest JSONL schema — verify against current API docs.
  • report with explanation enabled can call remote model APIs; prefer --no-explain on high-frequency CI unless you control keys, quotas, and data-retention policy.
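
The diff caveat above follows directly from how fingerprinting works in general: hash a canonical subset of normalized fields, and any wording change in a hashed field yields a new identity. A sketch of the technique (explncc's actual field choice is an assumption):

```python
import hashlib
import json

def fingerprint(rec: dict) -> str:
    """Stable identity for a normalized remark: hash a canonical field subset."""
    key = {f: rec.get(f) for f in ("kind", "pass", "name", "function", "message")}
    return hashlib.sha256(json.dumps(key, sort_keys=True).encode()).hexdigest()[:16]

# Same logical event, different wording => different fingerprint (the limitation).
a = {"kind": "Missed", "pass": "inline", "name": "TooCostly",
     "function": "f", "message": "cost=120, threshold=100"}
b = dict(a, message="cost=125, threshold=100")
```

Here `fingerprint(a) != fingerprint(b)` even though both describe the same missed inline; a build-to-build diff would report one event removed and one added.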

Roadmap

  • Deeper remark-specific extractors (more structured fields from Args)
  • Optional SARIF or LSP-adjacent bridges
  • Tighter CI recipes (explncc check presets)

For readers of Decode the Compiler

Use the bundled examples/ to emit real .opt.yaml on your machine, then run explncc to connect source patterns to compiler vocabulary. See docs/chapter-10-notes.md for a suggested teaching order, docs/chapter-11-notes.md for alignment / LLM dataset workflows, and docs/chapter-12-notes.md for CI and PR integration.

Why not just read .opt.yaml manually?

You can — and you should, once — to see the raw stream. explncc exists so you can filter, count, diff across builds, and export the same information reliably for notes, CI, and (optionally) model-assisted prose.

Design principles

  1. Deterministic core first — every command works without network access.
  2. No invented fields — missing data stays absent; args_raw preserves the source.
  3. AI as augmentation — rule text is always available; HTTP backends only enrich.
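
Principle 2 in code form (a sketch: `args_raw` is the schema's name for the preserved source, but the extraction below is illustrative, not explncc's normalizer):

```python
def normalize_args(args_raw: list[dict]) -> dict:
    """Lift known keys out of Clang's Args list; never invent missing values."""
    out = {"args_raw": args_raw}  # the source is always preserved verbatim
    known = (("Cost", "cost"), ("Threshold", "threshold"), ("Callee", "callee"))
    for item in args_raw:
        for src, dst in known:
            if src in item and dst not in out:
                out[dst] = item[src]
    # Missing data stays absent: no defaults, no placeholder zeros.
    return out
```

A remark that carries only a cost produces a record with `cost` and `args_raw` and nothing else, so downstream consumers can distinguish "absent" from "zero".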

Optional model backends

  • Ollama (local): set OLLAMA_HOST, OLLAMA_MODEL (default qwen2.5-coder:7b-instruct).
  • OpenAI: set OPENAI_API_KEY; optional OPENAI_MODEL (default gpt-4o-mini).

See docs/model-backends.md.
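
Masking secrets for a doctor-style printout is a small amount of code (a sketch of the masked-config idea; the exact variables shown and the output format of `explncc doctor` are assumptions beyond the names listed above):

```python
import os

def masked_env(keys=("OPENAI_API_KEY", "OLLAMA_HOST", "OLLAMA_MODEL")) -> dict:
    """Report which backend variables are set without leaking secret values."""
    out = {}
    for key in keys:
        val = os.environ.get(key)
        if val is None:
            out[key] = "(unset)"
        elif "KEY" in key:
            # Keep just enough to recognize the credential, never the whole value.
            out[key] = val[:3] + "..." + val[-2:] if len(val) > 8 else "***"
        else:
            out[key] = val  # hosts and model names are not secrets
    return out
```

Printing this dict is safe to paste into a bug report or CI log, which is the point of a doctor command.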

Building the book examples

brew install llvm   # macOS
make examples       # writes under build/examples/<name>/

Details: docs/getting-started.md and docs/examples.md.

Contributing

  • make check — ruff, format check, mypy, pytest
  • make docs-check — required doc files present
  • Prefer focused changes with tests beside tests/fixtures/*.opt.yaml

Development workflow

make install-dev
make check
make demo          # needs `make examples` first
make chapter11-demo PYTHON="$(pwd)/.venv/bin/python3"   # alignment + bench-prompts sample
make chapter12-demo PYTHON="$(pwd)/.venv/bin/python3"   # CI-style github report (fixture)

Testing Chapter 11 features

make check
python -m explncc alignment tests/fixtures/simd_vectorized.opt.yaml --json
python -m explncc dataset tests/fixtures/simd_vectorized.opt.yaml -o /tmp/t.jsonl --focus all --format openai-messages --template minimal
python -m explncc bench-prompts tests/fixtures/simd_vectorized.opt.yaml --focus all --templates minimal

Testing Chapter 12 (report)

python -m explncc report tests/fixtures/inline_miss_no_definition.opt.yaml --format markdown --no-explain
python -m explncc report tests/fixtures/inline_miss_no_definition.opt.yaml --format github --no-explain | head -n 20
python -m pytest -q tests/test_ci_report.py tests/test_report_cli.py

License

MIT
