ci: build MCU SoC timing IR on Linux + de-flake opensta-to-ir locate test #65

Merged: robtaylor merged 5 commits into main from fix/cosim-sdf-via-jtir-artifact on May 19, 2026

@robtaylor
Contributor

Summary

Unblocks PR #63's CI by fixing two unrelated failures it surfaced:

  1. MCU SoC Metal Simulation: the second cosim invocation was passing --sdf / --sdf-corner, which cosim never accepted. This was historically masked because clap rejected the now-renamed --max-cycles first. Cosim's documented contract is "pre-convert with opensta-to-ir and pass --timing-ir <.jtir>"; raw SDF support is a follow-up tracked in docs/plans/post-phase-0-roadmap.md.

    A new prepare-mcu-soc-jtir Linux job builds OpenSTA, installs the pinned sky130A PDK via volare, runs opensta-to-ir against tests/mcu_soc/data/6_final.sdf, and uploads the resulting 6_final.jtir as an artifact. The Metal job lists it under needs:, downloads the artifact, and passes --timing-ir to the second cosim. This PR also removes the redundant "Strip SDF timing checks" step, which was overwriting the tracked stripped SDF with identical content.

  2. opensta-to-ir locate test flake: locate_reports_failing_version_probe occasionally got VersionProbeFailed { kind: ExecutableFileBusy } instead of the expected VersionProbeNonZero. Linux's close(2) does not synchronously decrement the inode writer_count, so an immediate execve of the freshly written stub can race. Added sync_all plus a bounded no-op execve probe loop in the test helper so the stub is confirmed executable before the test runs.

Also documents the Python tooling convention (uv dev dependencies; volare/ciel for PDK install) in CLAUDE.md.

Test plan

  • prepare-mcu-soc-jtir job completes: OpenSTA + opensta-to-ir build, volare installs sky130A, jtir artifact uploaded
  • MCU SoC Metal Simulation job downloads the jtir and completes "Capture stimulus + timing VCD via cosim" without CLI parse errors
  • opensta-to-ir Tests job: locate_reports_failing_version_probe passes (validating the de-flake)
  • Downstream mcu-soc-cvc / mcu-soc-comparison jobs see expected stimulus.vcd / timing VCD artifacts

Notes for PR #63 rebase

PR #63's rename --max-cycles → --max-clock-edges still applies cleanly: the first cosim (line 626) is untouched here. The second cosim (was line 653) is rewritten to use --timing-ir; PR #63's rename of that line will need to be re-applied to the new --max-cycles 10000 \ line. set -o pipefail additions are not in this PR's scope.

Followups still open (tracked in docs/plans/post-phase-0-roadmap.md):

  • Add --sdf subprocess wrapper to cosim so the Linux jtir prep job can be inlined
  • Migrate volare dep to upstream-renamed ciel package

robtaylor added a commit that referenced this pull request May 18, 2026

The cosim/sim CLI was renamed `--max-cycles` → `--max-clock-edges`
on main but the MCU SoC workflow steps still passed the old flag.
PR #63 fixed this in isolation; folding the same change here so #65
can be evaluated against CI on its own.

Three call sites updated:
- First cosim (500K edges, UART boot smoke test)
- Second cosim (10K edges, stimulus/timing VCD capture — same step
  this PR rewires to consume --timing-ir)
- Replay sim (10K edges, non-timed VCD for CVC comparison)

Also adds `set -o pipefail` to the two cosim steps that lacked it,
so future failures inside the `tee` pipeline surface immediately
rather than greening through.

Closes #62. Supersedes #63 — close PR #63 after this merges.

Co-developed-by: Claude Code v2.1.143 (claude-opus-4-7)
robtaylor added a commit that referenced this pull request May 18, 2026
mcu-soc-cvc previously ran `make -f makefile.cvc64` which upstream
removed in commit 1c5e043e (2026-03-13). The cvc-reference job was
fixed at the same time; mcu-soc-cvc kept the stale invocation
because the job had been transitively skipped (mcu-soc-metal
failed first, so this job never ran). Now that mcu-soc-metal is
green it ran for the first time and hit the deleted makefile.

Mirror the cvc-reference job's working build: top-level src/Makefile
with the same CFLAGS workaround, output at ../build64/cvc64.

Surfaced by PR #65 CI run 26066588654.

Co-developed-by: Claude Code v2.1.143 (claude-opus-4-7)
robtaylor added 5 commits May 19, 2026 08:47
The locate_and_check tests' write_stub_script helper occasionally hit
ETXTBSY ("Text file busy") on Linux when invoking the freshly-written
stub binary — close(2) does not synchronously decrement the inode's
writer_count, so an immediate execve can race. The test then sees
LocateError::VersionProbeFailed { kind: ExecutableFileBusy } instead
of the expected exit-code-driven variant.

Add sync_all + a brief no-op execve probe loop after drop so the
helper only returns once the file is safely executable. Bounded at
2s with a panic-on-timeout to keep CI runtime predictable.
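
The write, sync, probe sequence described above can be sketched as follows. This is a minimal illustration using only the standard library, not the helper from this PR: the function names, the 10 ms retry interval, and the hard-coded Linux errno value are assumptions; only `sync_all`, the probe loop, and the 2s bound with a panic on timeout come from the commit message.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::os::unix::fs::PermissionsExt;
use std::path::Path;
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// errno for "Text file busy" on Linux (assumption: this sketch is Linux-only).
const ETXTBSY: i32 = 26;

/// Write an executable stub and return only once it can actually be exec'd.
/// Hypothetical signature; the real helper may differ.
pub fn write_stub_script(path: &Path, body: &str) -> io::Result<()> {
    {
        let mut file = File::create(path)?;
        file.write_all(body.as_bytes())?;
        // Flush data and metadata to disk before the handle is closed.
        file.sync_all()?;
    } // `file` is dropped (closed) here.
    fs::set_permissions(path, fs::Permissions::from_mode(0o755))?;
    wait_until_executable(path, Duration::from_secs(2));
    Ok(())
}

/// Run the stub repeatedly until execve stops failing with ETXTBSY.
/// The stub's exit status is ignored: any status means the exec succeeded.
fn wait_until_executable(path: &Path, timeout: Duration) {
    let deadline = Instant::now() + timeout;
    loop {
        let probe = Command::new(path)
            .stdin(Stdio::null())
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .status();
        match probe {
            Ok(_) => return,
            Err(err) if err.raw_os_error() == Some(ETXTBSY) => {
                if Instant::now() >= deadline {
                    panic!("stub {} still busy after {:?}", path.display(), timeout);
                }
                sleep(Duration::from_millis(10)); // illustrative back-off
            }
            Err(err) => panic!("probe of {} failed: {}", path.display(), err),
        }
    }
}
```

Because the probe executes the stub itself, this approach only suits stubs that are cheap and free of side effects, which is the case for a script that just exits with a fixed code.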

Surfaced via PR #63 CI run 25938446354.

Co-developed-by: Claude Code v2.1.143 (claude-opus-4-7)
OpenSTA's read_verilog is structural-only: RTL operators (~, &, |),
bit-selects in assigns, and ranged concatenations cause a syntax
error. The LibreLane + wafer.space integration flow wraps the
post-P&R structural body in a thin RTL module (e.g.
openframe_project_wrapper) that patches the chip-frame pad ring
using exactly those operators. Every user pushing a tapeout through
this pipeline — chipflow's mcu_soc, hazard3, future wafer.space
designs — hits the same wrapper-parse failure.

Add a Verilog input filter that runs before the OpenSTA subprocess
invocation: for each --verilog file, extract `module <--top> …
endmodule` if present; pass the file through unchanged otherwise.
Files without a matching module declaration (sub-module-only files
in hierarchical designs) still link cleanly.

Implementation in src/verilog_filter.rs with 7 unit tests covering
the multi-module, single-module, absent-module, prefix-collision,
indented-keyword, no-port-list, and comment-with-endmodule cases.
Integration in opensta::run via a per-file pre-pass that writes
filtered outputs into the existing OpenSTA tempdir.
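
To make the filter's behaviour concrete, here is a minimal line-oriented sketch. It is not the code in src/verilog_filter.rs: the function names are hypothetical, and the sketch assumes the `module` keyword and the module name sit on the same line and that only `//` line comments need to be ignored (block comments and string literals are not handled).

```rust
/// Drop a trailing `//` comment so keywords inside comments are ignored.
fn strip_line_comment(line: &str) -> &str {
    match line.find("//") {
        Some(i) => &line[..i],
        None => line,
    }
}

fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '$'
}

/// True if `code` (possibly indented) declares exactly `module <top>`.
/// The character after the name must not continue an identifier, so
/// `top` does not match `top_level` (the prefix-collision case).
fn declares_module(code: &str, top: &str) -> bool {
    let rest = match code.trim_start().strip_prefix("module") {
        Some(r) => r,
        None => return false,
    };
    if !rest.starts_with(char::is_whitespace) {
        return false;
    }
    match rest.trim_start().strip_prefix(top) {
        Some(after) => !after.chars().next().map_or(false, is_ident_char),
        None => false,
    }
}

/// True if `word` occurs in `code` as a whole word.
fn contains_word(code: &str, word: &str) -> bool {
    code.match_indices(word).any(|(i, _)| {
        let before = code[..i].chars().next_back();
        let after = code[i + word.len()..].chars().next();
        !before.map_or(false, is_ident_char) && !after.map_or(false, is_ident_char)
    })
}

/// Return the `module <top> ... endmodule` block if present; otherwise
/// return the source unchanged (also when the block is unterminated).
pub fn extract_top_module(source: &str, top: &str) -> String {
    let mut out = String::new();
    let mut inside = false;
    for line in source.lines() {
        let code = strip_line_comment(line);
        if !inside && declares_module(code, top) {
            inside = true;
        }
        if inside {
            out.push_str(line);
            out.push('\n');
            if contains_word(code, "endmodule") {
                return out;
            }
        }
    }
    source.to_string()
}
```

Falling back to the unchanged source when no matching declaration is found mirrors the pass-through behaviour described above for sub-module-only files.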

A second class of pitfall — substituting a pre-P&R synthesis netlist
to dodge the wrapper-parse error — is documented in ADR 0009 and
retracted from the prior ws3-cosim-sdf-followup recipe. That path
silently drops SDF entries for cells P&R inserts (CTS buffers,
antenna diodes, fillers) and produces materially incomplete IR.

No CLI changes — the behaviour is transparent.

Co-developed-by: Claude Code v2.1.143 (claude-opus-4-7)
The cosim/sim CLI was renamed --max-cycles → --max-clock-edges on
main but the MCU SoC workflow steps still passed the old flag.
PR #63 fixed this in isolation; folding the same change here so #65
can be evaluated against CI on its own.

Three call sites updated (first cosim 500K, second cosim 10K,
replay sim 10K). Also adds `set -o pipefail` to the two cosim
steps that lacked it, so future failures inside the `tee` pipeline
surface immediately rather than greening through.

Closes #62. Supersedes #63 — close PR #63 after this merges.

Co-developed-by: Claude Code v2.1.143 (claude-opus-4-7)
The MCU SoC Metal Simulation job's second cosim invocation was
passing flags cosim never accepted (--sdf, --sdf-corner) — historically
masked by clap rejecting `--max-cycles` first. Cosim only accepts
pre-built timing IR (--timing-ir <.jtir>); raw SDF support is the
documented post-phase-0-roadmap follow-up.
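
The masking effect can be illustrated with a toy model of a parser that reports only the first unrecognised flag. This is a sketch, not clap's implementation: the function name, the accepted-flag list, and the argument values are illustrative assumptions.

```rust
/// Return the first `--flag` not in `accepted`, mimicking a parser that
/// stops at the first error. Hypothetical helper, not part of cosim or clap.
pub fn first_unknown_flag<'a>(args: &[&'a str], accepted: &[&str]) -> Option<&'a str> {
    args.iter()
        .copied()
        .filter(|arg| arg.starts_with("--"))
        .find(|arg| !accepted.contains(arg))
}
```

With an accepted set of `--max-clock-edges` and `--timing-ir`, an invocation containing both `--max-cycles` and `--sdf` reports only `--max-cycles`. Once that flag is renamed, the same invocation reports `--sdf`, which is how the rename fix exposed the second, older problem.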

Add a new Linux job (prepare-mcu-soc-jtir) that builds OpenSTA,
installs the pinned sky130A PDK via volare, builds opensta-to-ir,
and runs it against tests/mcu_soc/data/6_final.sdf to produce a
6_final.jtir artifact. The Metal job `needs:` this job, downloads
the artifact, and feeds it to cosim via --timing-ir.

The 6_final.v file has a chipflow integration wrapper module
(openframe_project_wrapper) that uses RTL operators OpenSTA's
parser rejects. opensta-to-ir's Verilog input filter handles this
transparently — see the feat(opensta-to-ir) commit earlier in this
PR and ADR 0009 for details.

Also drop the redundant "Strip SDF timing checks" step (it
overwrote a tracked file with identical content; raw SDF is no
longer fed to cosim) and document the Python tooling convention in
CLAUDE.md (uv dev dependencies for Python tools; volare for PDK
install with a forward note on the upstream ciel rename).

Co-developed-by: Claude Code v2.1.143 (claude-opus-4-7)
mcu-soc-cvc previously ran `make -f makefile.cvc64` which upstream
removed in commit 1c5e043e (2026-03-13). The cvc-reference job was
fixed at the same time; mcu-soc-cvc kept the stale invocation
because the job had been transitively skipped (mcu-soc-metal
failed first, so this job never ran). Now that mcu-soc-metal is
green it ran for the first time and hit the deleted makefile.

Mirror the cvc-reference job's working build: top-level src/Makefile
with the same CFLAGS workaround, output at ../build64/cvc64.

Co-developed-by: Claude Code v2.1.143 (claude-opus-4-7)
robtaylor force-pushed the fix/cosim-sdf-via-jtir-artifact branch from 147613b to 9121afe on May 19, 2026 07:51
robtaylor merged commit b510ddc into main on May 19, 2026
14 checks passed
robtaylor added a commit that referenced this pull request May 19, 2026
The cosim/sim CLI was renamed --max-cycles → --max-clock-edges on
main but the MCU SoC workflow steps still passed the old flag.
PR #63 fixed this in isolation; folding the same change here so #65
can be evaluated against CI on its own.

Three call sites updated (first cosim 500K, second cosim 10K,
replay sim 10K). Also adds `set -o pipefail` to the two cosim
steps that lacked it, so future failures inside the `tee` pipeline
surface immediately rather than greening through.

Closes #62. Supersedes #63 — close PR #63 after this merges.

Co-developed-by: Claude Code v2.1.143 (claude-opus-4-7)
robtaylor deleted the fix/cosim-sdf-via-jtir-artifact branch on May 19, 2026 08:46