Skip to content

ci: make SERIAL/FLAKY RPC shards non-required#101

Draft
oxarbitrage wants to merge 3 commits into
mainfrom
ci/serial-flaky-non-required
Draft

ci: make SERIAL/FLAKY RPC shards non-required#101
oxarbitrage wants to merge 3 commits into
mainfrom
ci/serial-flaky-non-required

Conversation

@oxarbitrage

@oxarbitrage oxarbitrage commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Problem

main CI shows the required-checks rollup as red. The RPC-tests job builds its matrix as name × shard and then includes a platform object that carries required: true / required_suffix: " (required)". Because the required flag rides on the platform axis (ubuntu-22.04), it gets stamped onto every shard — including the SERIAL/FLAKY per-test shards that are meant to be non-blocking (subclass.py already says they're split out "to enable not requiring that they pass"). They end up required: true, continue-on-error: false, and their failures gate the rollup.

Fix

Make the required flag a per-shard property (single source of truth) instead of a per-platform one, in the set-rpc-tests step:

  • shard-0…9 (the BASE/NEW/ZMQ tests): required: true.
  • SERIAL/FLAKY per-test shards: required: false, required_suffix: ''.
  • Strip required/required_suffix from the platform object in the local merge so it no longer blanket-stamps every shard.

No test logic changes. The interop-request path is unchanged (SERIAL/FLAKY aren't added there, so all shards stay required).

Status / coordination (updated 2026-06-05)

This PR is the CI-matrix mechanism and is correct as-is, but two things have changed since it was opened — read before merging:

  1. It does not go green on its own yet. The required shards (shard-0…9) currently fail too, because cache rebuild (create_cache.py) crashes on an upstream zallet bug: Failed to synchronize zallet: InvalidData { "Missing Orchard tree state" } (wallet sync task crashes with "Missing Orchard tree state" when a shielded pool is active but its tree is still empty wallet#454, fixed by sync: Treat an absent shielded tree state as an empty note commitment tree wallet#455, not yet merged). Until #455 lands, the rollup stays red regardless of this PR.
  2. Pairs with test framework: bound process-shutdown waits to avoid multi-hour hangs #102. Without test framework: bound process-shutdown waits to avoid multi-hour hangs #102's bounded waits, that upstream crash makes a shard hang for the full 6-hour runner limit instead of failing fast. test framework: bound process-shutdown waits to avoid multi-hour hangs #102 should land first so this PR's failures are legible.

Relationship to #104: the six tests failing today (mergetoaddress_{sapling,ua_nu5,ua_sapling}, wallet_shieldingcoinbase, mempool_nu_activation, mempool_packages) are deterministically broken, not flaky — they pass zcashd-style arg lists that the Z3 stack rejects at setup. The plan is to move those into #104's DISABLED_SCRIPTS (don't run them; document per-test migration need), and keep this PR's per-shard required mechanism as the correct foundation and the home for genuinely intermittent quarantines. So #101 and #104 are complementary, layered — not competing.

Verification

Locally simulated the modified matrix generator (extracted verbatim from this file's heredoc): 10 required=true shards (shard-0 … shard-9) and the SERIAL/FLAKY shards as required=false. Interop path (IS_INTEROP_REQUEST=true) still yields 10 required shards and no SERIAL/FLAKY entries.

The RPC-tests matrix attaches `required: true` / `required_suffix` to every
ubuntu-22.04 combination via the platform `include` object, because the
required flag rides on the platform axis rather than the shard axis. As a
result the SERIAL_SCRIPTS and FLAKY_SCRIPTS per-test shards inherit
`required: true` and `continue-on-error: false`, so their failures gate CI
and turn the required-checks rollup red.

This contradicts the existing intent documented in subclass.py: these tests
are split into their own shards "to enable not requiring that they pass."

Carry the required flag per-shard instead (true for shard-0..9, false for the
SERIAL/FLAKY per-test shards) and strip required/required_suffix from the
platform object in the local merge so there is a single source of truth. The
interop-request path is unchanged (SERIAL/FLAKY are not added there; all
shards stay required).

This does not port or fix the six legacy tests themselves; it only stops them
from blocking required CI. Per-test port/quarantine is tracked separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread .github/workflows/ci.yml Fixed
Address zizmor template-injection finding (code scanning) on the RPC matrix
merge: read the step output through an env var ($RPC_TEST_MATRIX) instead of
splicing ${{ ... }} directly into the shell script.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@oxarbitrage oxarbitrage closed this Jun 3, 2026
@oxarbitrage oxarbitrage reopened this Jun 3, 2026
The previous commit added a comment inside the set-rpc-tests run block that
contained a literal ${{ }} sequence. GitHub Actions evaluates ${{ }}
expressions anywhere in the file, including run blocks; an empty expression is
a parse error, so the whole CI workflow failed to compile (startup_failure,
no jobs). Reword the comment to avoid the expression delimiters.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants