ci: optimise runtime#2860
Closed
CommanderStorm wants to merge 18 commits into
Closed
Conversation
- unit-test-svc-pg: 3-entry matrix collapsed into one job iterating over postgis variants in a bash loop. Frees 2 runner slots at t=0 (we sit on the 10-concurrent-job cap). - unit-test-svc-minio: chained behind unit-test-svc-pg to share a slot. - Minor-checks lane (single slot): lint-unit-test-js -> validate-schemas -> test-publish -> check-doc. - lint-rust split: clippy stays as lint-rust (head of the rust chain that gates unit-test-rust and lint-rust-dependencies); cargo-hack --each-feature check moved to a new `check-features` job chained behind it. `just check-doc` extracted into its own (docker-only) job at the tail of the minor-checks lane. t=0 fan-out: 10 jobs, fits exactly under the cap.
Moves the bash loop from unit-test-svc-pg into reusable justfile recipes: - _test-pg-cargo: cargo invocations against the existing DATABASE_URL - test-pg-compile: --no-run compile of the test binaries - test-pg-against image args sslmode: start an ephemeral postgis, init schema, run _test-pg-cargo, tear down - test-pg-matrix: explicit list of test-pg-against calls, one per postgis variant. Depends on test-pg-compile so the binaries are built once. `test-pg` now delegates to start + _test-pg-cargo (no behavioral change). The matrix runner uses _test-pg-cargo directly to avoid the docker-compose `start` dep that previously conflicted with the manual docker run container on port 5432. CI step shrinks to a single `just test-pg-matrix` call.
Performance Comparison
|
justfile's `export DATABASE_URL := 'postgres://...@localhost:' + PGPORT + '/db' + ...` overrides any caller-supplied DATABASE_URL when a new `just` process starts. test-pg-against was exporting the matrix-built DATABASE_URL in bash and then invoking `just _test-pg-cargo`, which re-entered just and reset DATABASE_URL to the justfile default (:5411/db), so cargo test connected to the wrong port and missing database. Inline the three cargo test invocations directly in test-pg-against so the bash-exported DATABASE_URL is what cargo sees. Drop _test-pg-cargo and revert `test-pg` to inline cargo commands (its original shape). Also rename PG* shell locals to *_LOCAL to keep them clearly distinct from the inherited PG* env vars consumed by tests/fixtures/initdb.sh.
…shot Address two critical-section problems in the PR pipeline: - `unit-test-rust` ran `test-packages-ci` then `test-doc` serially in one job (~10 min each = ~20 min). Extract `test-doc-rust` into its own job gated on the same `lint-rust` predecessor. The two now run in parallel, cutting that lane to ~10 min wall clock. - `lint-rust-dependencies` recompiles the workspace on nightly for `cargo-shear` (~30 min). It was the longest single critical-section after `lint-rust`. Gate it to `event_name != 'pull_request'`: still runs on main and release, no longer blocks PR. `package` already doesn't gate on it; `done` keeps it in `needs:` but the merge gate no longer treats `skipped` as fatal so PR runs pass cleanly. - Bless `pg_table_source_test::table_source-4` snapshot: bounds[3] shifted by 1 ULP (84.24401966908702 -> 84.24401966908701), likely a postgis version drift in ST_Extent precision on the matrix's first variant (postgis:11-3.0-alpine).
Skipping lint-rust-dependencies on PR meant cargo-shear failures only showed up on the post-merge main push - a red main goes unnoticed in practice, so the gate is effectively missing. Run it on every PR again, but add Swatinem/rust-cache keyed `shear-nightly` so the nightly workspace compile is amortised across runs. On a warm cache the 30 min should drop to a few minutes; first run after this change populates the cache. Restore done's `skipped` failure clause now that nothing is intentionally skipped on PR.
Last-ULP value differs between runs because of postgres parallel ST_Extent aggregation order. Previous bless made one variant pass and another fail. Use insta's path redaction to mask just bounds[3]; the remaining three bounds, srid, geometry, and properties are still asserted exactly.
zizmor flags every Swatinem/rust-cache invocation as potentially cache- poisonable. GitHub Actions cache is scoped per-ref, so a PR cannot write to main's cache scope, and PR scopes are isolated from each other. The warning is noise for our trust boundary. Annotate all 5 uses (unit-test-rust, test-doc-rust, lint-rust, check-features, lint-rust-dependencies) with the standard ignore.
CommanderStorm
commented
Jun 6, 2026
The bounds[3] snapshot mismatch was non-determinism, not a postgis version drift: same image, same code, same test produced 84.244...8701 on some runs and ...8702 on others. The redaction was a band-aid; main is consistent (always ...8702) because its CI setup happens to land in a single-worker regime. Set max_parallel_workers_per_gather=0 on the docker-run command so ST_Extent aggregation runs single-threaded and the floating-point sum order is deterministic. Snapshot file reverted to match main.
CommanderStorm
commented
Jun 6, 2026
| uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0 | ||
| - name: Run cargo test | ||
| run: just test-packages-ci | ||
| test-doc-rust: |
Member
Author
There was a problem hiding this comment.
split from just check, because largest critical section now
Member
Author
|
Churn is necessary due to the mln introductions impact on compile times |
cargo publish --dry-run was spending ~15 min on the per-crate "extract the tarball and rebuild it from scratch" verification. Workspace-level compilation is already covered by lint-rust and check-features (both compile against default features). The remaining value of test-publish - validating the published Cargo.toml shapes, manifest fields, and tarball file list - is preserved. Also drop the rendering-deps apt-install since --no-verify never links maplibre-native. Trade-off: --no-verify would miss bugs that only manifest when a single crate is built in isolation (e.g. a file omitted from the package include glob). These are rare; cargo's manifest check still catches missing path-only deps, missing version fields, etc.
- package: only needs the 7 build-binary jobs whose artifacts it packages. Tests, lints, and test-publish no longer gate packaging. - done: explicit merge gate listing every check, test, and build whose failure should block merge. unit-test-svc-minio was previously orphaned (nothing depended on it); now gated here. - musl builds: chain behind lint-rust. They're relatively fast and don't need a t=0 slot; freeing those 2 slots leaves headroom for cross-workflow contention without queuing. t=0 fan-out: 10 -> 8. Critical path side effect: package can now start as soon as the slowest build finishes (instead of waiting for unit-test-rust ubuntu), so when the rust lane dominates, package runs in parallel with it.
lint-rust is ~13 min on cold cache; gating musl on it would delay the musl jobs by that much on cache-miss runs. The dependency is only there to free a t=0 slot, not to actually gate semantics. lint-unit-test-js runs in ~32s either way (npm cache stable), giving the same slot-pressure relief with a far smaller worst-case delay.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Our CI is agonislingly slow.
Lets try if this improves CI perf..
This applies a few ideas: