perf(ci): shard backend tests + fix per-test full-population reruns (44min → 11.5min)#57
Merged
The backend `./gradlew test` step is the entire CI bottleneck (~44 min), dominated by CQL-heavy integration tests (cqf-fhir-cr evaluations across the synthetic population plus historical-run seeding), previously run ~2-way parallel on a single runner. Frontend is ~50s and E2E is manual. Split the suite across a 6-way matrix, each runner executing a deterministic hash-based subset of test classes (union of all shards = full suite). Within a shard tests still fork 2-way. Only shard 0 writes the shared Gradle cache to avoid concurrent-write contention. build.gradle.kts gains overridable GRADLE_TEST_FORKS and the TEST_SHARD_TOTAL/TEST_SHARD_INDEX selection; with no shard env (local runs) the full suite runs unchanged. Also adds perf/** to CI push triggers so this branch self-verifies. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ble) setCandidateClassFiles is not a valid setter in Gradle 9.4.1. Use the Test task's PatternFilterable include(Spec<FileTreeElement>) predicate, the documented mechanism for filtering candidate test classes, to assign each class to a shard by a hash of its '/'-separated relative path. Directories pass so the tree is traversed; the classpath is unaffected, so @Nested discovery and class loading still work. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
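The shard-selection predicate described in this commit can be sketched in plain Java. This is only an illustration of the idea, not the code in build.gradle.kts: the class name `ShardAssigner`, the use of `String.hashCode()` as the hash, and the `com/example/...` paths are assumptions made for the example. In the real build the same decision would sit inside the `include(Spec<FileTreeElement>)` callback, with the shard count and index read from TEST_SHARD_TOTAL and TEST_SHARD_INDEX.

```java
import java.util.List;

final class ShardAssigner {
    private ShardAssigner() {}

    /** Maps a '/'-separated relative class-file path to a shard in [0, shardTotal). */
    static int shardOf(String relativePath, int shardTotal) {
        if (shardTotal < 1) {
            throw new IllegalArgumentException("shardTotal must be >= 1");
        }
        // String.hashCode() is specified by the JDK, so every runner computes the
        // same value; floorMod keeps the result non-negative for negative hashes.
        return Math.floorMod(relativePath.hashCode(), shardTotal);
    }

    /** Include predicate: directories always pass so the file tree is still traversed. */
    static boolean include(String relativePath, boolean isDirectory,
                           int shardTotal, int shardIndex) {
        return isDirectory || shardOf(relativePath, shardTotal) == shardIndex;
    }

    public static void main(String[] args) {
        // Hypothetical package paths; only the class names come from this PR.
        List<String> classFiles = List.of(
                "com/example/EvidenceAccessIntegrationTest.class",
                "com/example/CaseFlowRerunIntegrationTest.class",
                "com/example/ScopedRunIntegrationTest.class");
        int shardTotal = 8;
        for (String path : classFiles) {
            System.out.println(path + " -> shard " + shardOf(path, shardTotal));
        }
    }
}
```

Because every path maps to exactly one index, the union of all shards is the full suite and no class runs twice. The sketch also shows why a pure hash can cluster slow classes on one shard: the assignment ignores how long each class takes.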
The 6-shard run cut wall-clock 44->25 min but stayed bottlenecked by one lopsided shard (hash distribution clustered heavy CQL integration classes). Increase to 8 shards and fork 4-wide within each (ubuntu-latest = 4 vCPU) so clustered heavy classes overlap; cap per-fork heap at 1.5g so 4 JVMs + their Postgres containers fit the runner. Add an always-on step that prints per-class suite durations, so if balance is still uneven we can move to time-weighted bin-packing with real data. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
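The time-weighted bin-packing mentioned as a possible follow-up is not implemented in this PR. A minimal sketch of one common approach, greedy longest-first assignment, might look like the following. The class name `ShardPacker` is hypothetical, the three `Other*Test` entries and their durations are invented for illustration, and each shard is modelled as running its classes serially, which ignores the 4-wide forking.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ShardPacker {
    private ShardPacker() {}

    /** Greedy longest-first packing: place each class on the currently lightest shard. */
    static List<List<String>> pack(Map<String, Integer> seconds, int shardTotal) {
        List<Map.Entry<String, Integer>> byCost = new ArrayList<>(seconds.entrySet());
        byCost.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < shardTotal; i++) {
            shards.add(new ArrayList<>());
        }
        long[] load = new long[shardTotal];
        for (Map.Entry<String, Integer> entry : byCost) {
            int lightest = 0;
            for (int i = 1; i < shardTotal; i++) {
                if (load[i] < load[lightest]) lightest = i;
            }
            shards.get(lightest).add(entry.getKey());
            load[lightest] += entry.getValue();
        }
        return shards;
    }

    /** Wall-clock of the slowest shard under the serial-per-shard assumption. */
    static long makespan(List<List<String>> shards, Map<String, Integer> seconds) {
        long worst = 0;
        for (List<String> shard : shards) {
            long sum = 0;
            for (String cls : shard) sum += seconds.get(cls);
            worst = Math.max(worst, sum);
        }
        return worst;
    }

    public static void main(String[] args) {
        Map<String, Integer> seconds = new LinkedHashMap<>();
        seconds.put("EvidenceAccessIntegrationTest", 1022); // from this PR
        seconds.put("ScopedRunIntegrationTest", 635);       // from this PR
        seconds.put("CaseFlowRerunIntegrationTest", 422);   // from this PR
        seconds.put("OtherATest", 300);                     // hypothetical
        seconds.put("OtherBTest", 200);                     // hypothetical
        seconds.put("OtherCTest", 100);                     // hypothetical
        List<List<String>> shards = pack(seconds, 3);
        System.out.println(shards + " makespan=" + makespan(shards, seconds) + "s");
    }
}
```

With these inputs the slowest shard contains only the 1022s class, which illustrates a limit that applies to any packing: no assignment can bring wall-clock below the duration of the single slowest class.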
…tion tests
The slowest classes weren't a sharding problem: they re-ran a full-population CQL evaluation (~70s) in @BeforeEach, once per test method:
- EvidenceAccessIntegrationTest: 14 tests x full run = 1022s. The evidence access/role tests only need a case to exist and filter audit by their own upload id, so they share one population run via @BeforeAll + @TestInstance(PER_CLASS). ~1022s -> ~90s.
- CaseFlowRerunIntegrationTest: 5 tests x full run = 422s. Each test targets a distinct outcome-type case with non-overlapping mutations, so one shared run is sufficient. ~422s -> ~140s.
ScopedRun/CaseUpsert/Major1 are intentionally left as-is: their reruns are the behavior under test (idempotency, rerun-to-verify, empty-table historical seed) and need per-test isolation.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Problem
CI took ~44 min on every push/PR. The entire time was the backend `./gradlew test` step; frontend is ~50s and E2E is manual. Per-class timing showed the cost was concentrated in a few integration tests that re-ran a full-population CQL evaluation (~70s) in `@BeforeEach`, once per test method.
Changes
1. Fix the per-test waste (the real bug)
- `EvidenceAccessIntegrationTest`: ran the full-population evaluation 14× (1022s) for tests that only need a case to exist and filter audit by their own upload id. Now one shared run via `@BeforeAll` + `@TestInstance(PER_CLASS)` → 71s.
- `CaseFlowRerunIntegrationTest`: ran the full-population evaluation 5× (422s); each test targets a distinct outcome-type case with non-overlapping mutations → one shared run → 146s.
- `ScopedRun`/`CaseUpsert`/`Major1`: intentionally left unchanged. Their reruns are the behavior under test (idempotency, scoped-run parity, audit invariants) and need per-test isolation.
2. Shard across parallel runners
`build.gradle.kts` assigns each test class to a shard by a stable hash (`Test.include(Spec)`), forks 4-wide within a shard (1.5g heap cap), and only shard 0 writes the Gradle cache.
Result
The remaining bottleneck is `ScopedRunIntegrationTest` (~635s); a single class runs in one fork, so going under ~10 min would require splitting that class. That split is deferred intentionally.
Note on merging
Merging to `main` will (a) run the new sharded CI and (b) trigger the standard MIE deploy (now working). The deploy is idempotent, so the redeploy is harmless.
🤖 Generated with Claude Code