perf(ci): shard backend tests + fix per-test full-population reruns (44min → 11.5min) by Taleef7 · Pull Request #57 · Taleef7/workwell

Taleef7 · 2026-06-03T16:53:49Z

Problem

CI took ~44 min on every push/PR. The entire time was the backend ./gradlew test step — frontend is ~50s and E2E is manual. Per-class timing showed the cost was concentrated in a few integration tests that re-ran a full-population CQL evaluation (~70s) in @BeforeEach, once per test method.

Changes

1. Fix the per-test waste (the real bug)

EvidenceAccessIntegrationTest: ran a full-population evaluation 14× (1022s) for tests that only need a case to exist and that filter audit by their own upload id. Now one shared run via @BeforeAll + @TestInstance(PER_CLASS) → 71s.
CaseFlowRerunIntegrationTest: ran a full-population evaluation 5× (422s); each test targets a distinct outcome-type case with non-overlapping mutations, so one shared run suffices → 146s.
ScopedRun / CaseUpsert / Major1 intentionally left unchanged — their reruns are the behavior under test (idempotency, scoped-run parity, audit invariants) and need per-test isolation.
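The cost difference between the two setup lifecycles can be modelled in a few lines of plain Java. This is a sketch of the idea only, not JUnit itself and not the project's test code: `PopulationRun` and both method names are hypothetical, and the 73s per-run figure is simply 1022s divided by the 14 tests reported above.

```java
public class FixtureLifecycleSketch {

    /** Hypothetical stand-in for the full-population evaluation; it only counts invocations. */
    static final class PopulationRun {
        int runs = 0;
        void evaluate() { runs++; }
    }

    /** Models @BeforeEach: the fixture is rebuilt before every test method. */
    static int runsWithPerMethodSetup(int testCount) {
        PopulationRun population = new PopulationRun();
        for (int test = 0; test < testCount; test++) {
            population.evaluate();   // setup repeated for each test
            // ... test body would run here ...
        }
        return population.runs;
    }

    /** Models @BeforeAll with @TestInstance(PER_CLASS): one fixture shared by all tests. */
    static int runsWithSharedSetup(int testCount) {
        PopulationRun population = new PopulationRun();
        population.evaluate();       // setup once for the whole class
        for (int test = 0; test < testCount; test++) {
            // ... test body reads the shared fixture ...
        }
        return population.runs;
    }

    public static void main(String[] args) {
        int tests = 14;                      // EvidenceAccessIntegrationTest, per the description
        int secondsPerRun = 1022 / tests;    // 73, derived from the reported 1022s total
        System.out.println("per-method setup: " + runsWithPerMethodSetup(tests) * secondsPerRun + "s");
        System.out.println("shared setup:     " + runsWithSharedSetup(tests) * secondsPerRun + "s");
    }
}
```

The model only counts setup cost, so its shared-setup estimate is not the measured figure; the class-level timing after the change is the 71s reported above. Sharing a fixture is only safe when tests do not mutate each other's data, which is why the classes whose reruns are the behavior under test keep per-method setup.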

2. Shard across parallel runners

Backend job is now an 8-way matrix; build.gradle.kts assigns each test class to a shard by a stable hash (Test.include(Spec)), forks 4-wide within a shard (1.5g heap cap), and only shard 0 writes the Gradle cache.
Added a per-class timing diagnostic step for future balancing.
Local runs (no shard env) are unchanged.
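The shard-assignment rule can be sketched as a pure function of the class file's relative path. The sketch below is an illustration under assumptions, not the code in build.gradle.kts: it uses `String.hashCode()` as the stable hash (the PR does not say which hash is used), and the class name `ShardAssignmentSketch`, its methods, and the example paths are hypothetical.

```java
import java.util.List;

public class ShardAssignmentSketch {

    /** Maps a '/'-separated relative class-file path to a shard index in [0, shardTotal). */
    static int shardOf(String relativePath, int shardTotal) {
        // String.hashCode() is defined by the Java API specification, so the value is the
        // same on every JVM and runner. floorMod keeps the result non-negative.
        return Math.floorMod(relativePath.hashCode(), shardTotal);
    }

    /** Include predicate for one runner: directories always pass so the tree is traversed. */
    static boolean includes(String relativePath, boolean isDirectory, int shardIndex, int shardTotal) {
        if (isDirectory) {
            return true;
        }
        return shardOf(relativePath, shardTotal) == shardIndex;
    }

    public static void main(String[] args) {
        // Hypothetical paths; only the class names come from the PR description.
        List<String> classFiles = List.of(
                "example/EvidenceAccessIntegrationTest.class",
                "example/CaseFlowRerunIntegrationTest.class",
                "example/ScopedRunIntegrationTest.class");
        int shardTotal = 8;
        int[] perShard = new int[shardTotal];
        for (String path : classFiles) {
            perShard[shardOf(path, shardTotal)]++;
        }
        int sum = 0;
        for (int count : perShard) {
            sum += count;
        }
        // Every class lands in exactly one shard, so the counts sum to the suite size.
        System.out.println("classes assigned: " + sum + " of " + classFiles.size());
    }
}
```

Because each path maps to exactly one index, the union of all shards is the full suite and no class runs twice. This is the same property the PR verifies by summing the per-shard test counts. A hash spreads classes by count, not by duration, so heavy classes can still cluster in one shard.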

Result

44 min → 11m30s (≈3.8× faster), all 239 tests pass (verified the shard counts sum to 239 — no tests dropped).
Remaining ceiling is ScopedRunIntegrationTest (~635s); a single class runs in one fork, so going under ~10 min would require splitting that class — deferred intentionally.

Note on merging

Merging to main will (a) run the new sharded CI and (b) trigger the standard MIE deploy (now working). The deploy is idempotent, so the redeploy is harmless.

🤖 Generated with Claude Code

The backend `./gradlew test` step is the entire CI bottleneck (~44 min), dominated by CQL-heavy integration tests (cqf-fhir-cr evaluations across the synthetic population plus historical-run seeding), previously run ~2-way parallel on a single runner. Frontend is ~50s and E2E is manual.

Split the suite across a 6-way matrix, each runner executing a deterministic hash-based subset of test classes (union of all shards = full suite). Within a shard tests still fork 2-way. Only shard 0 writes the shared Gradle cache to avoid concurrent-write contention.

build.gradle.kts gains overridable GRADLE_TEST_FORKS and the TEST_SHARD_TOTAL/TEST_SHARD_INDEX selection; with no shard env (local runs) the full suite runs unchanged. Also adds perf/** to CI push triggers so this branch self-verifies.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
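One way to picture the environment-driven selection is a small resolver that falls back to an unsharded run when the variables are absent. This is a sketch, not the build script: only the three variable names come from the commit message, while `ShardEnvSketch`, `ShardConfig`, the validation rules, and the default of 2 forks (taken from "still fork 2-way") are assumptions.

```java
import java.util.Map;

public class ShardEnvSketch {

    /** Resolved settings for one test run (hypothetical holder type). */
    static final class ShardConfig {
        final int index;
        final int total;
        final int forks;

        ShardConfig(int index, int total, int forks) {
            this.index = index;
            this.total = total;
            this.forks = forks;
        }

        /** A single shard means "run everything", which is the local default. */
        boolean sharded() {
            return total > 1;
        }
    }

    /** Reads an integer variable, using the fallback when it is unset or blank. */
    static int intOr(Map<String, String> env, String key, int fallback) {
        String raw = env.get(key);
        return (raw == null || raw.isBlank()) ? fallback : Integer.parseInt(raw.trim());
    }

    static ShardConfig fromEnv(Map<String, String> env) {
        int total = intOr(env, "TEST_SHARD_TOTAL", 1);   // no shard env: one shard, full suite
        int index = intOr(env, "TEST_SHARD_INDEX", 0);
        int forks = intOr(env, "GRADLE_TEST_FORKS", 2);  // assumed default
        if (total < 1 || index < 0 || index >= total || forks < 1) {
            throw new IllegalArgumentException(
                    "invalid shard settings: index=" + index + " total=" + total + " forks=" + forks);
        }
        return new ShardConfig(index, total, forks);
    }

    public static void main(String[] args) {
        ShardConfig config = fromEnv(System.getenv());
        System.out.println("sharded=" + config.sharded()
                + " index=" + config.index + " total=" + config.total + " forks=" + config.forks);
    }
}
```

Rejecting an index outside the range `0..total-1` up front is a design choice in this sketch: a misconfigured matrix entry would otherwise select no tests and pass silently.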

…ble)

setCandidateClassFiles is not a valid setter in Gradle 9.4.1. Use the Test task's PatternFilterable include(Spec<FileTreeElement>) predicate, the documented mechanism for filtering candidate test classes, to assign each class to a shard by its '/'-separated relative path hash. Directories pass so the tree is traversed; the classpath is unaffected so @Nested discovery and class loading still work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The 6-shard run cut wall-clock 44->25 min but stayed bottlenecked by one lopsided shard (hash distribution clustered heavy CQL integration classes).

Increase to 8 shards and fork 4-wide within each (ubuntu-latest = 4 vCPU) so clustered heavy classes overlap; cap per-fork heap at 1.5g so 4 JVMs + their Postgres containers fit the runner.

Add an always-on step that prints per-class suite durations, so if balance is still uneven we can move to time-weighted bin-packing with real data.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
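The time-weighted bin-packing mentioned above could take the form of a greedy longest-first assignment. The sketch below is one possible approach, not something this PR implements: `ShardBinPackingSketch` and its methods are hypothetical, only the 635s, 146s and 71s durations come from the PR, and the remaining durations are made up for illustration. It also treats each shard as serial and ignores the 4-wide forking within a shard.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardBinPackingSketch {

    /** Greedy longest-first packing: returns the total seconds assigned to each shard. */
    static long[] pack(Map<String, Long> secondsByClass, int shardTotal) {
        List<Map.Entry<String, Long>> classes = new ArrayList<>(secondsByClass.entrySet());
        // Place the slowest classes first so they end up on different shards.
        classes.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        long[] load = new long[shardTotal];
        for (Map.Entry<String, Long> testClass : classes) {
            int lightest = 0;
            for (int i = 1; i < shardTotal; i++) {
                if (load[i] < load[lightest]) {
                    lightest = i;
                }
            }
            load[lightest] += testClass.getValue();
        }
        return load;
    }

    /** Wall-clock estimate: the most heavily loaded shard finishes last. */
    static long makespan(long[] load) {
        long max = 0;
        for (long seconds : load) {
            max = Math.max(max, seconds);
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, Long> durations = new LinkedHashMap<>();
        durations.put("ScopedRunIntegrationTest", 635L);      // from the PR
        durations.put("CaseFlowRerunIntegrationTest", 146L);  // from the PR
        durations.put("EvidenceAccessIntegrationTest", 71L);  // from the PR
        durations.put("HypotheticalTestA", 120L);             // illustrative values below
        durations.put("HypotheticalTestB", 90L);
        durations.put("HypotheticalTestC", 60L);
        long[] load = pack(durations, 8);
        System.out.println("estimated slowest shard: " + makespan(load) + "s");
    }
}
```

No packing can bring the slowest shard below the duration of the slowest single class, so with these inputs the estimate stays at 635s. That matches the ceiling noted in the PR description: going lower would require splitting ScopedRunIntegrationTest.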

…tion tests

The slowest classes weren't a sharding problem: they re-ran a full-population CQL evaluation (~70s) in @BeforeEach, once per test method:

- EvidenceAccessIntegrationTest: 14 tests x full run = 1022s. The evidence access/role tests only need a case to exist and filter audit by their own upload id, so they share one population run via @BeforeAll + @TestInstance(PER_CLASS). ~1022s -> ~90s.
- CaseFlowRerunIntegrationTest: 5 tests x full run = 422s. Each test targets a distinct outcome-type case with non-overlapping mutations, so one shared run is sufficient. ~422s -> ~140s.

ScopedRun/CaseUpsert/Major1 are intentionally left as-is: their reruns are the behavior under test (idempotency, rerun-to-verify, empty-table historical seed) and need per-test isolation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel · 2026-06-03T16:53:51Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
workwell-measure-studio	Ready	Preview, Comment	Jun 3, 2026 4:54pm

Taleef and others added 5 commits June 3, 2026 19:44

Merge remote-tracking branch 'origin/main' into perf/ci-backend-test-sharding

c630fb9

vercel Bot deployed to Preview June 3, 2026 16:54 View deployment

Taleef7 self-assigned this Jun 3, 2026

Taleef7 merged commit e64d18b into main Jun 3, 2026
30 checks passed

Taleef7 deleted the perf/ci-backend-test-sharding branch June 3, 2026 17:09

Taleef7 mentioned this pull request Jun 3, 2026

docs(journal): log 2026-06-03 deploy fix + CI 3.8x speedup #58

Merged
