Flaky test report: committed-code failures on 2026-05-06
Tests that failed against committed code (Timer or Post Merge Action builds against main) in the 24 hours ending 2026-05-06T10:00 UTC. All failures were non-reproducible locally with the original seed, consistent with timing-dependent flakiness.
Pattern: Chronic flake present since early 2024. Significant worsening in 2026: 11 builds in Feb, 19 in Mar, 21 in Apr, 14 in May (first 6 days). The April 2026 CI runner migration to m7a.8xlarge likely amplified this timing-sensitive test.
Pattern: Chronic. Quiet from Oct 2025 to Jan 2026, then resurgence: 8 in Feb, 11 in Mar, 13 in Apr, 3 in May. Consistent with environmental sensitivity.
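The monthly counts in the two "Pattern" entries above are easier to compare once they are normalized per day, because the May figures cover only part of a month. The following is a minimal sketch of that arithmetic, not part of the original analysis. It assumes the May count in the second entry also covers roughly the first 6 days (the entry itself does not say so), and the function and variable names are illustrative.

```python
from calendar import monthrange


def builds_per_day(counts, year=2026):
    """Normalize affected-build counts by the number of days observed.

    counts maps month number -> (builds, days_observed); days_observed=None
    means the whole calendar month was observed.
    """
    rates = {}
    for month, (builds, days_observed) in counts.items():
        days = days_observed if days_observed is not None else monthrange(year, month)[1]
        rates[month] = round(builds / days, 2)
    return rates


# Counts copied from the two "Pattern" entries above.
# Assumption: both May counts cover about the first 6 days of the month.
worsening = {2: (11, None), 3: (19, None), 4: (21, None), 5: (14, 6)}
resurgent = {2: (8, None), 3: (11, None), 4: (13, None), 5: (3, 6)}

print(builds_per_day(worsening))  # {2: 0.39, 3: 0.61, 4: 0.7, 5: 2.33}
print(builds_per_day(resurgent))  # {2: 0.29, 3: 0.35, 4: 0.43, 5: 0.5}
```

Under these assumptions the first entry more than triples its daily rate from April to May (0.70 to 2.33 builds per day), while the second rises only slightly (0.43 to 0.50).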
Seed: 1F25B4F7632D6D6C (failure was suite timeout, not assertion)
Reproduced locally: No
First seen: 2025-07-30
Total builds affected: 27
Pattern: Worsening since Apr 2026 (7 builds in April, then 6 builds in the first 6 days of May). The failure mode is a suite timeout (more than 1,200,000 ms, i.e. 20 minutes), not an assertion failure, which suggests the test hangs intermittently rather than producing a wrong result.
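To illustrate the difference between a hang and a wrong result, here is a small sketch of one way a concurrent test can stall without ever failing an assertion. It is written in Python purely for illustration. It is not taken from the affected test, and it does not claim that lock ordering is the actual cause. Two threads take two locks in opposite order, a barrier forces the problematic interleaving, and a lock timeout turns the would-be hang into a reported outcome. All names are hypothetical.

```python
import threading


def opposite_order_locking(timeout_s=0.2):
    """Run two threads that acquire two locks in opposite order.

    Without the timeout on the second acquire, both threads would block
    forever, which a test runner would only ever report as a suite timeout.
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)  # both threads hold their first lock
    both_attempted = threading.Barrier(2)   # both threads finished their attempt
    outcomes = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()
            acquired = second.acquire(timeout=timeout_s)
            outcomes[name] = "acquired" if acquired else "timed out"
            # Keep holding `first` until the other thread has also given up,
            # so the outcome does not depend on which timeout fires first.
            both_attempted.wait()
            if acquired:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("t1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("t2", lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


print(opposite_order_locking())
```

Both threads report "timed out": neither computes a wrong value, they simply never make progress. In a real suite, a thread dump captured at the timeout is what would show whether threads are blocked in this way.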
Summary Table (sorted by total builds affected)
1. IndexActionIT.testAutoGenerateIdNoDuplicates
2. SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase
3. RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest
4. FullRollingRestartIT.testFullRollingRestart_withNoRecoveryPayloadAndSource
5. RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes
6. RareClusterStateIT.testDisassociateNodesWhileShardInit
7. SmokeTestMultiNodeClientYamlTestSuiteIT (20_terms/numeric profiler)
8. CloneSnapshotIT.testCloneShallowSnapshotIndex
9. CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled
10. EhcacheDiskCacheManagerTests.testCreateAndCloseCacheConcurrently
Detailed Findings
1. IndexActionIT.testAutoGenerateIdNoDuplicates (SEGMENT replication)
Seeds: 15B7E75E9FF97558, 45E706BEDD6C2A23
2. SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase
Seed: 45E706BEDD6C2A23
3. RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest (SEGMENT)
Seed: A8FAB62DF9249109
4. FullRollingRestartIT.testFullRollingRestart_withNoRecoveryPayloadAndSource (SEGMENT)
Seed: 45E706BEDD6C2A23
5. RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadWithReducedAllowedNodes (SEGMENT)
Seed: E5DFB55DF375D11E
6. RareClusterStateIT.testDisassociateNodesWhileShardInit
Seeds: 15B7E75E9FF97558, BCE18225FF96300E
7. SmokeTestMultiNodeClientYamlTestSuiteIT (search.aggregation/20_terms/numeric profiler)
Seed: 45E706BEDD6C2A23
8. CloneSnapshotIT.testCloneShallowSnapshotIndex
Seed: F850481D7A3BDBD2
9. CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled
Seed: F850481D7A3BDBD2
10. EhcacheDiskCacheManagerTests.testCreateAndCloseCacheConcurrently
Seed: 1F25B4F7632D6D6C (failure was suite timeout, not assertion)
Observations
None of the 10 tests reproduced locally with the original seed. This is expected for timing-dependent flakes — the seed controls randomization but not thread scheduling, GC pauses, or network timing.
The April 2026 CI runner migration (m5.8xlarge → m7a.8xlarge) correlates with worsening rates for tests #1-6 and #10. Faster CPUs compress timing windows, making races more likely to manifest.
The highest-impact tests (#1-5) are all integration tests involving the SEGMENT replication strategy. This parameterized variant appears disproportionately affected, suggesting that segment replication introduces additional timing sensitivity.
EhcacheDiskCacheManagerTests (#10) has a distinct failure mode: suite timeout rather than assertion failure. This likely indicates a deadlock or resource exhaustion rather than a race condition.
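The first observation above, that a seed fixes randomization but not thread scheduling, can be shown with a short sketch. It uses Python's own random number generator for illustration only. It is not the test framework's generator, so the values it produces have no relation to what the real tests generated, and the seed string is simply reused from the report as sample input. The function name is hypothetical.

```python
import random
import threading


def run_once(seed_hex, workers=8):
    """Generate seeded inputs, process them on threads, record finish order."""
    rng = random.Random(int(seed_hex, 16))
    inputs = [rng.randrange(1000) for _ in range(workers)]  # fully determined by the seed

    finished = []  # order depends on the scheduler, not on the seed
    lock = threading.Lock()

    def work(index, value):
        sum(range(value * 100))  # small CPU-bound task sized by the seeded input
        with lock:
            finished.append(index)

    threads = [threading.Thread(target=work, args=(i, v)) for i, v in enumerate(inputs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inputs, finished


inputs_1, order_1 = run_once("1F25B4F7632D6D6C")
inputs_2, order_2 = run_once("1F25B4F7632D6D6C")
print(inputs_1 == inputs_2)  # True: same seed, same generated inputs
print(order_1, order_2)      # finish orders are not guaranteed to match
```

Rerunning with the original seed therefore reproduces the generated data but gives no guarantee about interleaving, which is why a failure that depends on timing may not recur locally.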
Data Source
Historical failure data from https://metrics.opensearch.org/_dashboards (index pattern: gradle-check-*). Includes all build types (Timer, Post Merge Action, Pull Request) for flake rate assessment.