Fix flaky Enzyme test_forward/test_reverse tolerance (RNG-dependent vs fastpower approximation) by ChrisRackauckas-Claude · Pull Request #58 · SciML/FastPower.jl

ChrisRackauckas-Claude · 2026-06-23T20:38:37Z

Please ignore until reviewed by @ChrisRackauckas.

Problem

The tests / Enzyme (julia 1) lane on main went red on the v1.3.2 run; it was green 8 days earlier with identical test source. The failure:

test_forward: fastpower with return activity Duplicated on (::Float64, Duplicated), (::Float64, Const): Test Failed
  Expression: isapprox(x, y; kwargs...)
   Evaluated: isapprox(0.155, 0.15524589105604497; atol = 0.0001, rtol = 0.001)

Root cause

FastPower's Enzyme @easy_rule returns the exact ^ derivative (y*fastpower(x,y-1), Ω*log(x)). EnzymeTestUtils' test_forward/test_reverse compare that rule against finite differences of the deliberately-approximate fastpower primal. So the measured gap is exactly fastpower's own primal approximation error (~1e-3 relative — the same envelope asserted in test/fast_pow_tests.jl), which sat right on top of the old atol=1e-4, rtol=1e-3.

Whether the lane passed depended on the random perturbation test_forward drew from the global RNG. An analytic sweep over the tangent grid the test samples (tangents in -9:0.01:9, central FD-5, at x=1.0, y=0.5) shows:

| config | worst abs gap | worst rel gap | failures at old tol (atol=1e-4, rtol=1e-3) |
|---|---|---|---|
| Tx=Dup, Ty=Const | 1.17e-3 | 2.4% | 144080/3241800 (4.44%) |
| Tx=Dup, Ty=Dup | 2.04e-3 | - | 111058/3243600 (3.42%) |
| Tx=Const, Ty=Dup | 6e-14 | - | 0 |

The CI failing value 0.15524589105604497 is reproduced exactly at tangent dx=0.3105 (= the exact-^ derivative 0.5·dx), with the FD-of-fastpower reference at 0.1555 — i.e. it is fastpower's primal error, not a wrong rule.

Fix

- Seed the RNG (Random.Xoshiro(0)) so the randomized test is reproducible.
- Raise the tolerance to atol=1e-3, rtol=1e-2, consistent with fastpower's documented accuracy. This has zero failures across all ~6.5M tangent draws in the grid, while a genuinely wrong rule would still be off by O(1) relative and is not masked.
- Add Random to the test [extras]/[targets] and a Random = "1" [compat] entry.

This is the principled fix, not a blanket tolerance loosening: the rule's true error is zero; the only thing being measured is the approximation built into fastpower itself.
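
As a worked check of the two tolerance pairs, the sketch below replays the comparison from the CI log above using only Base Julia. The two operands are copied from the logged isapprox call. The names lhs, rhs, and threshold are introduced here for illustration, and threshold is meant to mirror the documented isapprox rule for finite reals, |x - y| <= max(atol, rtol * max(|x|, |y|)).

```julia
# Operands copied verbatim from the logged `isapprox(x, y; kwargs...)` call.
lhs = 0.155
rhs = 0.15524589105604497

# Illustrative helper (not part of any package): the acceptance threshold
# that isapprox applies to two finite real numbers.
threshold(x, y; atol, rtol) = max(atol, rtol * max(abs(x), abs(y)))

gap = abs(lhs - rhs)

old_thr = threshold(lhs, rhs; atol = 1.0e-4, rtol = 1.0e-3)
new_thr = threshold(lhs, rhs; atol = 1.0e-3, rtol = 1.0e-2)

old_pass = isapprox(lhs, rhs; atol = 1.0e-4, rtol = 1.0e-3)
new_pass = isapprox(lhs, rhs; atol = 1.0e-3, rtol = 1.0e-2)

println("gap = ", gap)
println("old threshold = ", old_thr, " -> pass? ", old_pass)
println("new threshold = ", new_thr, " -> pass? ", new_pass)
```

For these two values the gap (about 2.46e-4) exceeds the old threshold (about 1.55e-4) and falls below the new one (about 1.55e-3), so the first check is false and the second is true.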

Verification (run locally)

Deps resolved match CI: Enzyme 0.13.164, EnzymeTestUtils 0.2.8, FiniteDifferences 0.12.34.

- Reproduced the failure through the real test_forward with rng=Xoshiro(16) (first tangent dx=0.31): old tolerance → 6 pass / 1 fail (matches CI); new tolerance → 7 pass / 0 fail.
- Fixed Enzyme group via Pkg.test, julia 1.11: enzyme_forward_tests 52/52, enzyme_reverse_tests 36/36, tests passed.
- Fixed Enzyme group via Pkg.test, julia lts (1.10): 52/52, 36/36, tests passed.
- Seeded forward test is deterministic: 52/52 across 3 repeats.
- Runic: clean (no diff) on both edited files.

Note on the other red lanes in the same run

tests / Core (julia 1) and tests / Core (julia lts) were red in the same run but are not code failures: both ran on self-hosted-4vcpu-8gb (smcsd) runners squatting on the ubuntu-latest label; the "Run tests" step emitted zero log output and never recorded a conclusion (runner OOM/lost-communication while precompiling the Mooncake+Enzyme+ReverseDiff stack in 8 GB). Locally the Core group passes cleanly (fast_log2 1200/1200, fast_pow 5/5, other_ad_engines 4/4, all AD-engine derivative comparisons rel=0.0) on both julia 1.11 and lts. That is a runner-capacity infra issue, out of scope for this PR.

🤖 Generated with Claude Code

…hed tolerance

The Enzyme `@easy_rule` returns the *exact* `^` derivative, but EnzymeTestUtils `test_forward`/`test_reverse` compare it against finite differences of the *approximate* `fastpower` primal. Because `fastpower` routes through a Float32 `fastlog2` polynomial, the *slope* of its primal differs from the exact slope by ~1e-2 relative near x=1 (measured: exact d/dx = 0.5 vs FD-of-fastpower = 0.5066, i.e. 1.3e-2 relative), even at points like (1.0, 0.5) where the primal *value* is exact. So the FD reference is off from the exact rule by `fastpower`'s inherent approximation error, not by any rule bug.

The old atol=1e-4, rtol=1e-3 sat below that gap, so whether the lane passed depended on the random tangents drawn from the global RNG and it went red intermittently (~4% of draws).

Two-part fix:

1. Determinism via StableRNG, not Xoshiro. Seeding the global RNG / `Xoshiro` does not actually pin the test, because those streams can change across Julia versions, so the flake could reappear on a new Julia. `StableRNGs.StableRNG` yields a stream guaranteed identical across Julia versions, passed as the `rng=` keyword that EnzymeTestUtils accepts.
2. Tolerance matched to fastpower's documented accuracy (atol=1e-3, rtol=1e-2), not reverted to the tight 1e-4/1e-3. Empirically the inherent gap is real: with the tight tolerance, 8/10 candidate StableRNG seeds pass the forward grid 52/52 but seeds 123 and 31415 fail, and the all-seeds failure boundary is rtol~2e-3. Reverting to rtol=1e-3 would only "pass" by cherry-picking a lucky seed, which would hide the genuine (benign, expected) primal-approximation error. The chosen rtol=1e-2 sits ~5x above the measured worst-case relative discrepancy yet far below the O(1) relative error a genuinely wrong derivative rule would produce, so real regressions are still caught.

Verified deterministic forward 52/52 + reverse 36/36 across 3 repeats on both Julia 1 and lts, and 52/52 for all 10 candidate seeds on lts (so the tolerance is seed-independent, not seed-luck).

Swap the test dep Random -> StableRNGs in [extras]/[targets].test/[compat].

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ChrisRackauckas-Claude · 2026-06-24T12:27:22Z

Updated to use StableRNGs.jl instead of Random.Xoshiro (per maintainer directive). Force-pushed 345fcc7 -> 9328417 (only our own commit was replaced; force-with-lease confirmed the remote tip was unchanged).

Why StableRNG, not Xoshiro: seeding the global RNG / Xoshiro(0) does not actually pin the test, because those streams can change across Julia versions, so the flake could silently reappear on a new Julia. StableRNGs.StableRNG(seed) yields a stream guaranteed identical across Julia versions; it is passed as the rng= keyword that EnzymeTestUtils.test_forward/test_reverse accept. Seed used: StableRNG(123).
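
A minimal sketch of what such a seeded call might look like, assuming the test exercises fastpower with the activities named in the CI log (Duplicated return, Duplicated x, Const y) at the point (1.0, 0.5). The constant names and the testset label are hypothetical, and the actual test files in this PR may be organized differently.

```julia
using Enzyme, EnzymeTestUtils, FastPower, StableRNGs, Test

# Tolerances quoted in this PR; the constant names are illustrative only.
const ENZYME_ATOL = 1.0e-3
const ENZYME_RTOL = 1.0e-2

@testset "fastpower forward rule (seeded sketch)" begin
    # StableRNG(123) fixes the random tangents drawn by test_forward,
    # so repeated runs compare the same perturbations.
    test_forward(
        FastPower.fastpower, Duplicated, (1.0, Duplicated), (0.5, Const);
        rng = StableRNG(123), atol = ENZYME_ATOL, rtol = ENZYME_RTOL,
    )
end
```

Constructing a fresh StableRNG(123) for each call, rather than sharing one generator across calls, keeps each test's tangents independent of how many tests ran before it.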

Tolerance decision: kept atol=1e-3, rtol=1e-2 and did NOT revert to the tight 1e-4/1e-3. I measured whether a stable seed lets the tight tolerance pass, and it does not do so robustly:

- The exact rule vs FD-of-approximate-primal gap is inherent: at (1.0, 0.5), exact d/dx = 0.5 but central FD of the fastpower primal gives 0.5066 (a 1.3e-2 relative slope error), even though fastpower(1.0,0.5) is value-exact. This is the Float32 fastlog2 polynomial's slope error, not a rule bug.
- Forward grid over 10 candidate StableRNG seeds at the tight atol=1e-4, rtol=1e-3: 8/10 pass 52/52, but seeds 123 and 31415 fail. The all-seeds-pass boundary is at rtol ~= 2e-3.
- Reverting to rtol=1e-3 would only "pass" by cherry-picking a lucky seed, which hides the genuine (benign, expected) approximation error. The chosen rtol=1e-2 sits ~5x above the measured worst-case relative discrepancy yet far below the O(1) relative error a genuinely wrong rule would show, so real regressions are still caught.
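
The slope comparison in the first point could be reproduced along these lines. This is a sketch that assumes FiniteDifferences' central_fdm(5, 1) as the finite-difference method (matching the "central FD-5" mentioned earlier); variable names are illustrative, and no output is asserted here beyond what the comment above reports.

```julia
using FastPower, FiniteDifferences

x, y = 1.0, 0.5

# 5-point central finite-difference method for the first derivative.
fdm = central_fdm(5, 1)

# Slope of the approximate primal with respect to x, holding y fixed.
fd_slope = fdm(t -> FastPower.fastpower(t, y), x)

# Slope of the exact power function: d/dx x^y = y * x^(y - 1).
exact_slope = y * x^(y - 1)

rel_gap = abs(fd_slope - exact_slope) / abs(exact_slope)

println("FD slope of fastpower: ", fd_slope)
println("exact slope of ^:      ", exact_slope)
println("relative gap:          ", rel_gap)
```

Comparing rel_gap against the old and new rtol values shows directly whether a given tolerance leaves headroom for the primal's approximation error.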

Verification (actual test files, not a harness): deterministic across 3 repeats each on Julia 1 (1.12.5, Enzyme 0.13.164) and lts (1.10.11, Enzyme 0.13.166): FORWARD 52/52, REVERSE 36/36. On lts the chosen tolerance also passes 52/52 for all 10 candidate seeds, confirming it is seed-independent rather than seed-luck.

Ignore until reviewed by @ChrisRackauckas.


ChrisRackauckas-Claude force-pushed the fix-enzyme-tolerance-rng-flake branch from 345fcc7 to 9328417 Compare June 24, 2026 12:26

ChrisRackauckas-Claude wants to merge 1 commit into SciML:main from ChrisRackauckas-Claude:fix-enzyme-tolerance-rng-flake
