Fix master CI: expv zero-input NaN, JET-on-1.12 QA, GPU-in-All #229
Draft
ChrisRackauckas-Claude wants to merge 1 commit into
Three independent master-CI failures on the grouped-tests workflow:
1. Core (NaN == 0.0 at basictests.jl:307, flaky across OS/version).
The real `expv!(w, t::Real, Ks)` method lacked the `iszero(beta)`
guard that the complex method already has. For a zero input vector
`firststep!` skips initializing the Krylov basis V (it only fills it
when beta != 0), so `lmul!(beta, mul!(w, V, expHe))` computes
`0 * <uninitialized memory>`, which is NaN whenever V holds garbage.
Add the same early-return guard, making expv of a zero vector exactly
zero (matching the complex method). Verified: full Core suite now
passes on Julia 1.10 and 1.12 (was reliably NaN on 1.10).
2. QA (6 JET failures on the Julia "1" = 1.12 channel; lts/1.10 was
green). On 1.12 JET traces into LinearAlgebra/Base internals
(`norm(::Vector)` -> `norm_recursive_check` -> `iterate(::Nothing)`,
and the broadcast `unalias`/`copyto_unaliased!` path over
`Adjoint{T, Union{}}`) and reports artifacts there that this package
does not control. Scope the QA `report_call`s to
`target_modules = (ExponentialUtilities,)` — the standard JET-as-QA
configuration — which keeps full coverage of this package's own code.
That scoping surfaced two genuine `may be undefined` findings, fixed
here so the scoped analysis is clean: `si` in `exponential!` and
`order`/`kest` in `kiops` are now unconditionally initialized before
use. Verified: QA passes 17/17 on Julia 1.10 and 1.12.
3. Core (windows, all versions: "CUDA driver not functional"). On
Windows the Core job runs the run_tests "All" aggregate, which pulled
in the GPU group and `using CUDA` errored on the non-GPU runner. Mark
the GPU group `in_all = false` so it only runs under an explicit
GROUP=GPU on the self-hosted CUDA runner. Verified locally: GROUP=All
now runs only Core/basictests.jl, never GPU/gputests.jl.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
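
The zero-input failure in item 1 comes down to IEEE arithmetic: scaling by `beta == 0` does not clear non-finite values, because `0 * NaN` and `0 * Inf` are both `NaN`. The sketch below is a minimal illustration of that mechanism and of an early-return guard, not the package's code. The function names `combine_unguarded!` and `combine_guarded!` are hypothetical, and a `NaN`-filled matrix stands in for the uninitialized Krylov basis.

```julia
# Hypothetical stand-in for the final step: w = beta * (V * c).
# Written with plain loops so the result does not depend on any BLAS behavior.
function combine_unguarded!(w, beta, V, c)
    fill!(w, zero(eltype(w)))
    for j in axes(V, 2), i in axes(V, 1)
        w[i] += V[i, j] * c[j]
    end
    w .*= beta          # 0 * NaN is still NaN, so garbage in V survives
    return w
end

# Same computation with an early return when beta is zero (assumed shape of
# the guard; the real method's surrounding code is not reproduced here).
function combine_guarded!(w, beta, V, c)
    iszero(beta) && return fill!(w, zero(eltype(w)))
    return combine_unguarded!(w, beta, V, c)
end

V = fill(NaN, 3, 2)     # stands in for memory that was never initialized
c = [1.0, 0.0]

bad = combine_unguarded!(zeros(3), 0.0, V, c)   # every entry is NaN
good = combine_guarded!(zeros(3), 0.0, V, c)    # every entry is 0.0
```

Real uninitialized memory may contain ordinary finite numbers on some runs, in which case the unguarded path returns zeros. That is consistent with the failure being flaky rather than deterministic.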
Fixes three independent failures on the master grouped-tests CI.
1. Core: `NaN == 0.0` at `basictests.jl:307` (zero-input expv)

The real `expv!(w, t::Real, Ks)` method was missing the `iszero(beta)` guard the complex method already has. For a zero input vector `firststep!` skips initializing the Krylov basis `V` (it only fills `V[:,1]` when `beta != 0`), so the final `lmul!(beta, mul!(w, @view(V[:,1:m]), expHe))` computes `0 * <uninitialized memory>`, which is `NaN` whenever the garbage in `V` yields a non-finite value (`0 * NaN` and `0 * Inf` are both `NaN`). That explains why the failure was flaky (heap-dependent: green on some OS/runs, `NaN` on others). Added the same early-return guard so `expv` of a zero vector is exactly zero.

Verified locally: full `GROUP=Core` `Pkg.test` passes on Julia 1.10 and 1.12 (it reliably produced `NaN` on 1.10 before).

2. QA: 6 JET failures on the `1` (= Julia 1.12) channel

`lts` (1.10) was green; only `1` (1.12) failed. On 1.12 JET traces into `LinearAlgebra`/`Base` internals (`norm(::Vector)` → `norm_recursive_check` → `iterate(::Nothing)`, and the broadcast `unalias`/`copyto_unaliased!` path over `Adjoint{T, Union{}}`) and reports abstract-interpretation artifacts there that this package does not control. Scoped the QA `report_call`s to `target_modules = (ExponentialUtilities,)` (the standard JET-as-package-QA configuration), which keeps full coverage of this package's own code.

That scoping surfaced two genuine `may be undefined` findings, which are fixed here so the scoped analysis is clean (not silenced):

- `si` in `exponential!` (`exp_baseexp.jl`): conditionally assigned inside `if s > 0`, used inside a separate `if s > 0`; now initialized to `0` unconditionally.
- `order`/`kest` in `kiops` (`kiops.jl`): carried across loop iterations via the `orderold`/`kestold` "reuse" flags but only conditionally assigned; now seeded with their first-iteration defaults.

Verified locally: QA passes 17/17 on Julia 1.10 and 1.12.
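
The `may be undefined` pattern described for `si` can be sketched in isolation. This is a hypothetical function (`halve_then_restore` is not part of the package, and the body is a placeholder for the real scaling logic); it only shows the shape of the fix: a variable assigned in one `if s > 0` block and read in a second, separate `if s > 0` block is correct at runtime, but a static analyzer cannot generally prove the two conditions agree, so the variable is initialized up front.

```julia
# Hypothetical illustration of the definite-assignment fix.
function halve_then_restore(x::Float64, s::Int)
    si = 0                  # unconditional initialization (the fix)
    if s > 0
        si = s              # previously the only assignment of `si`
        x /= 2.0^si         # scale down
    end

    y = x                   # placeholder for the core computation

    if s > 0                # separate branch that reads `si`
        for _ in 1:si
            y *= 2.0        # undo the scaling
        end
    end
    return y
end

halve_then_restore(8.0, 3)  # scales 8.0 down to 1.0, then back up to 8.0
halve_then_restore(8.0, 0)  # both branches skipped; `si` stays 0
```

Initializing with a neutral value leaves the `s > 0` path unchanged and makes the variable defined on every path, so the finding is resolved rather than suppressed.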
3. Core (windows): "CUDA driver not functional"

On Windows the Core job runs the `run_tests` "All" aggregate, which pulled in the `GPU` group, and `using CUDA` errored on the non-GPU runner. Marked the `GPU` group `in_all = false` so it only ever runs under an explicit `GROUP=GPU` on the self-hosted CUDA runner. Verified locally: `GROUP=All` now runs only `Core/basictests.jl`, never `GPU/gputests.jl`.

Not addressed (reported separately)

`Static Arrays` tolerance failure at `basictests.jl:265` (`expv(t,A,b) ≈ exp(t*A)*b`). On linux Julia 1.13-rc1 the worst relative error is `1.25e-15`; the macOS-pre failure shows `~1e-7`. This is a macOS/1.13-rc-specific accuracy difference I could not reproduce or correctly fix on linux, and I will not loosen the tolerance without being able to prove the macOS deviation is benign.

Please ignore until reviewed by @ChrisRackauckas.