Problem
method_of_snapshots_row divides by singular values without guarding against zeros, injecting NaN into the POD basis when FixedSVDRank(r) requests more modes than the true rank of the snapshot matrix.
Broken code:
function method_of_snapshots_row(red_style, A)
AA = A'*A
_,Sr,Vr = truncated_svd(red_style, AA; issquare=true)
Ur = (A*Vr)/Diagonal(Sr) # ❌ Sr can contain zeros
return Ur,Sr,Vr
end
When FixedSVDRank(r) is used with r > effective_rank(A), modes beyond the true rank have Sr[j] = 0.0. For these null-space modes, A*Vr[:,j] = 0 (zero vector), so Ur[:,j] = 0.0/0.0 = NaN.
The sibling function already has the fix:
function method_of_snapshots_col(red_style, A)
AA = A*A'
Ur,Sr,_ = truncated_svd(red_style, AA; issquare=true)
Vr = Diagonal(Sr.+eps())\(Ur'A) # ✅ guarded with +eps()
return Ur,Sr,Vr'
end
The asymmetry proves this is an oversight - method_of_snapshots_col is correct, method_of_snapshots_row is broken.
Impact
Who is affected: Anyone using FixedSVDRank(r) where r exceeds the effective rank of the snapshot manifold. Common when pre-specifying basis size without knowing true rank.
What breaks: Silent NaN propagation through entire ROM pipeline:
- POD basis contains NaN columns → no error raised
- Galerkin projection:
Φ'AΦ = NaN
- Reduced residual = NaN, reduced Jacobian = NaN
- Online ROM solutions = NaN
- Error metrics = NaN
- No indication of root cause
Why this is critical:
- Offline ROM construction completes without error
reduced_operator returns an RBOperator with silent NaN matrices
- All online solves produce NaN solutions
- Users blame ROM methodology instead of recognizing the bug
Production impact: method_of_snapshots_row is used when DOFs > snapshots (the dominant production path). The less-common method_of_snapshots_col path is correctly guarded.
Root Cause
When FixedSVDRank(r) is used, truncated_svd computes S = sqrt.(svd(A'*A).S) and truncates to first r values without energy-based filtering. For null-space modes, Sr[j] = sqrt(0.0) = 0.0 exactly. Since A*Vr[:,j] = 0 for these modes, division produces 0.0/0.0 = NaN.
method_of_snapshots_col was correctly guarded with +eps(), but the far more commonly used method_of_snapshots_row was not.
Fix
Lines 124, 138: add +eps() guard to match method_of_snapshots_col
# Line 124
Ur = (A*Vr)/Diagonal(Sr.+eps())
# Line 138 (norm-aware overload)
Ũr = (XA*Vr)/Diagonal(Sr.+eps())
For non-degenerate modes where Sr[j] ≈ 1.0, eps() ≈ 2.2e-16 has no numerical effect. For degenerate modes where Sr[j] = 0.0, it prevents division by zero and produces near-zero vectors instead of NaN.
Problem
method_of_snapshots_rowdivides by singular values without guarding against zeros, injecting NaN into the POD basis whenFixedSVDRank(r)requests more modes than the true rank of the snapshot matrix.Broken code:
When
FixedSVDRank(r)is used withr > effective_rank(A), modes beyond the true rank haveSr[j] = 0.0. For these null-space modes,A*Vr[:,j] = 0(zero vector), soUr[:,j] = 0.0/0.0 = NaN.The sibling function already has the fix:
The asymmetry proves this is an oversight -
method_of_snapshots_colis correct,method_of_snapshots_rowis broken.Impact
Who is affected: Anyone using
FixedSVDRank(r)whererexceeds the effective rank of the snapshot manifold. Common when pre-specifying basis size without knowing true rank.What breaks: Silent NaN propagation through entire ROM pipeline:
Φ'AΦ = NaNWhy this is critical:
reduced_operatorreturns anRBOperatorwith silent NaN matricesProduction impact:
method_of_snapshots_rowis used whenDOFs > snapshots(the dominant production path). The less-commonmethod_of_snapshots_colpath is correctly guarded.Root Cause
When
FixedSVDRank(r)is used,truncated_svdcomputesS = sqrt.(svd(A'*A).S)and truncates to firstrvalues without energy-based filtering. For null-space modes,Sr[j] = sqrt(0.0) = 0.0exactly. SinceA*Vr[:,j] = 0for these modes, division produces0.0/0.0 = NaN.method_of_snapshots_colwas correctly guarded with+eps(), but the far more commonly usedmethod_of_snapshots_rowwas not.Fix
Lines 124, 138: add
+eps()guard to matchmethod_of_snapshots_colFor non-degenerate modes where
Sr[j] ≈ 1.0,eps() ≈ 2.2e-16has no numerical effect. For degenerate modes whereSr[j] = 0.0, it prevents division by zero and produces near-zero vectors instead of NaN.