Skip to content

[BUG]: Silent NaN injection into POD basis via division by zero in method_of_snapshots_row #52

@Ady0333

Description

@Ady0333

Problem

method_of_snapshots_row divides by singular values without guarding against zeros, injecting NaN into the POD basis when FixedSVDRank(r) requests more modes than the true rank of the snapshot matrix.

Broken code:

function method_of_snapshots_row(red_style, A)
  AA = A'*A
  _,Sr,Vr = truncated_svd(red_style, AA; issquare=true)
  Ur = (A*Vr)/Diagonal(Sr)  # ❌ Sr can contain zeros
  return Ur,Sr,Vr
end

When FixedSVDRank(r) is used with r > effective_rank(A), modes beyond the true rank have Sr[j] = 0.0. For these null-space modes, A*Vr[:,j] = 0 (zero vector), so Ur[:,j] = 0.0/0.0 = NaN.

The sibling function already has the fix:

function method_of_snapshots_col(red_style, A)
  AA = A*A'
  Ur,Sr,_ = truncated_svd(red_style, AA; issquare=true)
  Vr = Diagonal(Sr.+eps())\(Ur'A)  # ✅ guarded with +eps()
  return Ur,Sr,Vr'
end

The asymmetry proves this is an oversight - method_of_snapshots_col is correct, method_of_snapshots_row is broken.


Impact

Who is affected: Anyone using FixedSVDRank(r) where r exceeds the effective rank of the snapshot manifold. Common when pre-specifying basis size without knowing true rank.

What breaks: Silent NaN propagation through entire ROM pipeline:

  • POD basis contains NaN columns → no error raised
  • Galerkin projection: Φ'AΦ = NaN
  • Reduced residual = NaN, reduced Jacobian = NaN
  • Online ROM solutions = NaN
  • Error metrics = NaN
  • No indication of root cause

Why this is critical:

  • Offline ROM construction completes without error
  • reduced_operator returns an RBOperator with silent NaN matrices
  • All online solves produce NaN solutions
  • Users blame ROM methodology instead of recognizing the bug

Production impact: method_of_snapshots_row is used when DOFs > snapshots (the dominant production path). The less-common method_of_snapshots_col path is correctly guarded.


Root Cause

When FixedSVDRank(r) is used, truncated_svd computes S = sqrt.(svd(A'*A).S) and truncates to first r values without energy-based filtering. For null-space modes, Sr[j] = sqrt(0.0) = 0.0 exactly. Since A*Vr[:,j] = 0 for these modes, division produces 0.0/0.0 = NaN.

method_of_snapshots_col was correctly guarded with +eps(), but the far more commonly used method_of_snapshots_row was not.


Fix

Lines 124, 138: add +eps() guard to match method_of_snapshots_col

# Line 124
Ur = (A*Vr)/Diagonal(Sr.+eps())

# Line 138 (norm-aware overload)
Ũr = (XA*Vr)/Diagonal(Sr.+eps())

For non-degenerate modes where Sr[j] ≈ 1.0, eps() ≈ 2.2e-16 has no numerical effect. For degenerate modes where Sr[j] = 0.0, it prevents division by zero and produces near-zero vectors instead of NaN.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions