
Fix NaN-poisoning issue in segment weights #1045

Merged
etal merged 3 commits into master from i436 on Apr 6, 2026

etal (Owner) commented Apr 6, 2026

Fixes #436, #908, #901, #896.

etal (Owner, Author) commented Apr 6, 2026

@claude Please review.

claude bot commented Apr 6, 2026

Claude finished @etal's task.


Code Review: Fix NaN-poisoning issue in segment weights

Reviewing PR that fixes issues #436, #908, #901, #896 related to NaN-poisoning in segment weights.

Review Progress:

  • Read project guidelines (CLAUDE.md)
  • Review core segmentation changes (segmentation/__init__.py)
  • Review segment filters changes (segfilters.py)
  • Review reports module changes (reports.py)
  • Review test coverage additions
  • Check for code quality and potential issues
  • Provide comprehensive feedback

Summary

This PR effectively addresses NaN weight propagation issues throughout the CNVkit segmentation pipeline. The changes are well-implemented and comprehensively tested.

Strengths

Comprehensive NaN handling: The fixes properly handle NaN weights at multiple critical points:

  • segmentation/__init__.py:406-407: Uses np.nansum() for segment weight calculation
  • segfilters.py:113-119: Filters out NaN weights before np.average() calls
  • reports.py:402-408: Filters NaN weights before interval statistics computation
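
The failure mode behind these three fixes can be reproduced in a few lines. This is a minimal sketch with made-up bin values, not code from the PR: a single NaN weight turns both the summed weight and the weighted average into NaN, while `np.nansum()` and a `~np.isnan()` mask keep the results finite.

```python
import numpy as np

# Illustrative values only: four bins, one with a NaN weight.
log2 = np.array([0.10, -0.20, 0.30, 0.05])
weights = np.array([0.8, np.nan, 0.6, 0.9])

# Naive aggregation: one NaN weight poisons both results.
poisoned_total = np.sum(weights)                    # nan
poisoned_mean = np.average(log2, weights=weights)   # nan

# NaN-safe aggregation: skip NaN in the sum, mask it out of the average.
safe_total = float(np.nansum(weights))
valid = ~np.isnan(weights)
safe_mean = float(np.average(log2[valid], weights=weights[valid]))
```

Note that `np.average()` does not raise on NaN weights; it silently returns NaN, which is why the value can travel all the way into output files unnoticed.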

Excellent test coverage: Added 8 test cases covering the major NaN scenarios, including:

  • test_segment_mean_nan_weights(): Verifies weighted averages with partial/all NaN weights
  • test_transfer_fields_nan_weights(): Ensures segment weights are properly calculated
  • test_squash_region_nan_weights(): Tests segment merging with NaN weights
  • test_bic_nan_weights(): Validates BIC filter robustness
  • test_compute_gene_stats_nan_weights(): Tests gene statistics with NaN weights

Follows project conventions:

  • Adheres to the explicit guidance in CLAUDE.md about NaN weight safety
  • Uses np.nansum() consistently as recommended
  • Filters ~np.isnan(wt) before np.average() calls

Clinical safety: The changes keep numerical output stable, which is critical for downstream clinical pipelines.

Key Improvements

segmentation/__init__.py:406-407 - Fixed the core issue:

# Use nansum so NaN weights don't propagate into .cns output
seg_wt = float(np.nansum(wt))
if seg_wt > 0:
    valid = ~np.isnan(wt)
    seg_dp = np.average(bin_depths[bin_idx][valid], weights=wt[valid])

segfilters.py:113-119 - Robust weight filtering:

def _wavg(col: str) -> float:
    if region_weight > 0 and valid_wt.any():
        return float(np.average(cnarr[col].to_numpy()[valid_wt], weights=wt[valid_wt]))
    return float(np.mean(cnarr[col]))
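
The excerpt above relies on `region_weight`, `valid_wt`, and `wt` from its enclosing function, which is not shown here. A self-contained sketch of the same pattern, assuming those names mean the NaN-skipping weight total, the non-NaN mask, and the raw weight array, might look like this. The function name `weighted_mean_ignoring_nan` is hypothetical and not part of CNVkit.

```python
import numpy as np

def weighted_mean_ignoring_nan(values, weights):
    """Weighted mean that skips NaN weights (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    valid = ~np.isnan(weights)            # assumed meaning of valid_wt
    total = float(np.nansum(weights))     # assumed meaning of region_weight
    if total > 0 and valid.any():
        return float(np.average(values[valid], weights=weights[valid]))
    # No usable weight at all: fall back to the plain mean.
    return float(np.mean(values))

partial = weighted_mean_ignoring_nan([1.0, 2.0, 3.0], [1.0, np.nan, 3.0])
all_nan = weighted_mean_ignoring_nan([1.0, 2.0, 3.0], [np.nan, np.nan, np.nan])
```

With one NaN weight the middle value is simply left out of the average; with only NaN weights the helper returns the unweighted mean instead of NaN.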

reports.py:402-408 - Prevents NaN poisoning in gene statistics:

weights = bins["weight"].to_numpy() if "weight" in bins else np.ones(len(bins_log2))
valid = ~np.isnan(weights)
if not valid.all():
    bins_log2 = bins_log2[valid]
    weights = weights[valid]
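
To show why the filter has to run before any resampling, here is a rough sketch of a bootstrap confidence interval that drops NaN-weight bins first. The function name and the resampling scheme (drawing bins with probability proportional to weight) are assumptions chosen for illustration, not CNVkit's actual interval code.

```python
import numpy as np

def bootstrap_ci_skipping_nan(log2, weights, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean log2 (hypothetical helper)."""
    log2 = np.asarray(log2, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Drop bins whose weight is NaN, mirroring the filter in the excerpt.
    valid = ~np.isnan(weights)
    log2, weights = log2[valid], weights[valid]
    total = weights.sum()
    if log2.size == 0 or total <= 0:
        return (float("nan"), float("nan"))
    rng = np.random.default_rng(seed)
    # Resample bins in proportion to weight; NaN in p would raise ValueError.
    samples = rng.choice(log2, size=(n_boot, log2.size), p=weights / total)
    means = samples.mean(axis=1)
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return (float(lo), float(hi))

ci = bootstrap_ci_skipping_nan([0.1, 0.2, 5.0, 0.3], [1.0, 2.0, np.nan, 1.0])
```

In this call the outlier bin at 5.0 carries a NaN weight, so it is excluded and both interval bounds fall within the range of the remaining bins.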

Minor Suggestions

  1. segfilters.py:229-235: The BIC filter fallback logic could benefit from a comment explaining when needs_fallback triggers and why the median fallback is appropriate.

  2. Documentation: Consider adding a brief note in the relevant docstrings about NaN weight handling behavior.

Conclusion

This is a high-quality fix that addresses a real production issue. The implementation is robust, well-tested, and follows project best practices. The changes maintain backward compatibility while fixing the NaN propagation bugs.

Recommendation: Approve


codecov bot commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 87.50000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.41%. Comparing base (3e5fb28) to head (602948a).
⚠️ Report is 4 commits behind head on master.

Files with missing lines    Patch %    Lines
cnvlib/segfilters.py        83.33%     0 Missing and 2 partials ⚠️
cnvlib/reports.py           87.50%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1045      +/-   ##
==========================================
+ Coverage   66.23%   66.41%   +0.18%     
==========================================
  Files          72       72              
  Lines        7396     7409      +13     
  Branches     1301     1303       +2     
==========================================
+ Hits         4899     4921      +22     
+ Misses       2057     2049       -8     
+ Partials      440      439       -1     
Flag Coverage Δ
unittests 66.41% <87.50%> (+0.18%) ⬆️

etal and others added 3 commits on April 6, 2026 at 14:00
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
#908)

Guard all downstream np.average/np.sum calls against NaN weights from
.cnr files produced by older CNVkit versions or external tools:

- transfer_fields: use np.nansum so NaN weights don't propagate into .cns
- squash_region: NaN-masked np.average and weighted_median calls
- bic: treat NaN weights same as zero for RSS fallback
- compute_gene_stats: filter NaN weights before bootstrap CI/PI
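
As an illustration of the `bic` item, treating NaN weights as zero can be sketched as below. The function name `weighted_rss` is hypothetical, and the choice to fall back to an unweighted residual sum of squares around the median is an assumption based loosely on the review's mention of a median fallback; the actual CNVkit logic may differ.

```python
import numpy as np

def weighted_rss(values, weights):
    """Residual sum of squares with NaN weights treated as zero (sketch)."""
    values = np.asarray(values, dtype=float)
    w = np.nan_to_num(np.asarray(weights, dtype=float), nan=0.0)
    if w.sum() > 0:
        center = np.average(values, weights=w)
        return float(np.sum(w * (values - center) ** 2))
    # Assumed fallback: no usable weights, so use unweighted RSS about the median.
    center = np.median(values)
    return float(np.sum((values - center) ** 2))

rss_partial = weighted_rss([1.0, 2.0, 6.0], [1.0, np.nan, 1.0])
rss_all_nan = weighted_rss([1.0, 2.0, 6.0], [np.nan, np.nan, np.nan])
```

A bin with a NaN weight contributes nothing to the weighted RSS, exactly as a zero-weight bin would, and a segment with no usable weights still yields a finite value.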

Review fixes: group_by_genes depth fallback, type annotations, test precision

- group_by_genes: use np.nansum for weight aggregation; fall back to
  unweighted mean depth when all weights are NaN
- shift_xx: widen is_xx parameter to accept bool | numpy.bool_ | None
- do_genemetrics: fix is_sample_female annotation (None -> bool | None)
- test_segment_mean_nan_weights: assert exact expected values, not just
  non-NaN

Document NaN weight handling in docstrings and BIC fallback logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
etal merged commit 2f6f3e0 into master on Apr 6, 2026
15 checks passed
etal deleted the i436 branch on April 6, 2026 at 21:09
Development

Successfully merging this pull request may close these issues.

cnvkit batch fail for WES data

1 participant