Skip to content

Patch full seq parse and write#1

Open
pcruzparri wants to merge 12 commits into
masterfrom
PatchFullSeqParseAndWrite
Open

Patch full seq parse and write#1
pcruzparri wants to merge 12 commits into
masterfrom
PatchFullSeqParseAndWrite

Conversation

@pcruzparri
Copy link
Copy Markdown
Owner

pr to track/copy changes

pcruzparri and others added 12 commits March 10, 2025 15:08
…. Changed the BioPolymerWithSetModsExtensions class to write full sequences separating the C-terminus with a dash. Updated some of the tests that failed because of the new notation of C-terminus mods. Some tests are still failing, and will be updated once happy with this general change.
…t handle ambiguity(or multiple mods at the same position). Modified the corresponding tests or commented them out in case we want to revert.
…s sequences, since it covers most but not many interesting cases. Best to remove it to maintain code coverage. I will add some notes on the issue on the PR for future reference.
pcruzparri pushed a commit that referenced this pull request May 21, 2026
* Move ISingleChargeMs1Feature to MassSpectrometry

The interface describes an MS1 deconvolution result and is the input
contract for the upcoming Deconvoluter.PairPrecursorsToMs2 join. That
method lives in MassSpectrometry.Deconvolution; Readers references
MassSpectrometry rather than the reverse, so the interface had to live
in MassSpectrometry for the dependency direction to work.

Relocate the file via `git mv` (history preserved), re-namespace to
MassSpectrometry, and add `using MassSpectrometry;` to the five Readers
files that implement or reference it. No behavior change. No external
consumer references the interface today (grep-checked on MetaMorpheus,
ProteaseGuru, ProteoformExplorer), so the move is invisible outside
this PR.

Doc tightened to reflect that pairing requires both an RT-window check
and an isolation-window check, not just an RT-apex check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add Deconvoluter.PairPrecursorsToMs2 with FlashDeconv / TopFD coverage

A new join between MS1 deconvolution features (any
ISingleChargeMs1Feature source — Ms1FeatureFile from TopFD/FlashDeconv,
DinosaurTsvFile, or future in-house whole-file deconvolution) and the
MS2 scans that selected them for fragmentation. One method on the
existing Deconvoluter:

    public static IEnumerable<(MsDataScan Ms2Scan, IsotopicEnvelope PrecursorEnvelope)>
        PairPrecursorsToMs2(IEnumerable<ISingleChargeMs1Feature> ms1Features,
                            MsDataFile msDataFile);

Match rule: MS2.RetentionTime ∈ [feat.RetentionTimeStart,
feat.RetentionTimeEnd] AND feat.Mz ∈ MS2.IsolationRange. Pure join —
no cross-charge consensus, no off-by-one correction, no harmonic
filtering, no isobaric-tag reporter extraction. Those concerns belong
to the deconvolution producer upstream or to the search engine
downstream. Pairing is restricted to MsnOrder == 2 (MS3 isolation
targets an MS2 fragment, not an MS1 precursor); MS2 scans with a null
IsolationRange are skipped. Chimeras emit one pair per matching
feature; a single feature spanning multiple MS2 scans likewise emits
one pair per scan.

The returned IsotopicEnvelope is the type MetaMorpheus already speaks
via EngineLayer.Util.Precursor's Precursor(IsotopicEnvelope, ...)
constructor, so downstream wrapping into Ms2ScanWithSpecificMass is a
direct two-line adapter at the call site (no mzLib-side wrapper class
needed):

    var pairs = Deconvoluter.PairPrecursorsToMs2(features, dataFile);
    foreach (var (ms2, envelope) in pairs)
    {
        var precursor = new Precursor(envelope, ...);
        var scan = new Ms2ScanWithSpecificMass(ms2, precursor.MonoisotopicPeakMz,
            precursor.Charge, fullFilePath, commonParameters,
            neutralExperimentalFragments, precursor.Intensity,
            precursor.EnvelopePeakCount, precursor.FractionalIntensity);
        ...
    }

For input from external readers (.ms1.feature etc.), the envelope is
built via the existing 3-arg IsotopicEnvelope(mass, intensity, charge)
constructor — Peaks is a single synthetic entry; NumberOfIsotopes from
the input is not surfaced because per-peak m/z and intensity are
unknown. When the producer is mzLib's own in-house deconvolution it can
hand PairPrecursorsToMs2 real envelopes; the contract works lossless in
that case.

Tests in TestPrecursorPairing.cs cover:
- 14 synthetic-input unit tests: happy path, RT-out-of-window (both
  sides), m/z-out-of-isolation (both sides), chimera, one-feature-
  spanning-many-MS2s, MS2 with null isolation, MS1 ignored, MS3 ignored,
  RT/isolation boundary inclusivity, empty inputs, MS2-free file.
- 3 end-to-end fixture tests: load Ms1Feature_FlashDeconvOpenMs3.0.0
  and Ms1Feature_TopFDv1.6.2 fixtures through the production
  Ms1FeatureFile reader, pair against a synthetic MS2 scan tuned to
  catch only one charge state of feature row #1 (≈10835.9 Da at z=10),
  assert one pair with the expected charge, mass, and format-
  appropriate intensity (FlashDeconv → 0; TopFD → positive).

17/17 pass; 774/774 MassSpectrometryTests + FileReadingTests
regression remains green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Remove unused MassSpectrometry namespace

* Remove unused MassSpectrometry namespace import

* Update Ms1Feature.cs

* Remove unused MassSpectrometry import

Removed unused MassSpectrometry namespace import.

* Remove unused MassSpectrometry import

* Restore using MassSpectrometry; in 5 Readers files (build fix)

These five files reference ISingleChargeMs1Feature in method signatures
or interface implementations, and the interface now lives in the
MassSpectrometry namespace (per the move two commits earlier). Readers
has ImplicitUsings=enable but the auto-generated global usings only
cover stdlib namespaces (System, System.Linq, etc.), so MassSpectrometry
isn't imported implicitly.

The previous round of cleanup commits removed these using directives as
"unused imports" — but each is required for the file to compile.
Restoring them undoes those five build-breaking deletions.

Verified: full Test.csproj suite — 3413/3413 pass (5 benchmarks skipped
as designed) on .NET 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Replace PairPrecursorsToMs2 with FromFile deconvolution algorithm

Adopts the pattern from Nic's PR smith-chem-wisc#1065 draft: instead of a bespoke
join method on Deconvoluter, "deconvolution loaded from disk" becomes
another DeconvolutionAlgorithm that plugs into the existing
Deconvoluter factory. MetaMorpheus's per-MS2 precursor-deconvolution
loop already calls MsDataScan.GetIsolatedMassesAndCharges(precursorSpectrum,
deconParams); swapping ClassicDeconvolutionParameters for
FromFileDeconvolutionParameters transparently switches the entire
pipeline to consume pre-deconvoluted features.

New types:
- DeconvolutionType.FromFile (enum entry)
- FromFileDeconvolutionParameters (DeconvolutionParameters subclass —
  wraps a pre-loaded IEnumerable<ISingleChargeMs1Feature>; the caller
  chooses which Readers parser to use, so the Readers → MassSpectrometry
  dependency arrow is preserved)
- FromFileDeconvolutionAlgorithm (DeconvolutionAlgorithm subclass —
  filters the stored features by m/z range, RT-window overlap, and
  [MinAssumedChargeState, MaxAssumedChargeState], yields synthetic
  IsotopicEnvelopes via the existing 3-arg constructor)
- MzLibUtil.MzRtRange (MzRange subclass adding RT bounds — reusable
  primitive for any future algorithm needing joint m/z + RT filtering;
  adopted verbatim from smith-chem-wisc#1065)

Modified:
- Deconvoluter.Deconvolute(scan, params, range): auto-upgrades a plain
  MzRange to MzRtRange when params are FromFile, using scan.RetentionTime
  as the anchor. The MzSpectrum overload validates that an MzRtRange
  was supplied (no anchor available there) and raises ArgumentException
  otherwise. The branch only fires for FromFile — existing in-memory
  algorithm callers are entirely unaffected.
- Deconvoluter.CreateAlgorithm: factory case for FromFile.
- MsDataScan.GetIsolatedMassesAndCharges (both overloads): same
  auto-upgrade with rtTolerance=0.1 (matches smith-chem-wisc#1065's value). Also
  extracts the 8.5 magic number into a const, mirroring smith-chem-wisc#1065.

Deleted:
- The previous PairPrecursorsToMs2 method on Deconvoluter is gone.
  Callers wanting the bulk-pairing shape can write the 4-line loop
  themselves; the algorithm pattern is the canonical entry point.

Tests (renamed Test/MassSpectrometryTests/TestPrecursorPairing.cs →
TestFromFileDeconvolution.cs via git mv):
- 8 direct-algorithm tests via Deconvoluter.Deconvolute(spectrum,
  params, MzRtRange).
- 6 integration tests via MsDataScan.GetIsolatedMassesAndCharges
  (the production MM entry point).
- 3 end-to-end tests load real Ms1Feature_FlashDeconvOpenMs3.0.0 and
  Ms1Feature_TopFDv1.6.2 fixtures through Ms1FeatureFile, pair against
  a synthetic MS2 tuned to catch one charge state, assert the
  envelope's mass / charge / format-appropriate intensity.

17/17 pass. Full Test.csproj suite: 3413/3413 pass (5 benchmarks
skipped as designed), 2 min 11 s on .NET 8.

Adjustments vs Nic's smith-chem-wisc#1065 sketch:
- Params class takes IEnumerable<ISingleChargeMs1Feature> instead of a
  resultPath string. Keeps file-format coupling out of MassSpectrometry.
- Algorithm body is implemented (sketch was NotImplementedException).
- MsDataScan.GetIsolatedMassesAndCharges(MsDataScan precursorScan, ...)
  uses this.RetentionTime (the MS2's RT) rather than
  precursorScan.RetentionTime. The canonical question is "what features
  were eluting when this MS2 was fired?"; the difference in practice
  is sub-second but the MS2-anchored answer is semantically right.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Move FromFile to Readers; params takes file path (PR smith-chem-wisc#1064 review)

Addresses Nic's review comments on smith-chem-wisc#1064:
- "FromFileDecon parameters and algorithm will need to be moved to the
  readers project. The parameters will then take in the file path and
  load up the objects that are needed."  (file-level comment on params)
- "This one should throw just like inside of Deconvoluter."  (line 192
  of MsDataScan)

Changes:

MassSpectrometry:
- DeconvolutionAlgorithm.Deconvolute promoted from internal abstract to
  protected internal abstract so subclasses in other assemblies (Readers)
  can override. Existing in-project overrides (Classic, IsoDec, Example)
  updated to match.
- DeconvolutionParameters: add virtual CreateAlgorithm() returning null.
  Subclasses living outside MassSpectrometry can override to inject
  their own DeconvolutionAlgorithm without the central enum-switch in
  Deconvoluter needing to know about them.
- Deconvoluter.CreateAlgorithm: try parameters.CreateAlgorithm() first,
  fall back to the enum-based switch for in-project algorithms. The
  FromFile case in the switch now throws an explanatory MzLibException
  rather than instantiating an in-project type — instantiation is the
  params-side's job.
- MsDataScan.GetIsolatedMassesAndCharges (both overloads): remove the
  FromFile auto-upgrade branch. The MzSpectrum overload of
  Deconvoluter.Deconvolute already throws when given FromFile params
  with a plain MzRange — let that error surface instead of silently
  upgrading. Callers wanting from-file decon use the MsDataScan
  overload of GetIsolatedMassesAndCharges, which routes through
  Deconvoluter's MsDataScan overload (still auto-upgrades using the
  precursor scan's RetentionTime).

Old files deleted:
- mzLib/MassSpectrometry/Deconvolution/Algorithms/FromFileDeconvolutionAlgorithm.cs
- mzLib/MassSpectrometry/Deconvolution/Parameters/FromFileDeconvolutionParameters.cs

New files in Readers:
- mzLib/Readers/Deconvolution/FromFileDeconvolutionParameters.cs
  - Public ctor takes a string filePath; loads features via
    FileReader.ReadResultFile (auto-detects FlashDeconv / TopFD / Dinosaur
    via SupportedFileType), then calls LoadResults and materializes the
    per-charge expansion.
  - Internal ctor takes IEnumerable<ISingleChargeMs1Feature> for unit
    tests (uses existing InternalsVisibleTo("Test") in Readers).
  - CreateAlgorithm() override returns FromFileDeconvolutionAlgorithm.
- mzLib/Readers/Deconvolution/FromFileDeconvolutionAlgorithm.cs
  - Same algorithm body; override modifier is now `protected override`
    to satisfy C#'s cross-assembly `protected internal` rule.

Tests reshaped (TestFromFileDeconvolution.cs):
- Integration tests now use the MsDataScan overload of
  GetIsolatedMassesAndCharges (the MzSpectrum overload throws for
  FromFile after this change; that contract is locked by a new test
  MzSpectrumOverloadOfGimc_FromFileWithoutMzRtRange_Throws).
- End-to-end fixture tests now drive the public file-path ctor —
  exercising FileReader.ReadResultFile + SupportedFileType detection +
  Ms1FeatureFile.LoadResults.

18/18 FromFile tests pass. Full Test.csproj suite: 3414/3414 pass,
5 benchmarks skipped, 2 min 12 s on .NET 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Sort features by m/z and binary-search the lower bound (PR smith-chem-wisc#1064 review)

Addresses Nic's efficiency suggestion at FromFileDeconvolutionAlgorithm.cs
line 47 — "Could possibly improve efficiency by storing the features in
an ordered collection within the FromFileParameters."

FromFileDeconvolutionParameters now sorts the per-charge features by
ascending m/z at construction and stores a parallel double[] of m/z
keys. A new internal helper FindFirstIndexAtOrAbove(double minMz) wraps
Array.BinarySearch with the standard "if no match, return insertion
point" convention.

FromFileDeconvolutionAlgorithm.Deconvolute now binary-searches for the
first feature whose m/z is at or above range.Minimum, iterates forward,
and yield-breaks the moment Mz exceeds range.Maximum — turning the
per-MS2 query from O(N) to O(log N + k) where k is the count of
features whose m/z lands inside the requested window.

For a typical search (10k features × ~5k MS2 scans), this is ~400×
fewer per-feature comparisons than the previous linear scan; for
typical k ≈ 1-3 features per isolation window, the constant-factor
improvement is much larger than that.

Behavior-preserving — all 18 FromFile tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Lift PR-touched files to ≥ 90% line coverage with essential tests

Baseline coverage on PR-touched files identified three classes below 90%:
- MzLibUtil.MzRtRange — 34.5% (new class, mostly untested properties / methods)
- MassSpectrometry.Deconvoluter — 83.7% (uncovered: null-range auto-upgrade
  branch, FromFile-throws-from-enum-switch defensive path, decoy-null throw
  in DeconvoluteWithDecoys)
- Readers.FromFileDeconvolutionParameters — 83.9% (uncovered: null filePath
  throw, non-feature-file throw, ToDecoyParameters returns null)

Tests added (essential paths only, no fluff):

New file Test/TestMzRtRange.cs (15 tests):
- Two constructors with bounds verification.
- Derived properties (Minimum/Maximum/MeanMZ/MeanRt/Width pair) in one test.
- Contains: inside, mz-below, mz-above, rt-below, rt-above, inclusive boundaries.
- CompareTo: inside (0), mz-below (1), mz-above (-1), rt-below (1), rt-above (-1).
- ToString smoke check (both m/z and RT bounds appear in output).

Added to TestFromFileDeconvolution.cs (5 tests):
- Null filePath in ctor → ArgumentNullException.
- Non-feature file (.psmtsv) in ctor → MzLibException with diagnostic message.
- ToDecoyParameters → null (locks the "no decoy support" contract).
- Deconvoluter.Deconvolute(MsDataScan, FromFileParams, null) auto-upgrades using
  the scan's MassSpectrum.Range (covers the null-range branch of the ternary).
- DeconvoluteWithDecoys with FromFile params → InvalidOperationException (covers
  the decoy-null defensive path in DeconvoluteWithDecoys).
- A test-only DeconvolutionParameters stub that claims DeconvolutionType.FromFile
  but doesn't override CreateAlgorithm → Deconvoluter.CreateAlgorithm throws
  the explanatory MzLibException (covers the FromFile case in the enum switch
  defensive path).

Skipped (still 100% but borderline):
- MassSpectrometry.MsDataScan at 92.1% — uncovered lines are pre-existing
  code outside this PR's scope.

Result on PR-touched files:
- MzRtRange: 34.5 → 100% line coverage
- FromFileDeconvolutionParameters: 83.9 → 100%
- Deconvoluter: 83.7 → 100%
- All other PR-touched classes already ≥ 97%
- Zero PR-touched classes below 90%

Total: 40 FromFile+MzRtRange tests pass; full Test.csproj suite 3436/3436
pass (5 benchmarks skipped, 2 min 33 s on .NET 8).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants