Patch full seq parse and write#1
Open
pcruzparri wants to merge 12 commits into
Open
Conversation
…. Changed the BioPolymerWithSetModsExtensions class to write full sequences separating the C-terminus with a dash. Updated some of the tests that failed because of the new notation of C-terminus mods. Some tests are still failing, and will be updated once happy with this general change.
…t handle ambiguity(or multiple mods at the same position). Modified the corresponding tests or commented them out in case we want to revert.
…s sequences, since it covers most but not many interesting cases. Best to remove it to maintain code coverage. I will add some notes on the issue on the PR for future reference.
pcruzparri
pushed a commit
that referenced
this pull request
May 21, 2026
* Move ISingleChargeMs1Feature to MassSpectrometry
The interface describes an MS1 deconvolution result and is the input
contract for the upcoming Deconvoluter.PairPrecursorsToMs2 join. That
method lives in MassSpectrometry.Deconvolution; Readers references
MassSpectrometry rather than the reverse, so the interface had to live
in MassSpectrometry for the dependency direction to work.
Relocate the file via `git mv` (history preserved), re-namespace to
MassSpectrometry, and add `using MassSpectrometry;` to the five Readers
files that implement or reference it. No behavior change. No external
consumer references the interface today (grep-checked on MetaMorpheus,
ProteaseGuru, ProteoformExplorer), so the move is invisible outside
this PR.
Doc tightened to reflect that pairing requires both an RT-window check
and an isolation-window check, not just an RT-apex check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add Deconvoluter.PairPrecursorsToMs2 with FlashDeconv / TopFD coverage
A new join between MS1 deconvolution features (any
ISingleChargeMs1Feature source — Ms1FeatureFile from TopFD/FlashDeconv,
DinosaurTsvFile, or future in-house whole-file deconvolution) and the
MS2 scans that selected them for fragmentation. One method on the
existing Deconvoluter:
public static IEnumerable<(MsDataScan Ms2Scan, IsotopicEnvelope PrecursorEnvelope)>
PairPrecursorsToMs2(IEnumerable<ISingleChargeMs1Feature> ms1Features,
MsDataFile msDataFile);
Match rule: MS2.RetentionTime ∈ [feat.RetentionTimeStart,
feat.RetentionTimeEnd] AND feat.Mz ∈ MS2.IsolationRange. Pure join —
no cross-charge consensus, no off-by-one correction, no harmonic
filtering, no isobaric-tag reporter extraction. Those concerns belong
to the deconvolution producer upstream or to the search engine
downstream. Pairing is restricted to MsnOrder == 2 (MS3 isolation
targets an MS2 fragment, not an MS1 precursor); MS2 scans with a null
IsolationRange are skipped. Chimeras emit one pair per matching
feature; a single feature spanning multiple MS2 scans likewise emits
one pair per scan.
The returned IsotopicEnvelope is the type MetaMorpheus already speaks
via EngineLayer.Util.Precursor's Precursor(IsotopicEnvelope, ...)
constructor, so downstream wrapping into Ms2ScanWithSpecificMass is a
direct two-line adapter at the call site (no mzLib-side wrapper class
needed):
var pairs = Deconvoluter.PairPrecursorsToMs2(features, dataFile);
foreach (var (ms2, envelope) in pairs)
{
var precursor = new Precursor(envelope, ...);
var scan = new Ms2ScanWithSpecificMass(ms2, precursor.MonoisotopicPeakMz,
precursor.Charge, fullFilePath, commonParameters,
neutralExperimentalFragments, precursor.Intensity,
precursor.EnvelopePeakCount, precursor.FractionalIntensity);
...
}
For input from external readers (.ms1.feature etc.), the envelope is
built via the existing 3-arg IsotopicEnvelope(mass, intensity, charge)
constructor — Peaks is a single synthetic entry; NumberOfIsotopes from
the input is not surfaced because per-peak m/z and intensity are
unknown. When the producer is mzLib's own in-house deconvolution it can
hand PairPrecursorsToMs2 real envelopes; the contract works lossless in
that case.
Tests in TestPrecursorPairing.cs cover:
- 14 synthetic-input unit tests: happy path, RT-out-of-window (both
sides), m/z-out-of-isolation (both sides), chimera, one-feature-
spanning-many-MS2s, MS2 with null isolation, MS1 ignored, MS3 ignored,
RT/isolation boundary inclusivity, empty inputs, MS2-free file.
- 3 end-to-end fixture tests: load Ms1Feature_FlashDeconvOpenMs3.0.0
and Ms1Feature_TopFDv1.6.2 fixtures through the production
Ms1FeatureFile reader, pair against a synthetic MS2 scan tuned to
catch only one charge state of feature row #1 (≈10835.9 Da at z=10),
assert one pair with the expected charge, mass, and format-
appropriate intensity (FlashDeconv → 0; TopFD → positive).
17/17 pass; 774/774 MassSpectrometryTests + FileReadingTests
regression remains green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Remove unused MassSpectrometry namespace
* Remove unused MassSpectrometry namespace import
* Update Ms1Feature.cs
* Remove unused MassSpectrometry import
Removed unused MassSpectrometry namespace import.
* Remove unused MassSpectrometry import
* Restore using MassSpectrometry; in 5 Readers files (build fix)
These five files reference ISingleChargeMs1Feature in method signatures
or interface implementations, and the interface now lives in the
MassSpectrometry namespace (per the move two commits earlier). Readers
has ImplicitUsings=enable but the auto-generated global usings only
cover stdlib namespaces (System, System.Linq, etc.), so MassSpectrometry
isn't imported implicitly.
The previous round of cleanup commits removed these using directives as
"unused imports" — but each is required for the file to compile.
Restoring them undoes those five build-breaking deletions.
Verified: full Test.csproj suite — 3413/3413 pass (5 benchmarks skipped
as designed) on .NET 8.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Replace PairPrecursorsToMs2 with FromFile deconvolution algorithm
Adopts the pattern from Nic's PR smith-chem-wisc#1065 draft: instead of a bespoke
join method on Deconvoluter, "deconvolution loaded from disk" becomes
another DeconvolutionAlgorithm that plugs into the existing
Deconvoluter factory. MetaMorpheus's per-MS2 precursor-deconvolution
loop already calls MsDataScan.GetIsolatedMassesAndCharges(precursorSpectrum,
deconParams); swapping ClassicDeconvolutionParameters for
FromFileDeconvolutionParameters transparently switches the entire
pipeline to consume pre-deconvoluted features.
New types:
- DeconvolutionType.FromFile (enum entry)
- FromFileDeconvolutionParameters (DeconvolutionParameters subclass —
wraps a pre-loaded IEnumerable<ISingleChargeMs1Feature>; the caller
chooses which Readers parser to use, so the Readers → MassSpectrometry
dependency arrow is preserved)
- FromFileDeconvolutionAlgorithm (DeconvolutionAlgorithm subclass —
filters the stored features by m/z range, RT-window overlap, and
[MinAssumedChargeState, MaxAssumedChargeState], yields synthetic
IsotopicEnvelopes via the existing 3-arg constructor)
- MzLibUtil.MzRtRange (MzRange subclass adding RT bounds — reusable
primitive for any future algorithm needing joint m/z + RT filtering;
adopted verbatim from smith-chem-wisc#1065)
Modified:
- Deconvoluter.Deconvolute(scan, params, range): auto-upgrades a plain
MzRange to MzRtRange when params are FromFile, using scan.RetentionTime
as the anchor. The MzSpectrum overload validates that an MzRtRange
was supplied (no anchor available there) and raises ArgumentException
otherwise. The branch only fires for FromFile — existing in-memory
algorithm callers are entirely unaffected.
- Deconvoluter.CreateAlgorithm: factory case for FromFile.
- MsDataScan.GetIsolatedMassesAndCharges (both overloads): same
auto-upgrade with rtTolerance=0.1 (matches smith-chem-wisc#1065's value). Also
extracts the 8.5 magic number into a const, mirroring smith-chem-wisc#1065.
Deleted:
- The previous PairPrecursorsToMs2 method on Deconvoluter is gone.
Callers wanting the bulk-pairing shape can write the 4-line loop
themselves; the algorithm pattern is the canonical entry point.
Tests (renamed Test/MassSpectrometryTests/TestPrecursorPairing.cs →
TestFromFileDeconvolution.cs via git mv):
- 8 direct-algorithm tests via Deconvoluter.Deconvolute(spectrum,
params, MzRtRange).
- 6 integration tests via MsDataScan.GetIsolatedMassesAndCharges
(the production MM entry point).
- 3 end-to-end tests load real Ms1Feature_FlashDeconvOpenMs3.0.0 and
Ms1Feature_TopFDv1.6.2 fixtures through Ms1FeatureFile, pair against
a synthetic MS2 tuned to catch one charge state, assert the
envelope's mass / charge / format-appropriate intensity.
17/17 pass. Full Test.csproj suite: 3413/3413 pass (5 benchmarks
skipped as designed), 2 min 11 s on .NET 8.
Adjustments vs Nic's smith-chem-wisc#1065 sketch:
- Params class takes IEnumerable<ISingleChargeMs1Feature> instead of a
resultPath string. Keeps file-format coupling out of MassSpectrometry.
- Algorithm body is implemented (sketch was NotImplementedException).
- MsDataScan.GetIsolatedMassesAndCharges(MsDataScan precursorScan, ...)
uses this.RetentionTime (the MS2's RT) rather than
precursorScan.RetentionTime. The canonical question is "what features
were eluting when this MS2 was fired?"; the difference in practice
is sub-second but the MS2-anchored answer is semantically right.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Move FromFile to Readers; params takes file path (PR smith-chem-wisc#1064 review)
Addresses Nic's review comments on smith-chem-wisc#1064:
- "FromFileDecon parameters and algorithm will need to be moved to the
readers project. The parameters will then take in the file path and
load up the objects that are needed." (file-level comment on params)
- "This one should throw just like inside of Deconvoluter." (line 192
of MsDataScan)
Changes:
MassSpectrometry:
- DeconvolutionAlgorithm.Deconvolute promoted from internal abstract to
protected internal abstract so subclasses in other assemblies (Readers)
can override. Existing in-project overrides (Classic, IsoDec, Example)
updated to match.
- DeconvolutionParameters: add virtual CreateAlgorithm() returning null.
Subclasses living outside MassSpectrometry can override to inject
their own DeconvolutionAlgorithm without the central enum-switch in
Deconvoluter needing to know about them.
- Deconvoluter.CreateAlgorithm: try parameters.CreateAlgorithm() first,
fall back to the enum-based switch for in-project algorithms. The
FromFile case in the switch now throws an explanatory MzLibException
rather than instantiating an in-project type — instantiation is the
params-side's job.
- MsDataScan.GetIsolatedMassesAndCharges (both overloads): remove the
FromFile auto-upgrade branch. The MzSpectrum overload of
Deconvoluter.Deconvolute already throws when given FromFile params
with a plain MzRange — let that error surface instead of silently
upgrading. Callers wanting from-file decon use the MsDataScan
overload of GetIsolatedMassesAndCharges, which routes through
Deconvoluter's MsDataScan overload (still auto-upgrades using the
precursor scan's RetentionTime).
Old files deleted:
- mzLib/MassSpectrometry/Deconvolution/Algorithms/FromFileDeconvolutionAlgorithm.cs
- mzLib/MassSpectrometry/Deconvolution/Parameters/FromFileDeconvolutionParameters.cs
New files in Readers:
- mzLib/Readers/Deconvolution/FromFileDeconvolutionParameters.cs
- Public ctor takes a string filePath; loads features via
FileReader.ReadResultFile (auto-detects FlashDeconv / TopFD / Dinosaur
via SupportedFileType), then calls LoadResults and materializes the
per-charge expansion.
- Internal ctor takes IEnumerable<ISingleChargeMs1Feature> for unit
tests (uses existing InternalsVisibleTo("Test") in Readers).
- CreateAlgorithm() override returns FromFileDeconvolutionAlgorithm.
- mzLib/Readers/Deconvolution/FromFileDeconvolutionAlgorithm.cs
- Same algorithm body; override modifier is now `protected override`
to satisfy C#'s cross-assembly `protected internal` rule.
Tests reshaped (TestFromFileDeconvolution.cs):
- Integration tests now use the MsDataScan overload of
GetIsolatedMassesAndCharges (the MzSpectrum overload throws for
FromFile after this change; that contract is locked by a new test
MzSpectrumOverloadOfGimc_FromFileWithoutMzRtRange_Throws).
- End-to-end fixture tests now drive the public file-path ctor —
exercising FileReader.ReadResultFile + SupportedFileType detection +
Ms1FeatureFile.LoadResults.
18/18 FromFile tests pass. Full Test.csproj suite: 3414/3414 pass,
5 benchmarks skipped, 2 min 12 s on .NET 8.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Sort features by m/z and binary-search the lower bound (PR smith-chem-wisc#1064 review)
Addresses Nic's efficiency suggestion at FromFileDeconvolutionAlgorithm.cs
line 47 — "Could possibly improve efficiency by storing the features in
an ordered collection within the FromFileParameters."
FromFileDeconvolutionParameters now sorts the per-charge features by
ascending m/z at construction and stores a parallel double[] of m/z
keys. A new internal helper FindFirstIndexAtOrAbove(double minMz) wraps
Array.BinarySearch with the standard "if no match, return insertion
point" convention.
FromFileDeconvolutionAlgorithm.Deconvolute now binary-searches for the
first feature whose m/z is at or above range.Minimum, iterates forward,
and yield-breaks the moment Mz exceeds range.Maximum — turning the
per-MS2 query from O(N) to O(log N + k) where k is the count of
features whose m/z lands inside the requested window.
For a typical search (10k features × ~5k MS2 scans), this is ~400×
fewer per-feature comparisons than the previous linear scan; for
typical k ≈ 1-3 features per isolation window, the constant-factor
improvement is much larger than that.
Behavior-preserving — all 18 FromFile tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Lift PR-touched files to ≥ 90% line coverage with essential tests
Baseline coverage on PR-touched files identified three classes below 90%:
- MzLibUtil.MzRtRange — 34.5% (new class, mostly untested properties / methods)
- MassSpectrometry.Deconvoluter — 83.7% (uncovered: null-range auto-upgrade
branch, FromFile-throws-from-enum-switch defensive path, decoy-null throw
in DeconvoluteWithDecoys)
- Readers.FromFileDeconvolutionParameters — 83.9% (uncovered: null filePath
throw, non-feature-file throw, ToDecoyParameters returns null)
Tests added (essential paths only, no fluff):
New file Test/TestMzRtRange.cs (15 tests):
- Two constructors with bounds verification.
- Derived properties (Minimum/Maximum/MeanMZ/MeanRt/Width pair) in one test.
- Contains: inside, mz-below, mz-above, rt-below, rt-above, inclusive boundaries.
- CompareTo: inside (0), mz-below (1), mz-above (-1), rt-below (1), rt-above (-1).
- ToString smoke check (both m/z and RT bounds appear in output).
Added to TestFromFileDeconvolution.cs (5 tests):
- Null filePath in ctor → ArgumentNullException.
- Non-feature file (.psmtsv) in ctor → MzLibException with diagnostic message.
- ToDecoyParameters → null (locks the "no decoy support" contract).
- Deconvoluter.Deconvolute(MsDataScan, FromFileParams, null) auto-upgrades using
the scan's MassSpectrum.Range (covers the null-range branch of the ternary).
- DeconvoluteWithDecoys with FromFile params → InvalidOperationException (covers
the decoy-null defensive path in DeconvoluteWithDecoys).
- A test-only DeconvolutionParameters stub that claims DeconvolutionType.FromFile
but doesn't override CreateAlgorithm → Deconvoluter.CreateAlgorithm throws
the explanatory MzLibException (covers the FromFile case in the enum switch
defensive path).
Skipped (still 100% but borderline):
- MassSpectrometry.MsDataScan at 92.1% — uncovered lines are pre-existing
code outside this PR's scope.
Result on PR-touched files:
- MzRtRange: 34.5 → 100% line coverage
- FromFileDeconvolutionParameters: 83.9 → 100%
- Deconvoluter: 83.7 → 100%
- All other PR-touched classes already ≥ 97%
- Zero PR-touched classes below 90%
Total: 40 FromFile+MzRtRange tests pass; full Test.csproj suite 3436/3436
pass (5 benchmarks skipped, 2 min 33 s on .NET 8).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
pr to track/copy changes