Integrate ZScoreNNClassifier into search pipeline#799

Open
GeorgWa wants to merge 5 commits into feature/zscore-nn-classifier from feature/zscore-nn-classifier-integration
Conversation

GeorgWa (Collaborator) commented on Feb 27, 2026:

Summary

  • Wire ZScoreNNClassifier into the FDR manager and peptide-centric workflow
  • Add fdr.zscore_nn_classifier config section to default.yaml
  • Route classifier selection through FDRManager based on config
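As a sketch, the config addition described above might look like this in default.yaml (the `zscore_fdr_threshold` key and its placement are assumptions based on the classifier's constructor; only `fdr.classifier_type` and the `fdr.zscore_nn_classifier` section are named in this PR):

```yaml
fdr:
  # "nn" selects BinaryClassifierLegacyNewBatching (the current default),
  # "zscore_nn" selects the new two-stage classifier.
  classifier_type: zscore_nn
  zscore_nn_classifier:
    zscore_fdr_threshold: 0.5  # pre-filter FDR for the z-score stage
```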

Stacked on #798.

🤖 Generated with Claude Code


PR Stack

Move ZScoreNNClassifier from scripts/ into alphadia/fdr/ as a drop-in
replacement for BinaryClassifierLegacyNewBatching. Add classifier_type
config option ("nn" or "zscore_nn") to fdr section of default.yaml.

When zscore_nn is selected, the two-stage classifier pre-filters
candidates by z-score at 50% FDR before training the NN on survivors
only. This reduces NN training time from ~312s to ~30s on the final
13M-candidate batch while maintaining equivalent precursor counts.
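The two-stage idea can be sketched as follows. The function name and the target-decoy FDR estimate are illustrative, not the actual implementation in alphadia/fdr/zscore_nn_classifier.py:

```python
import numpy as np


def zscore_threshold_filter(
    scores: np.ndarray, is_decoy: np.ndarray, fdr_threshold: float = 0.5
) -> np.ndarray:
    """Boolean mask of candidates surviving the cheap score pre-filter.

    FDR at a cutoff is estimated target-decoy style as
    #decoys / #targets among candidates scoring at or above it.
    """
    order = np.argsort(-scores)  # best score first
    decoys = np.cumsum(is_decoy[order])
    targets = np.cumsum(~is_decoy[order])
    fdr = decoys / np.maximum(targets, 1)
    passing = fdr <= fdr_threshold
    if not passing.any():
        return np.zeros_like(is_decoy)
    cutoff = scores[order][np.where(passing)[0].max()]
    return scores >= cutoff


# The expensive NN is then trained on survivors only, e.g.:
#   mask = zscore_threshold_filter(zscores, is_decoy, 0.5)
#   nn.fit(X[mask], y[mask])
```

Training only on the survivors of the cheap filter is where the roughly 10x speedup (~312s to ~30s) on the final batch comes from.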

Changes:
- alphadia/fdr/zscore_nn_classifier.py: classifier from scripts/
- alphadia/constants/default.yaml: add fdr.classifier_type default
- alphadia/fdr/fdr.py: call set_available_columns before fit
- alphadia/workflow/peptidecentric/peptidecentric.py: support classifier
  selection, add rank to feature columns for zscore_nn
- alphadia/workflow/managers/fdr_manager.py: handle incompatible stored
  classifiers gracefully
- plans/rust_optimization_priorities.md: profiling-based optimization plan

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_MIN_STD = 1e-10


def _find_score_threshold(
Reviewer comment:

Could this be a @staticmethod of ZScoreNNClassifier?
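Sketched, the suggestion looks like this (the helper's signature is hypothetical, and its body is elided since it is not shown in the diff):

```python
class ZScoreNNClassifier:
    _MIN_STD = 1e-10  # the module-level constant could move in alongside it

    @staticmethod
    def _find_score_threshold(scores, fdr_threshold):
        # As a @staticmethod the helper leaves the module namespace but
        # still has no dependence on instance state.
        ...
```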

zscore_features : list[str]
Feature names for z-score pre-filter.
available_columns : list[str]
All feature column names including 'rank'. Set by perform_fdr.
Reviewer comment:

This class does not know about perform_fdr.

Change the docstring to: "All feature column names. Must include 'rank', otherwise a ValueError is raised when calling fit()."

"available_columns must be set before fit/predict. "
"Pass it via constructor or set_available_columns()."
)
col_idx = {c: i for i, c in enumerate(self._available_columns)}
Reviewer comment:

if "rank" not in self._available_columns :
  raise ValueError("...")

Reviewer comment:
Alternatively, add "rank" yourself here if it was not passed.

Comment on lines +85 to +94
Parameters
----------
zscore_features : list[str] | None
Feature names for z-score pre-filter. Defaults to ZSCORE_FEATURES.
available_columns : list[str] | None
All feature column names including 'rank'.
zscore_fdr_threshold : float
FDR threshold for z-score filter.
**nn_kwargs
Keyword arguments forwarded to BinaryClassifierLegacyNewBatching.
Reviewer comment:

Please remove the duplicated Parameters definition (cf. l.62ff).

X, y, test_size=0.2, random_state=random_state
)

if hasattr(classifier, "set_available_columns"):
Reviewer comment:

Could set_available_columns be added to the Classifier base class as a no-op method, to avoid the if check here?
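A minimal sketch of the suggested base-class hook, assuming a stripped-down stand-in for alphadia's actual Classifier base class:

```python
class Classifier:
    """Minimal stand-in for the classifier base class."""

    def set_available_columns(self, columns: list[str]) -> None:
        """No-op by default; subclasses that need column names override it."""

    def fit(self, X, y) -> None:
        raise NotImplementedError


class ZScoreNNClassifier(Classifier):
    def set_available_columns(self, columns: list[str]) -> None:
        self._available_columns = list(columns)


# Call sites can then drop the hasattr() check and call unconditionally:
#   classifier.set_available_columns(feature_columns)
```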

Comment on lines +132 to +133
if classifier_type == "zscore_nn" and "rank" not in fdr_feature_columns:
fdr_feature_columns = [*fdr_feature_columns, "rank"]
Reviewer comment:

Coming back to my other comment: consider moving this into the new classifier.
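One way to follow this suggestion, sketched with a reduced class: let the classifier own its "rank" requirement instead of patching the column list at the workflow call site.

```python
class ZScoreNNClassifier:
    def set_available_columns(self, columns: list[str]) -> None:
        # The classifier appends the "rank" column it needs itself,
        # so callers can pass their feature columns unchanged.
        columns = list(columns)
        if "rank" not in columns:
            columns.append("rank")
        self._available_columns = columns
```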

config_fdr = self.config["fdr"]
classifier_type = config_fdr.get("classifier_type", "nn")
self._fdr_manager = FDRManager(
    feature_columns=get_feature_names(),
Reviewer comment:

Use config_fdr["classifier_type"] here instead of .get() with a fallback; the default already lives in default.yaml.

batch_size=5000,
learning_rate=0.001,
epochs=10,
experimental_hyperparameter_tuning=enable_nn_hyperparameter_tuning,
Reviewer comment:

Please pass random_state=random_state here, and forward it to the BinaryClassifierLegacyNewBatching in there.
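The requested plumbing might look like this; the inner class is a stub and both constructor signatures are assumptions for illustration:

```python
class BinaryClassifierLegacyNewBatching:
    """Stub of the inner NN classifier, for illustration only."""

    def __init__(self, batch_size=5000, learning_rate=0.001, epochs=10,
                 random_state=None):
        self.random_state = random_state


class ZScoreNNClassifier:
    def __init__(self, random_state=None, **nn_kwargs):
        self.random_state = random_state
        # Forward the seed so the z-score stage's train/test split and
        # the inner NN are seeded consistently.
        self._nn = BinaryClassifierLegacyNewBatching(
            random_state=random_state, **nn_kwargs
        )
```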

zscore_features: list[str] | None = None,
available_columns: list[str] | None = None,
zscore_fdr_threshold: float = ZSCORE_FDR_THRESHOLD,
**nn_kwargs,
Reviewer comment:

Not super happy about that, but fine.
Could you then please remove the **kwargs parameter from BinaryClassifierLegacyNewBatching? That way it becomes transparent when nonexistent parameters are passed.

Comment on lines +323 to +326
logger.warning(f"Skipping incompatible stored classifier {file}")
continue
Reviewer comment:

Is this just a temporary patch, or will it stay? Do we need to update the classifier that ships with alphadia?
