Integrate ZScoreNNClassifier into search pipeline#799

Open
GeorgWa wants to merge 5 commits into feature/zscore-nn-classifier from feature/zscore-nn-classifier-integration
Conversation

GeorgWa (Collaborator) commented on Feb 27, 2026:

Summary

  • Wire ZScoreNNClassifier into the FDR manager and peptide-centric workflow
  • Add fdr.zscore_nn_classifier config section to default.yaml
  • Route classifier selection through FDRManager based on config
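As a sketch, the config addition described above might look like this in default.yaml (the `zscore_fdr_threshold` key and its placement are assumptions based on the classifier's constructor; only `fdr.classifier_type` and the `fdr.zscore_nn_classifier` section are named in this PR):

```yaml
fdr:
  # "nn" selects BinaryClassifierLegacyNewBatching (the current default),
  # "zscore_nn" selects the new two-stage classifier.
  classifier_type: zscore_nn
  zscore_nn_classifier:
    zscore_fdr_threshold: 0.5  # pre-filter FDR for the z-score stage
```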

Stacked on #798.

🤖 Generated with Claude Code


PR Stack

Move ZScoreNNClassifier from scripts/ into alphadia/fdr/ as a drop-in
replacement for BinaryClassifierLegacyNewBatching. Add classifier_type
config option ("nn" or "zscore_nn") to fdr section of default.yaml.

When zscore_nn is selected, the two-stage classifier pre-filters
candidates by z-score at 50% FDR before training the NN on survivors
only. This reduces NN training time from ~312s to ~30s on the final
13M-candidate batch while maintaining equivalent precursor counts.
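The two-stage idea can be sketched as follows. The function name and the target-decoy FDR estimate are illustrative, not the actual implementation in alphadia/fdr/zscore_nn_classifier.py:

```python
import numpy as np


def zscore_threshold_filter(
    scores: np.ndarray, is_decoy: np.ndarray, fdr_threshold: float = 0.5
) -> np.ndarray:
    """Boolean mask of candidates surviving the cheap score pre-filter.

    FDR at a cutoff is estimated target-decoy style as
    #decoys / #targets among candidates scoring at or above it.
    """
    order = np.argsort(-scores)  # best score first
    decoys = np.cumsum(is_decoy[order])
    targets = np.cumsum(~is_decoy[order])
    fdr = decoys / np.maximum(targets, 1)
    passing = fdr <= fdr_threshold
    if not passing.any():
        return np.zeros_like(is_decoy)
    cutoff = scores[order][np.where(passing)[0].max()]
    return scores >= cutoff


# The expensive NN is then trained on survivors only, e.g.:
#   mask = zscore_threshold_filter(zscores, is_decoy, 0.5)
#   nn.fit(X[mask], y[mask])
```

Training only on the survivors of the cheap filter is where the roughly 10x speedup (~312s to ~30s) on the final batch comes from.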

Changes:
- alphadia/fdr/zscore_nn_classifier.py: classifier from scripts/
- alphadia/constants/default.yaml: add fdr.classifier_type default
- alphadia/fdr/fdr.py: call set_available_columns before fit
- alphadia/workflow/peptidecentric/peptidecentric.py: support classifier
  selection, add rank to feature columns for zscore_nn
- alphadia/workflow/managers/fdr_manager.py: handle incompatible stored
  classifiers gracefully
- plans/rust_optimization_priorities.md: profiling-based optimization plan

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_MIN_STD = 1e-10


def _find_score_threshold(
Reviewer comment:

Could this be a @staticmethod of ZScoreNNClassifier?
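Sketched, the suggestion looks like this (the helper's signature is hypothetical, and its body is elided since it is not shown in the diff):

```python
class ZScoreNNClassifier:
    _MIN_STD = 1e-10  # the module-level constant could move in alongside it

    @staticmethod
    def _find_score_threshold(scores, fdr_threshold):
        # As a @staticmethod the helper leaves the module namespace but
        # still has no dependence on instance state.
        ...
```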

zscore_features : list[str]
Feature names for z-score pre-filter.
available_columns : list[str]
All feature column names including 'rank'. Set by perform_fdr.
Reviewer comment:

This class does not know about perform_fdr.

Change the docstring to: "All feature column names. Must include 'rank', otherwise a ValueError is raised when calling fit()."

"available_columns must be set before fit/predict. "
"Pass it via constructor or set_available_columns()."
)
col_idx = {c: i for i, c in enumerate(self._available_columns)}
Reviewer comment:

if "rank" not in self._available_columns :
  raise ValueError("...")

Reviewer comment:
Alternatively, add "rank" yourself here if it was not passed.

Comment on lines +85 to +94
Parameters
----------
zscore_features : list[str] | None
Feature names for z-score pre-filter. Defaults to ZSCORE_FEATURES.
available_columns : list[str] | None
All feature column names including 'rank'.
zscore_fdr_threshold : float
FDR threshold for z-score filter.
**nn_kwargs
Keyword arguments forwarded to BinaryClassifierLegacyNewBatching.
Reviewer comment:

Please remove the duplicated Parameters definition (cf. l.62ff).

X, y, test_size=0.2, random_state=random_state
)

if hasattr(classifier, "set_available_columns"):
Reviewer comment:

Could set_available_columns be added to the Classifier base class as a no-op method, to avoid the if check here?
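A minimal sketch of the suggested base-class hook, assuming a stripped-down stand-in for alphadia's actual Classifier base class:

```python
class Classifier:
    """Minimal stand-in for the classifier base class."""

    def set_available_columns(self, columns: list[str]) -> None:
        """No-op by default; subclasses that need column names override it."""

    def fit(self, X, y) -> None:
        raise NotImplementedError


class ZScoreNNClassifier(Classifier):
    def set_available_columns(self, columns: list[str]) -> None:
        self._available_columns = list(columns)


# Call sites can then drop the hasattr() check and call unconditionally:
#   classifier.set_available_columns(feature_columns)
```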

Comment on lines +132 to +133
if classifier_type == "zscore_nn" and "rank" not in fdr_feature_columns:
fdr_feature_columns = [*fdr_feature_columns, "rank"]
Reviewer comment:

Coming back to my other comment: consider moving this into the new classifier.
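One way to follow this suggestion, sketched with a reduced class: let the classifier own its "rank" requirement instead of patching the column list at the workflow call site.

```python
class ZScoreNNClassifier:
    def set_available_columns(self, columns: list[str]) -> None:
        # The classifier appends the "rank" column it needs itself,
        # so callers can pass their feature columns unchanged.
        columns = list(columns)
        if "rank" not in columns:
            columns.append("rank")
        self._available_columns = columns
```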

config_fdr = self.config["fdr"]
classifier_type = config_fdr.get("classifier_type", "nn")
self._fdr_manager = FDRManager(
    feature_columns=get_feature_names(),
Reviewer comment:

Use config_fdr["classifier_type"] here instead of .get() with a fallback; the default already lives in default.yaml.

batch_size=5000,
learning_rate=0.001,
epochs=10,
experimental_hyperparameter_tuning=enable_nn_hyperparameter_tuning,
Reviewer comment:

Please pass random_state=random_state here, and forward it to the BinaryClassifierLegacyNewBatching in there.
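The requested plumbing might look like this; the inner class is a stub and both constructor signatures are assumptions for illustration:

```python
class BinaryClassifierLegacyNewBatching:
    """Stub of the inner NN classifier, for illustration only."""

    def __init__(self, batch_size=5000, learning_rate=0.001, epochs=10,
                 random_state=None):
        self.random_state = random_state


class ZScoreNNClassifier:
    def __init__(self, random_state=None, **nn_kwargs):
        self.random_state = random_state
        # Forward the seed so the z-score stage's train/test split and
        # the inner NN are seeded consistently.
        self._nn = BinaryClassifierLegacyNewBatching(
            random_state=random_state, **nn_kwargs
        )
```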

zscore_features: list[str] | None = None,
available_columns: list[str] | None = None,
zscore_fdr_threshold: float = ZSCORE_FDR_THRESHOLD,
**nn_kwargs,
Reviewer comment:

Not super happy about that, but fine.
Could you then please remove the **kwargs parameter from BinaryClassifierLegacyNewBatching? That way it becomes transparent when nonexistent parameters are passed.

Comment on lines +323 to +326
logger.warning(f"Skipping incompatible stored classifier {file}")
continue
Reviewer comment:

Is this just a temporary patch, or will it stay? Do we need to update the classifier that ships with alphadia?
