Integrate ZScoreNNClassifier into search pipeline #799
GeorgWa wants to merge 5 commits into feature/zscore-nn-classifier from
Conversation
Move ZScoreNNClassifier from scripts/ into alphadia/fdr/ as a drop-in
replacement for BinaryClassifierLegacyNewBatching. Add classifier_type
config option ("nn" or "zscore_nn") to fdr section of default.yaml.
When zscore_nn is selected, the two-stage classifier pre-filters
candidates by z-score at 50% FDR before training the NN on survivors
only. This reduces NN training time from ~312s to ~30s on the final
13M-candidate batch while maintaining equivalent precursor counts.
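The two-stage idea described above can be sketched as follows. This is an illustrative assumption, not the actual alphaDIA code: the helper name `zscore_prefilter`, the summed-z-score scoring, and the simple decoy-based FDR estimate are all placeholders for the real implementation.

```python
import numpy as np

def zscore_prefilter(X, y, zscore_cols, fdr_threshold=0.5):
    """Sketch of a z-score pre-filter: score candidates by summed
    per-feature z-scores, then keep everything above the cutoff where
    the decoy-based FDR estimate first exceeds `fdr_threshold`.
    Hypothetical helper, not the alphaDIA implementation."""
    # z-score each selected feature column and sum into a single score
    cols = X[:, zscore_cols]
    Z = (cols - cols.mean(axis=0)) / (cols.std(axis=0) + 1e-10)
    scores = Z.sum(axis=1)

    # sort by descending score; FDR estimate = decoys / targets above cutoff
    order = np.argsort(-scores)
    decoys = np.cumsum(y[order] == 0)
    targets = np.cumsum(y[order] == 1)
    fdr = decoys / np.maximum(targets, 1)

    # keep candidates up to the last position where the FDR estimate passes
    passing = np.where(fdr <= fdr_threshold)[0]
    keep = order[: passing[-1] + 1] if len(passing) else order[:0]
    return X[keep], y[keep]
```

The NN would then be trained only on the returned survivors, which is where the ~312s to ~30s reduction comes from.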
Changes:
- alphadia/fdr/zscore_nn_classifier.py: classifier from scripts/
- alphadia/constants/default.yaml: add fdr.classifier_type default
- alphadia/fdr/fdr.py: call set_available_columns before fit
- alphadia/workflow/peptidecentric/peptidecentric.py: support classifier
selection, add rank to feature columns for zscore_nn
- alphadia/workflow/managers/fdr_manager.py: handle incompatible stored
classifiers gracefully
- plans/rust_optimization_priorities.md: profiling-based optimization plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    _MIN_STD = 1e-10

    def _find_score_threshold(
could be a @staticmethod of ZScoreNNClassifier?
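A sketch of what the suggested refactor could look like. The class skeleton, signature, and decoy-based FDR estimate are assumptions for illustration:

```python
import numpy as np

class ZScoreNNClassifier:
    """Skeletal sketch, not the real class."""

    @staticmethod
    def _find_score_threshold(scores, y, fdr_threshold):
        """Return the lowest score at which the decoy-based FDR
        estimate still passes `fdr_threshold` (hypothetical signature)."""
        order = np.argsort(-scores)
        decoys = np.cumsum(y[order] == 0)
        targets = np.cumsum(y[order] == 1)
        fdr = decoys / np.maximum(targets, 1)
        passing = np.where(fdr <= fdr_threshold)[0]
        if len(passing) == 0:
            return np.inf  # nothing passes the threshold
        return scores[order[passing[-1]]]
```

As a staticmethod it stays namespaced under the class without needing `self`, which keeps the module-level namespace clean.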
    zscore_features : list[str]
        Feature names for z-score pre-filter.
    available_columns : list[str]
        All feature column names including 'rank'. Set by perform_fdr.
this class does not know about perform_fdr.
Change to: "All feature column names. Must include 'rank'; otherwise a ValueError is raised when calling fit()."
    "available_columns must be set before fit/predict. "
    "Pass it via constructor or set_available_columns()."
    )
    col_idx = {c: i for i, c in enumerate(self._available_columns)}
    if "rank" not in self._available_columns:
        raise ValueError("...")
alternatively, add "rank" yourself here if not passed ..
    Parameters
    ----------
    zscore_features : list[str] | None
        Feature names for z-score pre-filter. Defaults to ZSCORE_FEATURES.
    available_columns : list[str] | None
        All feature column names including 'rank'.
    zscore_fdr_threshold : float
        FDR threshold for z-score filter.
    **nn_kwargs
        Keyword arguments forwarded to BinaryClassifierLegacyNewBatching.
please remove the duplicated parameter definition (cf. l.62ff)
    X, y, test_size=0.2, random_state=random_state
    )

    if hasattr(classifier, "set_available_columns"):
could set_available_columns be added to the Classifier base class as a no-op method to avoid the if check here?
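A sketch of the suggested base-class hook; the class names mirror the PR but the skeletons and method bodies are assumptions:

```python
class Classifier:
    """Sketch of the reviewer's suggestion: a no-op hook on the base
    class so call sites need no hasattr() check."""

    def set_available_columns(self, columns):
        # default: classifiers that don't need column info ignore it
        pass

class ZScoreNNClassifier(Classifier):
    """Only the classifier that needs the columns overrides the hook."""

    def set_available_columns(self, columns):
        self._available_columns = list(columns)

def prepare_classifier(classifier, columns):
    # no hasattr() needed: every Classifier exposes the method
    classifier.set_available_columns(columns)
```

This keeps the call site polymorphic and avoids the duck-typing check flagged above.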
    if classifier_type == "zscore_nn" and "rank" not in fdr_feature_columns:
        fdr_feature_columns = [*fdr_feature_columns, "rank"]
coming back to my other comment: consider moving this to the new classifier
    config_fdr = self.config["fdr"]
    self._fdr_manager = FDRManager(
        feature_columns=get_feature_names()
    classifier_type = config_fdr.get("classifier_type", "nn")
config_fdr["classifier_type"]
    batch_size=5000,
    learning_rate=0.001,
    epochs=10,
    experimental_hyperparameter_tuning=enable_nn_hyperparameter_tuning,
please pass random_state=random_state, and pass it on to the BinaryClassifierLegacyNewBatching in there
    zscore_features: list[str] | None = None,
    available_columns: list[str] | None = None,
    zscore_fdr_threshold: float = ZSCORE_FDR_THRESHOLD,
    **nn_kwargs,
not super happy about that, but fine ..
could you then please remove the **kwargs parameter in BinaryClassifierLegacyNewBatching? This way, it becomes transparent when nonexistent parameters are passed.
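The suggested change could look like this sketch. The parameter names come from the diff above, but the exact signature of the real class is an assumption:

```python
class BinaryClassifierLegacyNewBatching:
    """Sketch: explicit keyword-only parameters instead of **kwargs,
    so a misspelled option raises TypeError instead of being silently
    ignored. Signature is illustrative, not the real one."""

    def __init__(self, *, batch_size=5000, learning_rate=0.001, epochs=10):
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.epochs = epochs
```

With `**kwargs` removed, a typo such as `epochz=5` fails loudly at construction time rather than quietly falling back to the default.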
    logger.warning(
        f"Skipping incompatible stored classifier {file}"
    )
    continue
is this just a patch or will it stay? do we need to update the classifier that ships with alphadia?
Summary
- Integrate ZScoreNNClassifier into the FDR manager and peptide-centric workflow
- Add fdr.zscore_nn_classifier config section to default.yaml
- Select the classifier in FDRManager based on config

Stacked on #798.
🤖 Generated with Claude Code
PR Stack