Added custom eval metric feature #84
Merged
Pull request overview
This PR adds a custom evaluation metric feature to the flexml library, allowing users to define and use their own scoring functions for model evaluation and tuning. The implementation introduces a new CustomScore class to wrap custom metric functions and integrates this functionality across the supervised learning pipeline.
Key changes:
- New `CustomScore` class to validate and wrap custom metric functions, with support for probability vs. label predictions and maximize/minimize optimization directions
- Extended `start_experiment`, `tune_model`, and related methods to accept callable custom metrics alongside standard string-based metrics
- Comprehensive test suite covering regression, binary classification, multiclass classification, and tuning scenarios
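To make the wrapping idea above concrete, here is a minimal sketch of how a custom metric function could be validated and exposed as a scorer. It is not flexml's actual `CustomScore` implementation: the class name `MetricWrapper` and its parameters `greater_is_better` and `needs_proba` are hypothetical, chosen only for illustration. The one convention it relies on is that a scikit-learn style scorer is a callable taking `(estimator, X, y)` where a larger return value is better, so a metric to be minimized has its sign flipped.

```python
import inspect


class MetricWrapper:
    """Hypothetical stand-in for illustration; not flexml's CustomScore."""

    def __init__(self, func, greater_is_better=True, needs_proba=False):
        # Validate the user-supplied metric before it is used anywhere.
        if not callable(func):
            raise TypeError("func must be callable")
        if len(inspect.signature(func).parameters) < 2:
            raise ValueError("func must accept at least (y_true, y_pred)")
        self.func = func
        self.greater_is_better = greater_is_better
        self.needs_proba = needs_proba
        self.name = getattr(func, "__name__", "custom_metric")

    def score(self, y_true, y_pred):
        # Raw metric value, in the metric's own units and direction.
        return float(self.func(y_true, y_pred))

    def __call__(self, estimator, X, y):
        # Scorer convention: larger is better, so negate metrics to minimize.
        if self.needs_proba:
            y_pred = estimator.predict_proba(X)
        else:
            y_pred = estimator.predict(X)
        raw = self.score(y, y_pred)
        return raw if self.greater_is_better else -raw


def mean_abs_error(y_true, y_pred):
    # Example user metric: lower is better.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


wrapped = MetricWrapper(mean_abs_error, greater_is_better=False)
```

Under these assumptions, `wrapped.score(...)` reports the metric as the user defined it, while calling `wrapped(estimator, X, y)` yields the sign-adjusted value that a search routine can maximize.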
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| tests/test_custom_metrics.py | Comprehensive test suite with 9 test cases covering custom metrics for regression, classification (binary/multiclass), tuning (GridSearch, RandomizedSearch, Optuna), and error handling |
| flexml/structures/custom_score.py | New class to wrap and validate custom scoring functions with parameter validation, sklearn scorer integration, and support for probability/label-based metrics |
| flexml/structures/supervised_base.py | Updates to handle custom metrics in experiment workflow including parameter handling, model selection logic for minimize vs maximize, and leaderboard sorting |
| flexml/helpers/validators.py | Type hint updates to support CustomScore objects in addition to string-based metrics |
| flexml/helpers/supervised_helpers.py | Enhanced evaluate_model_perf to compute custom metrics alongside standard metrics with proper handling of probabilities vs labels |
| flexml/_model_tuner.py | Support for custom metrics in all tuning methods (grid search, randomized search, Optuna) with proper scorer integration |
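The minimize vs. maximize handling mentioned for `supervised_base.py` can be sketched in a few lines. This is an assumption about the general technique, not the code in the PR: the function `rank_models`, the result dictionaries, and the model names below are all hypothetical.

```python
def rank_models(results, metric, greater_is_better=True):
    """Sort result rows so the best model comes first (hypothetical helper)."""
    # For metrics to maximize, sort descending; for metrics to minimize, ascending.
    return sorted(results, key=lambda row: row[metric], reverse=greater_is_better)


# Made-up scores purely for illustration.
results = [
    {"model": "model_a", "custom_mae": 2.4},
    {"model": "model_b", "custom_mae": 1.7},
    {"model": "model_c", "custom_mae": 3.1},
]

# An error-style metric is minimized, so the lowest value ranks first.
leaderboard = rank_models(results, "custom_mae", greater_is_better=False)
best_model = leaderboard[0]["model"]
```

Keeping the direction as an explicit flag avoids the common mistake of always picking the largest score, which would select the worst model for an error-style metric.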
…ore start_experiment()
Resolved #83