Added custom eval metric feature #84

Merged: ozguraslank merged 8 commits into main from custom_eval_score on Jan 3, 2026
Conversation

@ozguraslank

Owner

Resolved #83

@ozguraslank ozguraslank requested a review from Copilot January 3, 2026 14:46
@ozguraslank ozguraslank self-assigned this Jan 3, 2026
@ozguraslank ozguraslank added the enhancement New feature or request label Jan 3, 2026
@ozguraslank ozguraslank added this to the 1.1.1 milestone Jan 3, 2026

Copilot AI left a comment

Pull request overview

This PR adds a custom evaluation metric feature to the flexml library, allowing users to define and use their own scoring functions for model evaluation and tuning. The implementation introduces a new CustomScore class to wrap custom metric functions and integrates this functionality across the supervised learning pipeline.

Key changes:

  • New CustomScore class that validates and wraps custom metric functions, supporting probability-based or label-based predictions and either a maximize or a minimize optimization direction
  • Extended start_experiment, tune_model, and related methods to accept callable custom metrics alongside standard string-based metrics
  • Comprehensive test suite covering regression, binary classification, multiclass classification, and tuning scenarios
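
To make the first bullet concrete, here is a minimal sketch of what a metric wrapper of this kind might look like. It is not flexml's actual CustomScore class: the name `CustomScoreSketch`, its fields (`greater_is_better`, `needs_proba`), and the `signed` helper are assumptions made for illustration only. It uses only the Python standard library.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class CustomScoreSketch:
    """Hypothetical stand-in for a metric wrapper; NOT flexml's actual class."""

    func: Callable[[Sequence, Sequence], float]
    name: str
    greater_is_better: bool = True  # True: maximize, False: minimize
    needs_proba: bool = False       # True: metric expects probabilities, not labels

    def __post_init__(self):
        if not callable(self.func):
            raise TypeError("func must be callable")
        # Require at least two positional parameters: (y_true, y_pred).
        positional = [
            p for p in inspect.signature(self.func).parameters.values()
            if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
        ]
        if len(positional) < 2:
            raise ValueError("func must accept (y_true, y_pred)")

    def __call__(self, y_true, y_pred):
        # Raw metric value, exactly as the user's function reports it.
        return float(self.func(y_true, y_pred))

    def signed(self, y_true, y_pred):
        # Orient the score so that larger is always better,
        # which lets one code path handle both directions.
        value = self(y_true, y_pred)
        return value if self.greater_is_better else -value


def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


mae = CustomScoreSketch(mean_abs_error, name="mae", greater_is_better=False)
print(mae([1, 2, 3], [1, 2, 5]))         # raw error
print(mae.signed([1, 2, 3], [1, 2, 5]))  # negated, because lower is better
```

Negating minimize-type metrics is one common way to let a tuner that always maximizes work with error metrics; whether this PR takes that approach is not stated in the review summary.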

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 20 comments.

| File | Description |
| --- | --- |
| tests/test_custom_metrics.py | Comprehensive test suite with 9 test cases covering custom metrics for regression, classification (binary/multiclass), tuning (GridSearch, RandomizedSearch, Optuna), and error handling |
| flexml/structures/custom_score.py | New class to wrap and validate custom scoring functions with parameter validation, sklearn scorer integration, and support for probability/label-based metrics |
| flexml/structures/supervised_base.py | Updates to handle custom metrics in experiment workflow including parameter handling, model selection logic for minimize vs maximize, and leaderboard sorting |
| flexml/helpers/validators.py | Type hint updates to support CustomScore objects in addition to string-based metrics |
| flexml/helpers/supervised_helpers.py | Enhanced evaluate_model_perf to compute custom metrics alongside standard metrics with proper handling of probabilities vs labels |
| flexml/_model_tuner.py | Support for custom metrics in all tuning methods (grid search, randomized search, Optuna) with proper scorer integration |
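
The descriptions above mention two behaviours that are easy to get wrong: passing probabilities or labels to a metric depending on what it expects, and sorting a leaderboard in the right direction. The sketch below illustrates both in plain Python. The helpers `evaluate` and `rank_models`, the toy metrics, and the model names are hypothetical and are not taken from flexml's code.

```python
def evaluate(metric_func, needs_proba, y_true, labels, probabilities):
    """Feed the metric probabilities or hard labels, depending on what it expects."""
    y_pred = probabilities if needs_proba else labels
    return float(metric_func(y_true, y_pred))


def rank_models(scores, greater_is_better):
    """Sort (model_name, score) pairs best-first for the given direction."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=greater_is_better)


def brier(y_true, y_prob):
    # Probability-based metric; lower is better.
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def accuracy(y_true, y_label):
    # Label-based metric; higher is better.
    return sum(t == p for t, p in zip(y_true, y_label)) / len(y_true)


y_true = [1, 0, 1, 1]
# Made-up positive-class probabilities for two hypothetical models.
model_probs = {
    "model_a": [0.9, 0.2, 0.6, 0.7],
    "model_b": [0.6, 0.4, 0.4, 0.8],
}

brier_scores, accuracy_scores = {}, {}
for name, probs in model_probs.items():
    labels = [int(p >= 0.5) for p in probs]  # threshold at 0.5
    brier_scores[name] = evaluate(brier, True, y_true, labels, probs)
    accuracy_scores[name] = evaluate(accuracy, False, y_true, labels, probs)

brier_board = rank_models(brier_scores, greater_is_better=False)
accuracy_board = rank_models(accuracy_scores, greater_is_better=True)
print(brier_board)
print(accuracy_board)
```

Sorting with `reverse=greater_is_better` keeps the raw metric values on the leaderboard while still placing the best model first for both maximize and minimize metrics.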

Comment thread flexml/structures/supervised_base.py
Comment thread flexml/structures/custom_score.py
Comment thread flexml/helpers/validators.py Outdated
Comment thread flexml/helpers/supervised_helpers.py Outdated
Comment thread flexml/helpers/supervised_helpers.py Outdated
Comment thread flexml/structures/supervised_base.py Outdated
Comment thread flexml/helpers/supervised_helpers.py Outdated
Comment thread flexml/helpers/supervised_helpers.py Outdated
Comment thread flexml/helpers/supervised_helpers.py Outdated
Comment thread flexml/helpers/supervised_helpers.py Outdated
@ozguraslank ozguraslank merged commit af8a4b7 into main Jan 3, 2026
4 checks passed
Development

Successfully merging this pull request may close these issues.

Custom Evaluation Score

2 participants