[MRG] 🧪 Add MLE-Bench connection setup and usage #303
Merged
Introduce documentation for MLE-Bench, including installation steps, dataset preparation commands, and grading examples. This enhances usability by providing clear guidance for users setting up and interacting with the benchmark.
- Update CLI commands to use `mle-exp` for consistency with package naming.
- Improve logging by ensuring no duplicate handlers and standardizing formatting.
- Refactor Git LFS initialization to validate files and handle multiple directories that were previously missed.
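The "no duplicate handlers" point can be illustrated with a minimal sketch. This is not the code from this PR: the function name `get_logger`, the default logger name, and the format string are assumptions chosen for illustration. It only shows the common standard-library pattern of attaching a handler once and disabling propagation.

```python
import logging


def get_logger(name: str = "mle-exp", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that carries exactly one stream handler.

    Hypothetical helper: the name and format string are illustrative only.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only on the first call; later calls reuse it.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
        )
        logger.addHandler(handler)
    # Keep records away from root-logger handlers so they are not printed twice.
    logger.propagate = False
    return logger
```

Calling `get_logger()` from several modules then returns the same logger object, so each message is emitted once with one consistent format.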
…ities
- Fix CLI and API utilities bug: type `Path` not working for `click.Path`.
- Improve error handling by adding detailed traceback info and JSONL validation.
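As a rough sketch of what JSONL validation with traceback reporting could look like, the snippet below checks a multi-submission file line by line. It is an assumption-laden illustration, not the implementation in this PR: the function name `validate_jsonl` and the `REQUIRED_KEYS` tuple are hypothetical, and the two keys are taken from the sample JSONL record shown later in this description.

```python
import json
import traceback
from pathlib import Path

# Keys taken from the sample record in this PR description (assumed to be required).
REQUIRED_KEYS = ("competition_id", "submission_path")


def validate_jsonl(path):
    """Return (records, errors) for a JSONL file; hypothetical helper."""
    records, errors = [], []
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
                if not isinstance(record, dict):
                    raise ValueError("expected a JSON object")
                missing = [key for key in REQUIRED_KEYS if key not in record]
                if missing:
                    raise ValueError(f"missing keys: {missing}")
                records.append(record)
            except ValueError:
                # json.JSONDecodeError is a subclass of ValueError.
                errors.append(f"line {lineno}:\n{traceback.format_exc()}")
    return records, errors
```

Collecting errors instead of stopping at the first one lets a caller report every bad line in a single pass, each with its line number and full traceback.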
3ee112e to
e3e8f29
This pull request introduces a new experimental CLI and related infrastructure to integrate the MLE-Bench benchmarking tool into the project. It includes updates to documentation, the addition of new CLI and installation scripts, and configuration changes to support the integration.
Closes #301
Below are the most important changes grouped by theme:
New CLI and Integration with MLE-Bench
- `exp/cli.py`: A simple experimental CLI script to bridge MLE-agent and MLE-Bench, allowing users to install, prepare datasets, run benchmarks, and grade submissions. It proxies commands to the external `mlebench` tool.
- `exp/init.py`: A script to download files and install the MLE-Bench repository using Git LFS, then copy it into the installed `mlebench` package's folder.
- `exp/mlebench_api.py`: A rewritten API wrapper over mle-bench functions for downloading datasets and grading submissions.

Documentation Updates
- `exp/README.md`: Added detailed instructions for installing and using MLE-Bench, including commands for preparing datasets and grading submissions.

Dependency and Configuration Changes
- `pyproject.toml`:
  - Added `pip` and `setuptools` as dependencies to ensure compatibility.
  - Added `bench` for installing `mlebench` directly from its GitHub repository.
  - Added a `[tool.uv]` section to override dependencies, specifically skipping `tensorflow-io-gcs-filesystem` on Windows.

What has been done to verify that this works as intended?
The functions below were tested on Windows 11 (CPython 3.11) and Ubuntu 22.04 under WSL (CPython 3.11).
Install and init
Expected output:
Prepare datasets
Grading one submission
Expected output:
Grading multiple submissions
where your test multi-submission JSONL file has content:
{"competition_id": "tabular-playground-series-may-2022", "submission_path": "C:\\Users\\li_yu\\PycharmProjects\\mle-bench-solver\\submission.csv"}
Expected output:
The report is also saved to a file:
`<timestamp>_grading_report.json`
Why is this the best possible solution? Were any other approaches considered?
N.A.
How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
Users are not affected unless they install the extra package.
Do we need any specific form for testing your changes? If so, please attach one.
See above
Does this change require updates to documentation? If so, please file an issue here and include the link below.
Before submitting this PR, please make sure you have:
the credit file.