fix: resolve dataset preparation failure caused by upstream framework update #550
Fahmid-Arman wants to merge 1 commit into
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Fahmid-Arman
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Code Review
This pull request introduces backward compatibility mapping for legacy dataset configurations that use train_url and test_url instead of the newer train_index/train_data/train_data_info fields. However, unconditionally mapping these legacy URLs to train_index and test_index can cause runtime failures if the URLs point to other formats like .jsonl or metadata files. The reviewer suggests inspecting the file format of the legacy URLs and mapping them to the appropriate new fields (train_index, train_data, or train_data_info) accordingly.
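The reviewer's suggestion could be sketched roughly as below. This is a minimal illustration, not the framework's actual logic: the helper name `target_key_for_legacy_url` and the extension table are hypothetical, and which extensions really belong to `*_index`, `*_data`, or `*_data_info` would have to be checked against `dataset.py`.

```python
import os

# ASSUMPTION: this extension-to-field table is purely illustrative.
# The real rules for which file formats each field accepts live in
# core/testenvmanager/dataset/dataset.py and may differ.
SUFFIX_BY_EXTENSION = {
    ".txt": "index",
    ".csv": "index",
    ".jsonl": "data",
    ".json": "data_info",
}


def target_key_for_legacy_url(split: str, url: str) -> str:
    """Pick the new-style key (hypothetical helper) for a legacy <split>_url value.

    `split` is "train" or "test"; `url` is the legacy path or URL.
    """
    extension = os.path.splitext(url)[1].lower()
    suffix = SUFFIX_BY_EXTENSION.get(extension)
    if suffix is None:
        # Fail early with a clear message instead of letting a wrong
        # mapping surface later as a runtime failure.
        raise ValueError(
            f"cannot map {split}_url={url!r}: unrecognized extension {extension!r}"
        )
    return f"{split}_{suffix}"
```

With this table, `target_key_for_legacy_url("train", "/data/train.jsonl")` yields `train_data` rather than `train_index`. Note that `os.path.splitext` looks only at the trailing extension, so URLs carrying query strings would need extra handling.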
… update Signed-off-by: Fahmid Arman <fahmid.brac@gmail.com>
bcfbf92 to b946de6
Feedback addressed:
/assign @jaypume
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR resolves a backward-compatibility break in the dataset preparation logic that caused `benchmarkingjob` failures across multiple legacy examples.

`core/testenvmanager/dataset/dataset.py` introduced new configuration keys (`train_index`, `train_data`, `train_data_info`) and required `process_dataset()` to use them. Legacy examples (e.g., Cloud-Robotics) still use `train_url` and `test_url` in their `testenv.yaml` configurations. This mismatch caused a `NotImplementedError("not one of train_index/train_data/train_data_info")` exception.

The fix is in the `_parse_config` method of `core/testenvmanager/dataset/dataset.py`. If `train_url` or `test_url` is detected and the new keys are unset, the values are automatically mapped to `train_index` and `test_index`. This restores full functionality for all legacy examples without requiring repository-wide YAML updates and maintains precedence for new configuration structures.

Which issue(s) this PR fixes:
Fixes #164
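For readers who want the mapping rule from the description in concrete form, here is a minimal sketch. It is not the code in this PR: the function name `apply_legacy_url_mapping` is hypothetical, it works on a plain dict rather than the real dataset object, and it assumes the test side mirrors the train keys (`test_index`, `test_data`, `test_data_info`).

```python
# ASSUMPTION: the test split uses the same three suffixes as the
# train split; the PR text only names test_index explicitly.
NEW_KEY_SUFFIXES = ("index", "data", "data_info")


def apply_legacy_url_mapping(config: dict) -> dict:
    """Return a copy of `config` with legacy *_url values mapped to *_index.

    Hypothetical helper: a legacy value is mapped only when none of the
    new-style keys for that split is set, so new configurations win.
    """
    mapped = dict(config)
    for split in ("train", "test"):
        legacy_value = mapped.get(f"{split}_url")
        new_keys_set = any(
            mapped.get(f"{split}_{suffix}") for suffix in NEW_KEY_SUFFIXES
        )
        if legacy_value and not new_keys_set:
            mapped[f"{split}_index"] = legacy_value
    return mapped
```

A legacy config such as `{"train_url": "/data/train.txt", "test_url": "/data/test.txt"}` gains `train_index` and `test_index`, while a config that already sets `train_data` is left untouched for the train split.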