fix: resolve dataset preparation failure caused by upstream framework update #550

Open
Fahmid-Arman wants to merge 1 commit into kubeedge:main from Fahmid-Arman:fix-issue-164-dataset-prep

@Fahmid-Arman

What type of PR is this?
/kind bug

What this PR does / why we need it:
This PR resolves a backward-compatibility break in the dataset preparation logic that caused benchmarkingjob failures across multiple legacy examples.

  • Root Cause: A prior framework refactor in core/testenvmanager/dataset/dataset.py introduced new configuration keys (train_index, train_data, train_data_info) and made process_dataset() require one of them. Legacy examples (e.g., Cloud-Robotics) still set train_url and test_url in their testenv.yaml files, so dataset preparation failed with NotImplementedError("not one of train_index/train_data/train_data_info").
  • Fix Applied: Added a backward-compatibility shim to the _parse_config method of core/testenvmanager/dataset/dataset.py. If train_url or test_url is set and the new keys are unset, the values are mapped to train_index and test_index automatically. Legacy examples work again without repository-wide YAML updates, and the new keys take precedence whenever they are set.
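
For readers skimming the description, the mapping rule above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `map_legacy_urls` and `NEW_KEYS` are hypothetical names, the config is modelled as a plain dict, and the `test_data` / `test_data_info` keys are an assumption that mirrors the train keys.

```python
from typing import Any, Dict

# New-style keys named in the PR description. The test_* names other than
# test_index are an assumption that mirrors the train_* names.
NEW_KEYS = {
    "train": ("train_index", "train_data", "train_data_info"),
    "test": ("test_index", "test_data", "test_data_info"),
}


def map_legacy_urls(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with legacy *_url keys mapped to *_index.

    A legacy value is only used when none of the new-style keys for the
    same split is set, so new-style configuration keeps precedence.
    """
    mapped = dict(config)
    for split, new_keys in NEW_KEYS.items():
        legacy_value = mapped.get(f"{split}_url")
        new_style_set = any(mapped.get(key) for key in new_keys)
        if legacy_value and not new_style_set:
            mapped[f"{split}_index"] = legacy_value
    return mapped


# Hypothetical legacy testenv.yaml content, already parsed into a dict.
legacy = {"train_url": "/data/train.txt", "test_url": "/data/test.txt"}
print(map_legacy_urls(legacy))
```

Returning a copy rather than mutating the input keeps the original legacy keys visible for logging or debugging.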

Which issue(s) this PR fixes:
Fixes #164

@kubeedge-bot kubeedge-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 15, 2026
@kubeedge-bot

Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Fahmid-Arman
To complete the pull request process, please assign jaypume after the PR has been reviewed.
You can assign the PR to them by writing /assign @jaypume in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot requested review from Poorunga and hsj576 June 15, 2026 07:05
@kubeedge-bot kubeedge-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 15, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces backward compatibility mapping for legacy dataset configurations that use train_url and test_url instead of the newer train_index/train_data/train_data_info fields. However, unconditionally mapping these legacy URLs to train_index and test_index can cause runtime failures if the URLs point to other formats like .jsonl or metadata files. The reviewer suggests inspecting the file format of the legacy URLs and mapping them to the appropriate new fields (train_index, train_data, or train_data_info) accordingly.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread core/testenvmanager/dataset/dataset.py Outdated
… update

Signed-off-by: Fahmid Arman <fahmid.brac@gmail.com>
@Fahmid-Arman Fahmid-Arman force-pushed the fix-issue-164-dataset-prep branch from bcfbf92 to b946de6 Compare June 15, 2026 07:10
@kubeedge-bot kubeedge-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 15, 2026
@Fahmid-Arman

Author

Feedback addressed:

  • Implemented format-aware routing for legacy train_url and test_url using utils.get_file_format to prevent runtime failures with .jsonl and metadata files.
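
As a rough sketch of what format-aware routing could look like: the snippet below is self-contained and does not call the project's utils.get_file_format. The local `file_format` helper stands in for it, and the extension-to-key table (`FORMAT_TO_SUFFIX`) is an assumption made for illustration, so the actual mapping in this PR may differ.

```python
import os
from typing import Optional

# Hypothetical extension-to-key table; the routing in the PR may differ.
FORMAT_TO_SUFFIX = {
    "txt": "index",
    "csv": "index",
    "jsonl": "data",
    "json": "data_info",
}


def file_format(url: str) -> str:
    """Return the lowercase extension of url without the leading dot."""
    return os.path.splitext(url)[1].lstrip(".").lower()


def route_legacy_url(split: str, url: str) -> Optional[str]:
    """Pick the new-style key for a legacy URL, or None if the format is unknown."""
    suffix = FORMAT_TO_SUFFIX.get(file_format(url))
    return f"{split}_{suffix}" if suffix else None


for url in ("/data/train.txt", "/data/train.jsonl", "/data/metadata.json"):
    print(url, "->", route_legacy_url("train", url))
```

Returning None for an unrecognized extension lets the caller decide whether to fall back to the index key or raise a clear configuration error.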

/assign @jaypume

Labels

kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Issue with preparing dataset failed:The dataset preparation failure caused by the Ianvs project update

3 participants