
transpose forcing data in WeatherDataset __getitem__#556

Open
Anushka1324 wants to merge 4 commits into mllam:main from Anushka1324:fixing_issue_536

Conversation


Anushka1324 commented Mar 31, 2026

Describe your changes

This PR addresses a bug where raw Xarray DataArrays were converted to PyTorch tensors without enforcing a consistent dimension order in the WeatherDataset class.

Changes:

  • Updated the __getitem__ method to explicitly transpose the state and forcing DataArrays to the expected (time, grid_index, feature) dimension order before converting them to tensors.

  • Added a conditional check to safely handle cases where forcing data is absent, preventing a ValueError during transposition when the forcing_feature_windowed dimension does not exist.
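To illustrate why a fixed dimension order matters, here is a minimal sketch with toy arrays (the sizes and values are made up, and the state_feature dim name is an assumption based on the .state_feature mentioned later in the review). The same data stored with a different dimension order yields a differently shaped NumPy array, and hence a differently shaped tensor, unless it is transposed first. The actual tensor conversion is omitted here.

```python
import numpy as np
import xarray as xr

# Toy state data: 2 time steps, 3 grid points, 4 features (made-up sizes).
data = np.arange(24, dtype=float).reshape(2, 3, 4)
da_expected = xr.DataArray(data, dims=("time", "grid_index", "state_feature"))

# The same data, but held with a different dimension order.
da_shuffled = da_expected.transpose("grid_index", "state_feature", "time")

# Taking .values directly keeps whatever order the DataArray happens to have.
print(da_shuffled.values.shape)

# Transposing first enforces the (time, grid_index, feature) layout.
da_fixed = da_shuffled.transpose("time", "grid_index", "state_feature")
print(da_fixed.values.shape)
```

Because xarray tracks dimensions by name, the transpose reorders the underlying array without changing any values; only the positional layout that a tensor would see changes.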

Closes #536

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the README to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, fixing feature coordination handling in weather dataset
  • I have requested a reviewer and an assignee (assignee is responsible for merging). This applies only if you have write access to the repo, otherwise feel free to tag a maintainer to add a reviewer and assignee.

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug
    • maintenance: when your contribution relates to repo maintenance, e.g. CI/CD or documentation

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • (if the PR is not just maintenance/bugfix) the PR is assigned to the next milestone. If it is not, propose it for a future milestone.
  • author has added an entry to the changelog (and designated the change as added, changed, fixed or maintenance)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@2024itb047samata

Hi! I see there’s already an active PR (#556) addressing this.

I’d like to contribute by improving the robustness of the solution:

  • adding explicit validation for coordinate existence
  • making dimension ordering fully dynamic instead of hardcoded
  • improving error messages for debugging
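One possible reading of these three points, sketched under assumptions: a small helper (the name transpose_leading is hypothetical, not part of neural-lam) that checks the required dimensions exist, moves them to the front while keeping any remaining dims via xarray's ellipsis, and raises an error that names the missing dims. It validates dimensions rather than coordinates, which is a simplification.

```python
import numpy as np
import xarray as xr


def transpose_leading(da: xr.DataArray, *leading_dims: str) -> xr.DataArray:
    """Hypothetical helper: move ``leading_dims`` to the front, keep the rest.

    Raises a ValueError that names the missing dims and the available ones,
    so the failing array is easier to identify while debugging.
    """
    missing = [d for d in leading_dims if d not in da.dims]
    if missing:
        raise ValueError(
            f"DataArray {da.name!r} is missing dims {missing}; "
            f"available dims are {list(da.dims)}"
        )
    # The ellipsis places all remaining dims after the leading ones.
    return da.transpose(*leading_dims, ...)


# Toy forcing array with a non-canonical dim order (made-up sizes).
da = xr.DataArray(
    np.zeros((5, 2, 3)),
    dims=("grid_index", "time", "forcing_feature"),
    name="forcing",
)
out = transpose_leading(da, "time", "grid_index")
print(out.dims)

# A missing dim produces a descriptive error instead of a generic one.
error_message = None
try:
    transpose_leading(da, "time", "grid_index", "ensemble_member")
except ValueError as err:
    error_message = str(err)
print(error_message)
```

The feature dim is deliberately not listed, so the helper works regardless of whether it is named forcing_feature or forcing_feature_windowed.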

Let me know if that would be helpful — happy to extend the current implementation.

sadamov added the bug Something isn't working label Apr 13, 2026
sadamov self-assigned this Apr 13, 2026
sadamov self-requested a review April 17, 2026 19:31
Collaborator

sadamov left a comment


The create_dataarray_from_tensor fix duplicates #309, which addresses the same hardcoded .state_feature bug.

The one non-duplicate contribution is the transpose in __getitem__, but it contains a bug (see the inline comment below). If you want to continue, please scope the PR title and description to that change only.

Comment on lines +520 to +522
da_forcing_windowed = da_forcing_windowed.transpose(
    "time", "grid_index", "forcing_feature_windowed"
)
Collaborator


When self.da_forcing is None, _build_item_dataarrays produces an empty DataArray with dim "forcing_feature" (line 463). The transpose here requests "forcing_feature_windowed", which only exists after the .stack() on line 455 (i.e. when forcing data is present). xarray will raise ValueError on the no-forcing path.
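The failure on the no-forcing path can be reproduced in isolation. This is a minimal sketch with a toy stand-in for the empty forcing DataArray (the sizes are made up, and the real array built by _build_item_dataarrays may carry coordinates that are omitted here): requesting a dimension that does not exist makes transpose raise ValueError.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the no-forcing path: an empty forcing DataArray whose
# feature dim is "forcing_feature" (size 0), as described above.
n_time, n_grid = 2, 3
da_empty = xr.DataArray(
    np.empty((n_time, n_grid, 0)),
    dims=("time", "grid_index", "forcing_feature"),
)

raised = False
try:
    # "forcing_feature_windowed" only exists after .stack() on real forcing.
    da_empty.transpose("time", "grid_index", "forcing_feature_windowed")
except ValueError as err:
    raised = True
    print("ValueError:", err)
```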

Suggested change, replacing

da_forcing_windowed = da_forcing_windowed.transpose(
    "time", "grid_index", "forcing_feature_windowed"
)

with

if "forcing_feature_windowed" in da_forcing_windowed.dims:
    da_forcing_windowed = da_forcing_windowed.transpose(
        "time", "grid_index", "forcing_feature_windowed"
    )
else:
    da_forcing_windowed = da_forcing_windowed.transpose(
        "time", "grid_index", "forcing_feature"
    )

Author

Anushka1324 commented Apr 23, 2026


I have fixed the bug and updated the PR title and description. The changes from PR #309 have been accommodated, so this PR no longer duplicates them. Commit: d5fca7a

Anushka1324 changed the title from "fixing feature coordination handling in weather dataset" to "transpose forcing data in WeatherDataset __getitem__" Apr 23, 2026
Collaborator

sadamov commented Apr 23, 2026

@Anushka1324 please fix the pre-commits

@Anushka1324
Author

> @Anushka1324 please fix the pre-commits

done

Collaborator

sadamov commented Apr 23, 2026

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

neural_lam/weather_dataset.py:515:81: E501 line too long (88 > 80 characters)
neural_lam/weather_dataset.py:523:81: E501 line too long (82 > 80 characters)

Collaborator

sadamov left a comment


CHANGELOG entry missing.

Comment thread neural_lam/weather_dataset.py Outdated
Comment thread neural_lam/weather_dataset.py Outdated
Comment on lines +522 to +530
# For forcing feature dimension was renamed in _build_item_dataarrays
if "forcing_feature_windowed" in da_forcing_windowed.dims:
    da_forcing_windowed = da_forcing_windowed.transpose(
        "time", "grid_index", "forcing_feature_windowed"
    )
else:
    da_forcing_windowed = da_forcing_windowed.transpose(
        "time", "grid_index", "forcing_feature"
    )
Collaborator


The if/else on the dim name is fragile: a change to the dim name silently falls through to the wrong branch. An ellipsis (...) in xarray's transpose handles both the windowed and the empty-forcing cases:

Suggested change, replacing

# For forcing feature dimension was renamed in _build_item_dataarrays
if "forcing_feature_windowed" in da_forcing_windowed.dims:
    da_forcing_windowed = da_forcing_windowed.transpose(
        "time", "grid_index", "forcing_feature_windowed"
    )
else:
    da_forcing_windowed = da_forcing_windowed.transpose(
        "time", "grid_index", "forcing_feature"
    )

with

da_forcing_windowed = da_forcing_windowed.transpose(
    "time", "grid_index", ...
)
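A minimal sketch of why the ellipsis covers both paths, using toy stand-ins (made-up sizes; the dim names follow the review, and any coordinates on the real arrays are omitted). The ellipsis expands to every dim not listed explicitly, whatever it is called, so the same call works whether the feature dim is forcing_feature_windowed or forcing_feature.

```python
import numpy as np
import xarray as xr

n_time, n_grid = 2, 3

# Forcing present: windowed feature dim (toy size 6), stored in a
# non-canonical order to show that transpose reorders it.
da_windowed = xr.DataArray(
    np.zeros((n_grid, n_time, 6)),
    dims=("grid_index", "time", "forcing_feature_windowed"),
)

# No forcing: empty DataArray whose feature dim is "forcing_feature".
da_empty = xr.DataArray(
    np.empty((n_time, n_grid, 0)),
    dims=("time", "grid_index", "forcing_feature"),
)

# "..." stands for every remaining dim, so one call handles both cases.
results = [da.transpose("time", "grid_index", ...) for da in (da_windowed, da_empty)]
for out in results:
    print(out.dims, out.shape)
```

One trade-off worth noting: because the ellipsis accepts any remaining dims, an unexpected extra dimension would also pass through without error; whether that needs a separate check depends on the dataset.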


Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect Feature Coordinate Handling in WeatherDataset

3 participants