DataLoader.collate now clones cached HData when sampling full hypergraph by SAY-5 · Pull Request #176 · hypernetwork-research-group/hyperbench

SAY-5 · 2026-04-28T21:32:14Z

All Submissions

Have you followed the guidelines in our Contributing document?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Description

Closes #173.

DataLoader.collate() returned self.__cached_dataset_hdata.to(batch[0].device) when sample_full_hypergraph=True. Because HData.to() is in-place, this returned the cached dataset object itself. As a result, iterating the dataloader and mutating the batch, or transferring through a different device path on the next iteration, could silently mutate the dataset's cached hdata.
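The aliasing described above can be reproduced with a minimal sketch. It uses plain-Python stand-ins, not the real hyperbench classes: `FakeHData` and `FakeLoader` are hypothetical names, a list plays the role of a tensor, and `to()` is assumed to mutate and return `self`, as the description states for `HData.to()`.

```python
class FakeHData:
    """Hypothetical stand-in for HData; a list plays the role of a tensor."""

    def __init__(self, x, device="cpu"):
        self.x = x
        self.device = device

    def to(self, device):
        # In-place, as described for HData.to(): mutate and return self.
        self.device = device
        return self


class FakeLoader:
    """Hypothetical stand-in for the sample-full path of DataLoader.collate()."""

    def __init__(self, hdata):
        self._cached = hdata

    def collate(self, device):
        # Pre-fix behaviour: hands back the cached object itself.
        return self._cached.to(device)


cached = FakeHData([1.0, 2.0, 3.0])
loader = FakeLoader(cached)

batch = loader.collate("cuda:0")
batch.x[0] = -1.0  # the caller mutates what it believes is its own batch

same_object = batch is cached
cached_x_after = cached.x
cached_device_after = cached.device
```

In this sketch the batch and the cache are the same object, so both the write to `batch.x` and the device change are visible on the cached data.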

This change introduces a public HData.clone() method that returns a structurally independent HData: tensor fields are cloned, while scalar fields are passed through. The loader now uses clone().to(...) instead of calling to(...) directly on the cached object.
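A rough sketch of the clone-then-transfer pattern, again with hypothetical stand-ins (`FakeHData`, `collate_full`) instead of the real hyperbench code. Lists stand in for tensors, so copying a list here plays the role that `torch.Tensor.clone()` would presumably play in the actual `HData.clone()`; the field set is an assumption chosen for illustration.

```python
import copy


class FakeHData:
    """Hypothetical stand-in for HData; lists play the role of tensors."""

    def __init__(self, x, hyperedge_index, num_nodes, device="cpu"):
        self.x = x
        self.hyperedge_index = hyperedge_index
        self.num_nodes = num_nodes  # scalar field
        self.device = device

    def clone(self):
        # Tensor-like fields are copied; scalar fields are passed through.
        return FakeHData(
            x=list(self.x),
            hyperedge_index=copy.deepcopy(self.hyperedge_index),
            num_nodes=self.num_nodes,
            device=self.device,
        )

    def to(self, device):
        # Still in-place, but now it is applied to the clone only.
        self.device = device
        return self


def collate_full(cached, device):
    """Hypothetical equivalent of the fixed sample-full path."""
    return cached.clone().to(device)


cached = FakeHData([1.0, 2.0], [[0, 1], [0, 0]], num_nodes=2)
batch = collate_full(cached, "cuda:0")

# Mutating the batch no longer reaches the cached object.
batch.x[0] = -1.0
batch.hyperedge_index[0][0] = 9
```

Because `to()` runs on the clone, the cached object keeps its original contents and device, while the batch carries the same scalar metadata.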

The clone happens once per batch in the sample-full path, so the cost is bounded by dataset size and is the same order of magnitude as the device transfer already performed there.

Test plan:

Three regression tests were added in hyperbench/tests/data/loader_test.py:

test_collate_sample_full_hypergraph_does_not_share_storage_with_cached_hdata: checks data_ptr inequality across x, hyperedge_index, and hyperedge_attr.
test_collate_sample_full_hypergraph_mutating_batch_does_not_affect_cached_hdata: mutates the batch in place and confirms the cached hdata tensors are unchanged.
test_collate_sample_full_hypergraph_with_weights_isolates_weights: verifies the same isolation for hyperedge_weights.

Each test fails when only loader.py and hdata.py are reverted, confirming they exercise the new behavior. The existing test_collate_sample_full_hypergraph_returns_cached_hdata content-equality test continues to pass.

Checklist

Does your submission pass all tests? (use make test)
Have you written tests to cover all your changes? If not, provide a reason.
Have you linted your code locally before submission? (use make lint)
Have you type checked your code locally before submission? (use make typecheck)
Have you added an explanation of what your changes are and why you'd like us to include them?


tizianocitro · 2026-04-29T11:43:44Z

I had to make small changes to make it follow repo practices and pass tests.

Since it only took a little work, I'm approving this one. Next time, please follow the contribution guide listed in the template or README.

github-actions Bot assigned SAY-5 Apr 28, 2026

github-actions Bot added data tests types fix labels Apr 28, 2026

tizianocitro changed the title ~~fix: DataLoader.collate clones cached hdata on sample_full_hypergraph~~ DataLoader.collate now clones cached HData when sampling full hypergraph Apr 29, 2026

refactor: change HData.clone() to follow codebase practices

bd40257

tizianocitro approved these changes Apr 29, 2026

View reviewed changes

tizianocitro merged commit 819bdf5 into hypernetwork-research-group:main Apr 29, 2026
19 checks passed

tizianocitro merged 2 commits into hypernetwork-research-group:main from SAY-5:fix/dataloader-collate-clones-cached-hdata
