How to properly integrate Energon dataloader with Hugging Face Accelerate? #155

## Problem Description
I'm trying to use Megatron-Energon's dataloader with Hugging Face Accelerate for distributed training, but I get a tensor concatenation error when I pass the dataloader to `accelerator.prepare()`.

### Custom Sample Definition
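```python
from dataclasses import dataclass

import torch
from megatron.energon import Sample, basic_sample_keys


@dataclass
class CustomSample(Sample):
    cls: torch.Tensor
    image: torch.Tensor
    latent: torch.Tensor


def cook_custom_sample(sample: dict) -> CustomSample:
    return CustomSample(
        # Copies Energon's per-sample metadata (e.g. __key__, __restore_key__)
        **basic_sample_keys(sample),
        cls=torch.tensor(sample["cls"], dtype=torch.long),
        image=sample["image.png"],
        latent=torch.from_numpy(sample["latent.npy"]),
    )
```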
### Training Setup
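```python
from megatron.energon import WorkerConfig, get_savable_loader, get_train_dataset

# `args`, `local_batch_size`, `CustomTaskEncoder`, `model`, `optimizer` and
# `accelerator` are defined elsewhere in the training script.
ds = get_train_dataset(
    args.data_dir,
    batch_size=local_batch_size,
    shuffle_buffer_size=100,
    task_encoder=CustomTaskEncoder(),
    max_samples_per_sequence=100,
    worker_config=WorkerConfig.default_worker_config(),
)
train_dataloader = get_savable_loader(ds)

# This call raises the TypeError shown below
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)
```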
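Assuming `WorkerConfig.default_worker_config()` reads the rank and world size from `torch.distributed` (which is how Energon is expected to shard data across data-parallel ranks), `batch_size=local_batch_size` is a per-rank value and each rank reads its own shard. The sketch below uses a hypothetical helper, `per_rank_batch_size`, to show the arithmetic linking that per-rank value to the global batch under plain data parallelism, with no additional splitting by Accelerate.

```python
def per_rank_batch_size(global_batch_size: int, world_size: int, grad_accum_steps: int = 1) -> int:
    """Hypothetical helper: the `batch_size=` each rank would pass to get_train_dataset.

    Assumes every data-parallel rank reads its own shard, so
    global batch = per-rank batch * world_size * grad_accum_steps.
    """
    per_step = world_size * grad_accum_steps
    if global_batch_size % per_step != 0:
        raise ValueError(
            f"global batch {global_batch_size} is not divisible by "
            f"world_size * grad_accum_steps = {per_step}"
        )
    return global_batch_size // per_step


# 8 GPUs, 2 accumulation steps, global batch of 256 -> 16 samples per rank per step.
print(per_rank_batch_size(256, world_size=8, grad_accum_steps=2))
```

If Accelerate were also allowed to split or re-shard these per-rank batches, the effective batch size would no longer match this calculation, which is one more reason to keep sharding in a single place.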
## Error Message

`TypeError: Can only concatenate tensors but got <class 'int'>`

The error occurs in Accelerate's internal concatenation operations when it tries to process the dataloader batches.
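A likely cause, not confirmed here: when the loader wraps an iterable dataset, Accelerate may fetch batches on one process and concatenate or split them for the others, which only works if every leaf of the batch is a tensor. Energon samples also carry metadata such as `__key__` (strings) and `__restore_key__` (which can contain ints), so a tensor-only concatenate would reject them. The hypothetical diagnostic `non_tensor_leaves` below lists every non-tensor leaf in a nested batch; `FakeTensor` and `FakeBatch` are stand-ins so the sketch runs without PyTorch or Energon.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Callable, List


def non_tensor_leaves(
    obj: Any, is_tensor: Callable[[Any], bool], path: str = "batch"
) -> List[str]:
    """Return "path: typename" for every leaf of a nested batch that is not a tensor.

    Recurses through dataclasses, dicts, lists and tuples, roughly the
    containers a tensor-only concatenate would walk into.
    """
    if is_tensor(obj):
        return []
    found: List[str] = []
    if is_dataclass(obj) and not isinstance(obj, type):
        for f in fields(obj):
            found += non_tensor_leaves(getattr(obj, f.name), is_tensor, f"{path}.{f.name}")
    elif isinstance(obj, dict):
        for key, value in obj.items():
            found += non_tensor_leaves(value, is_tensor, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            found += non_tensor_leaves(value, is_tensor, f"{path}[{i}]")
    else:
        found.append(f"{path}: {type(obj).__name__}")
    return found


class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""


@dataclass
class FakeBatch:
    # Hypothetical fields loosely modelled on a batched Energon sample.
    keys: List[str]
    restore_key: tuple
    latent: FakeTensor


batch = FakeBatch(keys=["a", "b"], restore_key=("shard-0", 3), latent=FakeTensor())
for leaf in non_tensor_leaves(batch, lambda x: isinstance(x, FakeTensor)):
    print(leaf)
# prints:
# batch.keys[0]: str
# batch.keys[1]: str
# batch.restore_key[0]: str
# batch.restore_key[1]: int
```

On a real batch, something like `non_tensor_leaves(next(iter(train_dataloader)), torch.is_tensor)` would show which fields a tensor-only utility would choke on.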

Does Energon already handle distributed data loading internally, making it unnecessary to pass its dataloader to `accelerator.prepare()`? What's the recommended pattern for using both libraries together?
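One pattern that should avoid the error, sketched under the assumption that Energon's `WorkerConfig` already shards data by rank: create the `Accelerator` first (so `torch.distributed` is presumably initialised before `default_worker_config()` reads the rank), pass only the model and optimizer to `accelerator.prepare()`, and move each batch to `accelerator.device` yourself. `move_batch_to_device` is a hypothetical helper, not part of either library, and `FakeTensor`/`FakeBatch` are stand-ins so the sketch runs on its own.

```python
import dataclasses
from typing import Any, List


def move_batch_to_device(batch: Any, device: Any) -> Any:
    """Hypothetical helper: return a copy of `batch` with tensor-like leaves on `device`.

    Any value exposing `.to()` (e.g. torch.Tensor) is moved; strings, ints and
    other metadata such as restore keys are returned unchanged.
    """
    if hasattr(batch, "to"):
        return batch.to(device)
    if dataclasses.is_dataclass(batch) and not isinstance(batch, type):
        # Only init=True fields can be passed to dataclasses.replace().
        changes = {
            f.name: move_batch_to_device(getattr(batch, f.name), device)
            for f in dataclasses.fields(batch)
            if f.init
        }
        return dataclasses.replace(batch, **changes)
    if isinstance(batch, dict):
        return {k: move_batch_to_device(v, device) for k, v in batch.items()}
    if isinstance(batch, list):
        return [move_batch_to_device(v, device) for v in batch]
    if isinstance(batch, tuple):
        return tuple(move_batch_to_device(v, device) for v in batch)
    return batch


class FakeTensor:
    """Stand-in for torch.Tensor that only records its device."""

    def __init__(self, device: str = "cpu") -> None:
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(device)


@dataclasses.dataclass
class FakeBatch:
    # Hypothetical fields: metadata plus one tensor field.
    keys: List[str]
    restore_key: tuple
    latent: FakeTensor


batch = FakeBatch(keys=["a", "b"], restore_key=("shard-0", 3), latent=FakeTensor())
moved = move_batch_to_device(batch, "cuda:0")
print(moved.latent.device, moved.keys, moved.restore_key)
```

In the training script this would mean `model, optimizer = accelerator.prepare(model, optimizer)` and then `batch = move_batch_to_device(batch, accelerator.device)` for each `batch` drawn from `train_dataloader`. Leaving the Energon loader unprepared also presumably keeps its savable state under Energon's control rather than behind an Accelerate wrapper.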

Any guidance on the proper integration pattern would be greatly appreciated!