Problem Description
I'm trying to use Megatron-Energon's dataloader with Hugging Face Accelerate for distributed training, but I get a tensor concatenation error when I call accelerator.prepare() on the dataloader.
Custom Sample Definition
from dataclasses import dataclass

import torch
from megatron.energon import Sample, basic_sample_keys

@dataclass
class CustomSample(Sample):
    cls: torch.Tensor
    image: torch.Tensor
    latent: torch.Tensor

def cook_custom_sample(sample: dict) -> CustomSample:
    # Build a CustomSample from the raw sample dict.
    return CustomSample(
        **basic_sample_keys(sample),
        cls=torch.tensor(sample["cls"], dtype=torch.long),
        image=sample["image.png"],
        latent=torch.from_numpy(sample["latent.npy"]),
    )
Training Setup
ds = get_train_dataset(
    args.data_dir,
    batch_size=local_batch_size,
    shuffle_buffer_size=100,
    task_encoder=CustomTaskEncoder(),
    max_samples_per_sequence=100,
    worker_config=WorkerConfig.default_worker_config(),
)
train_dataloader = get_savable_loader(ds)

# This line causes the error
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)
Error Message
TypeError: Can only concatenate tensors but got <class 'int'>
The error occurs in Accelerate's internal concatenation operations when it tries to process the dataloader batches.
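To illustrate what I think is happening, here is a minimal sketch of a recursive batch concatenation. It is a simplified stand-in that I wrote for this post, not Accelerate's actual code, and the "index" field is hypothetical. My guess is that the Energon batch carries some non-tensor metadata next to the tensors, and any plain int leaf would fail in this way.

```python
import torch

def concat_batches(batches):
    """Simplified stand-in for a recursive batch concatenation (not Accelerate's code)."""
    first = batches[0]
    if isinstance(first, (list, tuple)):
        # Concatenate position by position.
        return type(first)(
            concat_batches([b[i] for b in batches]) for i in range(len(first))
        )
    if isinstance(first, dict):
        # Concatenate key by key.
        return {k: concat_batches([b[k] for b in batches]) for k in first}
    if not isinstance(first, torch.Tensor):
        raise TypeError(f"Can only concatenate tensors but got {type(first)}")
    return torch.cat(batches, dim=0)

# Batches that contain only tensors concatenate without problems.
ok = concat_batches([
    {"cls": torch.zeros(2, dtype=torch.long)},
    {"cls": torch.ones(2, dtype=torch.long)},
])

# A batch with a plain int field (hypothetical "index") raises the TypeError.
try:
    concat_batches([
        {"cls": torch.zeros(2), "index": 0},
        {"cls": torch.ones(2), "index": 1},
    ])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

If this reading is right, the failure is about the batch structure and not about the tensors themselves.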
Does Energon handle distributed training internally, making Accelerate preparation unnecessary? What's the recommended integration pattern for using both libraries together?
Any guidance on the proper integration pattern would be greatly appreciated!
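One workaround I am considering, assuming Energon already shards the data per rank through its WorkerConfig (I have not confirmed this), is to pass only the model and optimizer to accelerator.prepare() and move each batch to the device by hand. The sketch below is untested with Energon; batch_to_device and DemoBatch are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields, replace

import torch

def batch_to_device(batch, device):
    """Return a copy of a dataclass batch with its tensor fields moved to device."""
    moved = {}
    for f in fields(batch):
        value = getattr(batch, f.name)
        if isinstance(value, torch.Tensor):
            moved[f.name] = value.to(device)
    # Non-tensor fields (keys, metadata) are carried over unchanged.
    return replace(batch, **moved)

@dataclass
class DemoBatch:
    # Hypothetical stand-in for a batched CustomSample.
    key: list
    cls: torch.Tensor
    latent: torch.Tensor

demo = DemoBatch(key=["a", "b"], cls=torch.tensor([1, 2]), latent=torch.zeros(2, 4))
moved = batch_to_device(demo, torch.device("cpu"))

# Intended use in the training loop (assumption, not verified):
#   model, optimizer = accelerator.prepare(model, optimizer)
#   for batch in train_dataloader:
#       batch = batch_to_device(batch, accelerator.device)
```

I would still like to know whether skipping accelerator.prepare() for the dataloader is the intended approach or whether it breaks something on the Accelerate side.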