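This is caused by a dtype mismatch: `torch.empty_like(grad_view, device='cpu')` inherits the dtype of `grad_view` (BFloat16), while `p_cpu` expects a Float gradient. Perhaps we should fix it to `torch.empty_like(grad_view, dtype=p_cpu.dtype, device='cpu')` so the buffer matches the parameter's (default) grad dtype. `dtype=p_cpu.grad.dtype` would also work, but only once `p_cpu.grad` has been allocated; it raises an `AttributeError` while `p_cpu.grad` is still `None`.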
File "/home/armin/MegaTrain/infinity/model/cpu_master.py", line 810, in _grad_worker
    p_cpu.grad = torch.empty_like(grad_view, device='cpu')
    ^^^^^^^^^^
RuntimeError: attempting to assign a gradient with dtype 'c10::BFloat16' to a tensor with grad_dtype 'Float'. The gradient must match the tensor's grad_dtype (defaults to the tensor's dtype).
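A minimal sketch of the proposed fix, assuming `p_cpu` is an fp32 leaf parameter and `grad_view` is a bf16 gradient of the same shape. The tensors and the helper `assign_cpu_grad` are hypothetical stand-ins, not the actual code in `cpu_master.py`. The idea is to allocate the CPU gradient buffer in the parameter's dtype and let `copy_` upcast the bf16 values.

```python
import torch

# Hypothetical stand-ins: an fp32 CPU master parameter and a bf16 gradient view.
p_cpu = torch.nn.Parameter(torch.zeros(4, dtype=torch.float32))
grad_view = torch.ones(4, dtype=torch.bfloat16)

def assign_cpu_grad(p_cpu, grad_view):
    # Hypothetical helper: allocate the buffer in the parameter's dtype (fp32),
    # not grad_view's dtype (bf16), so the .grad assignment passes the dtype check.
    if p_cpu.grad is None:
        p_cpu.grad = torch.empty_like(grad_view, dtype=p_cpu.dtype, device="cpu")
    # copy_ converts bf16 -> fp32 (and would also move data across devices).
    p_cpu.grad.copy_(grad_view)
    return p_cpu.grad

g = assign_cpu_grad(p_cpu, grad_view)
print(g.dtype)  # torch.float32
```

Checking `p_cpu.grad is None` before allocating also avoids reading `.dtype` from a gradient that does not exist yet.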