[BUG] Potential NaN in output due to uninitialized memory when batch_sizes[i] == 0 #36

@LitLeo

Description

When c is None, the function _allocate_output uses torch.empty to allocate the output tensor:

return torch.empty(*shape, device=a.device, dtype=a.dtype)

However, if any entry batch_sizes[i] is zero, the GEMM computation for that expert is skipped, and the corresponding region of the output tensor is never written.

Since torch.empty does not initialize memory, these unwritten regions may contain:

  • Arbitrary garbage values
  • NaNs or infinities
  • Values that differ from run to run

This can lead to silent correctness issues in MoE (Mixture of Experts) models, especially when some experts receive zero tokens during routing.
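
A minimal sketch of the failure mode and one possible fix. The function below is a hypothetical pure-PyTorch stand-in, not the library's kernel or API. It assumes the output holds one (k, n) slice per expert, which is the layout where a skipped expert leaves a region unwritten. The name grouped_gemm_sketch and the zero_skipped flag are made up for illustration.

```python
import torch


def grouped_gemm_sketch(a, b, batch_sizes, zero_skipped=True):
    # Hypothetical stand-in for a grouped GEMM; not the library's real API.
    # a: (tokens, k), b: (tokens, n), batch_sizes: number of tokens per expert.
    # Assumed output layout: one (k, n) slice per expert.
    out = torch.empty(len(batch_sizes), a.shape[1], b.shape[1],
                      device=a.device, dtype=a.dtype)
    start = 0
    for i, size in enumerate(batch_sizes):
        if size == 0:
            # The GEMM for this expert is skipped, so nothing writes out[i].
            if zero_skipped:
                # A product over zero rows is all zeros, so write that explicitly.
                out[i].zero_()
            continue
        rows = slice(start, start + size)
        out[i] = a[rows].t() @ b[rows]
        start += size
    return out


torch.manual_seed(0)
a = torch.randn(5, 4)
b = torch.randn(5, 3)

# Expert 1 receives zero tokens. With zero_skipped=False its slice would keep
# whatever torch.empty returned, which is the behavior reported here.
out = grouped_gemm_sketch(a, b, [2, 0, 3])
```

Zeroing only the skipped slices avoids paying for a full torch.zeros on every call. Replacing torch.empty with torch.zeros in _allocate_output would be the simpler alternative.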
