[BUG] Potential NaN in output due to uninitialized memory when batch_sizes[i] == 0

When `c` is `None`, the function `_allocate_output` uses `torch.empty` to allocate the output tensor:

```python
return torch.empty(*shape, device=a.device, dtype=a.dtype)
```

However, if any entry of `batch_sizes` is zero (`batch_sizes[i] == 0`), the GEMM computation for that expert is skipped, and the corresponding region of the output tensor is never written to.
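A minimal sketch of the failure mode, not the library's actual kernel. It assumes a layout in which each expert owns one slice of the output (`out[i] = a_i.T @ b_i`); the function name `grouped_gemm_reference` and the shapes are hypothetical. Because the contents of `torch.empty` are unspecified, the sketch fills the buffer with NaN as a stand-in for garbage so that the result is reproducible:

```python
import torch

def grouped_gemm_reference(a, b, batch_sizes):
    """Hypothetical reference loop: out[i] = a_i.T @ b_i for each expert i."""
    shape = (len(batch_sizes), a.shape[1], b.shape[1])
    # Stand-in for torch.empty: NaN marks memory that was never written.
    out = torch.full(shape, float("nan"), dtype=a.dtype)
    start = 0
    for i, size in enumerate(batch_sizes):
        if size == 0:
            continue  # expert i received no tokens, so out[i] is skipped
        rows = slice(start, start + size)
        out[i] = a[rows].T @ b[rows]
        start += size
    return out

torch.manual_seed(0)
batch_sizes = [2, 0, 3]   # expert 1 receives zero tokens
a = torch.randn(5, 4)     # 5 tokens in total
b = torch.randn(5, 6)
out = grouped_gemm_reference(a, b, batch_sizes)

print(torch.isnan(out[1]).all())  # the skipped slice still holds the sentinel
```

With a real `torch.empty` buffer the skipped slice would hold whatever happened to be in that memory instead of a predictable sentinel.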

Since `torch.empty` does not initialize memory, these unwritten regions may contain:
 - Arbitrary garbage values
 - NaNs or infinities
 - Values that differ from run to run, making results non-deterministic

This can lead to silent correctness issues in MoE (Mixture of Experts) models, especially when some experts receive zero tokens during routing.
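One possible mitigation, sketched under the same assumption that each expert owns one slice along the first dimension of the output. The helper `zero_empty_expert_slices` is hypothetical and not part of the library; it zeroes only the slices belonging to experts with no tokens:

```python
import torch

def zero_empty_expert_slices(out, batch_sizes):
    """Hypothetical helper: zero out[i] wherever batch_sizes[i] == 0."""
    empty = (batch_sizes == 0).to(out.device)  # boolean mask over experts
    out[empty] = 0
    return out

# NaN stands in for the unspecified contents of a torch.empty buffer.
out = torch.full((3, 4, 6), float("nan"))
batch_sizes = torch.tensor([2, 0, 3])
zero_empty_expert_slices(out, batch_sizes)

print(out[1].abs().sum())  # slice for the empty expert is now all zeros
```

Allocating with `torch.zeros` instead of `torch.empty` would also avoid the problem, at the cost of filling the whole buffer even when no expert is empty.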
