Possible redundant norm computation in MuonHyperball

As described in the MuonH paper, each parameter is held at the Frobenius norm it has at the first step, so this value needs to be computed at most once.

The current implementation:

https://github.com/NVIDIA-NeMo/Emerging-Optimizers/blob/3b6c5fb6b493e325425e1dea4df0231c2859e09f/emerging_optimizers/orthogonalized_optimizers/muon_hyperball.py#L84-L99

It computes `p.norm()` on every optimizer call, which is redundant. My suggestion is to cache the value in the per-parameter state, something like:

```python
    @override
    def pre_weight_update_fn_inplace(self, p: torch.Tensor, update: torch.Tensor) -> None:
        state = self.state[p]
        if "hyperball_R" not in state:
            # First call for this parameter: compute R once and cache it.
            R = self.hyperball_radius if self.hyperball_radius is not None else p.norm().item()
            state["hyperball_R"] = R
        else:
            R = state["hyperball_R"]
        # Rescale the update in-place so its Frobenius norm equals R.
        update_norm = update.norm().clamp_min(self.hyperball_eps)
        update.mul_(R / update_norm)
```
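To show the effect of the caching, here is a minimal, dependency-free sketch. It is not the library's code: `HyperballScaler`, `scale_update`, `frobenius_norm`, and the `norm_calls` counter are hypothetical names used only for illustration, and plain Python lists stand in for tensors. It assumes the radius should stay fixed after the first call for a given parameter.

```python
import math
from typing import Dict, List, Optional


def frobenius_norm(values: List[float]) -> float:
    """Frobenius norm of a flattened tensor (plain-list stand-in)."""
    return math.sqrt(sum(v * v for v in values))


class HyperballScaler:
    """Hypothetical toy stand-in for the optimizer; caches R per parameter."""

    def __init__(self, hyperball_radius: Optional[float] = None,
                 hyperball_eps: float = 1e-8) -> None:
        self.hyperball_radius = hyperball_radius
        self.hyperball_eps = hyperball_eps
        # Per-parameter state, keyed by object identity in this toy version.
        self.state: Dict[int, Dict[str, float]] = {}
        # Counts how often the parameter norm is actually computed.
        self.norm_calls = 0

    def scale_update(self, p: List[float], update: List[float]) -> List[float]:
        param_state = self.state.setdefault(id(p), {})
        if "hyperball_R" not in param_state:
            if self.hyperball_radius is not None:
                R = self.hyperball_radius
            else:
                self.norm_calls += 1
                R = frobenius_norm(p)
            param_state["hyperball_R"] = R
        R = param_state["hyperball_R"]
        # Clamp the update norm from below to avoid division by zero.
        update_norm = max(frobenius_norm(update), self.hyperball_eps)
        return [u * R / update_norm for u in update]


scaler = HyperballScaler()
p = [3.0, 4.0]  # Frobenius norm is 5
first = scaler.scale_update(p, [1.0, 0.0])
second = scaler.scale_update(p, [0.0, 2.0])
print(first, second, scaler.norm_calls)
```

Both updates are rescaled to norm 5, while the parameter norm is computed only on the first call. Note that the check is made against the per-parameter entry, not against the top-level state dictionary.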

For reference, the linked lines currently read:

    @override
    def pre_weight_update_fn_inplace(self, p: torch.Tensor, update: torch.Tensor) -> None:
        """Store the original weight norm and normalize the update using Frobenius norm.

        Args:
            p: The parameter tensor.
            update: The orthogonalized gradient tensor.
        """
        # Use user-specified radius or compute R = ||W_t||_F (Frobenius norm)
        R = self.hyperball_radius if self.hyperball_radius is not None else p.norm().item()
        self.state[p]["hyperball_R"] = R

        # Normalize the update in-place and scale by R
        # This modifies update to be: R * normalize(update) using Frobenius norm.
        update_norm = update.norm().clamp_min(self.hyperball_eps)
        update.mul_(R / update_norm)

Possible redundant norm computation in MuonHyperball #155
