
Conversation

@sriramsowmithri9807

…for pattern-specific quantization

  • Changed line 192 in `gemma/peft/_quantization.py` from `einsum_str=self.wrapped.name` to `einsum_str=einsum_str`
  • This enables the pattern-specific quantization axis selection logic in `get_axis_to_reduce_from_einsum_str()`
  • Previously, module names like `'attention_proj'` were passed, which always returned `None` and forced a fallback to generic per-channel scaling
  • Now actual einsum equations like `'BTD,NDH->BTNH'` are passed, enabling optimal pattern-specific scaling (see the sketch below)
  • Improves quantization accuracy for all einsum operations in quantization-aware training workflows
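
To make the effect of the fix concrete, here is a minimal, self-contained sketch of pattern-specific scale selection. It is not the gemma implementation: `axis_to_reduce_from_einsum_str`, `quantize_int8`, the parsing rule, and the fallback axis choice are hypothetical stand-ins that mirror the role described for `get_axis_to_reduce_from_einsum_str()`. Given a real einsum equation, the weight's contracted axes are reduced over when computing scales; a string like `'attention_proj'` is not recognized and falls back to a generic per-channel scheme.

```python
import numpy as np


def axis_to_reduce_from_einsum_str(einsum_str: str) -> tuple[int, ...] | None:
    """Return the weight axes the einsum contracts away, or None if the string
    is not a recognizable two-operand einsum equation (e.g. a module name
    such as 'attention_proj')."""
    if "->" not in einsum_str or einsum_str.count(",") != 1:
        return None
    inputs, output = einsum_str.split("->")
    _, weight_labels = inputs.split(",")
    # A weight axis is safe to reduce over when computing per-output-channel
    # scales if its label does not survive into the einsum output.
    axes = tuple(i for i, label in enumerate(weight_labels) if label not in output)
    return axes or None


def quantize_int8(w: np.ndarray, einsum_str: str) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric int8 quantization with scales computed over the reduced axes."""
    axes = axis_to_reduce_from_einsum_str(einsum_str)
    if axes is None:
        # Assumed generic per-channel fallback for illustration: reduce over
        # everything except the last axis.
        axes = tuple(range(w.ndim - 1))
    scale = np.max(np.abs(w), axis=axes, keepdims=True) / 127.0
    return np.round(w / scale).astype(np.int8), scale


# 'BTD,NDH->BTNH': the weight labels are 'NDH'; only D is contracted, so each
# (N, H) output channel keeps its own scale.
w = np.random.randn(8, 64, 16).astype(np.float32)  # shape (N, D, H)
q, scale = quantize_int8(w, "BTD,NDH->BTNH")
print(q.shape, scale.shape)   # (8, 64, 16) (8, 1, 16)

# A module name instead of an equation triggers the generic fallback.
q_fb, scale_fb = quantize_int8(w, "attention_proj")
print(scale_fb.shape)         # (1, 1, 16)
```

The real helper's axis convention and fallback may differ; the point is only that an einsum equation carries enough structure to pick the reduction axes, while a module name does not, which is why passing `einsum_str` instead of `self.wrapped.name` matters.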

