ggml: uniformize im2col dst_type for all conv ops #23660
Open
Juste-Leo2 wants to merge 1 commit into
Contributor, Author
CC @pwilkin in case you'd like to take a look :)
CISC (Member) approved these changes on May 25, 2026 and left a comment:
Which backend did it crash on with quantized weights?
Contributor, Author
It crashed on the CPU and Vulkan backends (CUDA was unaffected). I had previously applied a temporary workaround (here), but forcing F16 ended up breaking BF16.
Overview
This PR adjusts the im2col output type in all convolution operations that use it. Instead of always forcing F16, we keep F16 only when the weight is F16, and use F32 for everything else (BF16, F32, quantized types).
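For reviewers, the selection rule described above can be sketched as a small helper. This is only an illustration of the rule as stated in this description, not the code in the diff: the enum `sketch_type` and the function `sketch_im2col_dst_type` are hypothetical stand-ins and do not correspond to ggml's real type enum or to any ggml function.

```c
/* Stand-in for a tensor type enum, limited to the cases discussed here.
 * Hypothetical names: these are NOT ggml's real identifiers. */
enum sketch_type {
    SKETCH_TYPE_F32,
    SKETCH_TYPE_F16,
    SKETCH_TYPE_BF16,
    SKETCH_TYPE_Q4_0 /* stands for any quantized weight type */
};

/* Rule from the description: keep F16 only when the weight is F16,
 * and use F32 for everything else (BF16, F32, quantized types). */
static enum sketch_type sketch_im2col_dst_type(enum sketch_type weight_type) {
    return weight_type == SKETCH_TYPE_F16 ? SKETCH_TYPE_F16 : SKETCH_TYPE_F32;
}
```

Under this rule the weight type is never passed through as the im2col output type unchanged, so a quantized or BF16 weight always maps to an F32 im2col output.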
Additional information
This change was discovered while working on the Zaya model, which uses ggml_conv_1d_grouped (#22833). This operation goes through im2col, and the old code forced F16, which caused precision loss with BF16 weights or even crashes with quantized weights. The ggml_conv2d and ggml_conv3d operations had a similar issue, as they passed the weight type directly to im2col without checking.
Related to PR #23112 (Zaya).
Requirements