
ggml-hexagon: add Q4_1 support #60

Open
max-krasnyansky wants to merge 4 commits into master from jules-12614708629588356175-17769d61


@max-krasnyansky
Owner

This PR adds support for the Q4_1 data type to MUL_MAT operations in the Hexagon backend.

Changes:

  • Added the HTP_TYPE_Q4_1 and HTP_TYPE_Q8_1 type mappings and their x4x2 constants.
  • Updated the tensor buffer sizing logic (get_alloc_size) to account for the QK_Q4_1x4x2 (160 bytes) and QK_Q8_1x4x2 block dimensions.
  • Added the repack_q4_1_q4x4x2 and repack_q8_1_q8x4x2 pack/unpack functions to ggml-hexagon.cpp and updated the buffer set_tensor and get_tensor paths to use them.
  • Added quantize_row_f32_q8_1x4x2 in matmul-ops.c to dynamically quantize FP32 activation rows, including the required offset calculations.
  • Integrated the HVX vector dot kernel vec_dot_q4_1x4x2_q8_1x4x2_1x1 and its 2x1 and 2x2 variants, which use the newly added layout for minimum offsets.
  • Added Q4_1 to the supported types matrix in ggml_hexagon_supported_mul_mat and ggml_hexagon_supported_mul_mat_id.
  • Enabled HMX support for HTP_TYPE_Q4_1, with the minimum offset handled directly in the dequantization pass in hmx-matmul-ops.c.

PR created automatically by Jules for task 12614708629588356175 started by @max-krasnyansky

Co-authored-by: max-krasnyansky <1380796+max-krasnyansky@users.noreply.github.com>
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

google-labs-jules Bot and others added 3 commits May 23, 2026 21:56
Co-authored-by: max-krasnyansky <1380796+max-krasnyansky@users.noreply.github.com>
Co-authored-by: max-krasnyansky <1380796+max-krasnyansky@users.noreply.github.com>
Co-authored-by: max-krasnyansky <1380796+max-krasnyansky@users.noreply.github.com>