Enable FP16 activations in MatMulNBits by qti-mattsinc · Pull Request #341 · onnxruntime/onnxruntime-qnn

qti-mattsinc · 2026-05-05T22:46:29Z

Description

Remove FP32 input restriction in MatMulNBits op builder. Note that the scales initializer must be cast to FP32 in the op builder as QNN currently requires FP32 scales at the API level.
Add FP16 MatMulNBits unit tests.

Motivation and Context

Enable w4a16 LLMs on the GPU for faster inferencing.
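
The description notes that the scales initializer must be cast to FP32 because QNN requires FP32 scales at the API level. As a rough illustration of what such a cast involves, here is a minimal, self-contained sketch that widens raw FP16 bit patterns to `float`. The names `HalfBitsToFloat` and `CastScalesToFloat32` are hypothetical and do not come from this PR; the actual op builder may well rely on an existing conversion utility instead.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not from the PR): convert one IEEE 754 binary16
// bit pattern to a 32-bit float.
inline float HalfBitsToFloat(uint16_t h) {
  const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
  const uint32_t exponent = (h >> 10) & 0x1Fu;
  uint32_t mantissa = h & 0x3FFu;
  uint32_t bits = 0;

  if (exponent == 0x1Fu) {
    // Inf or NaN: all-ones exponent, mantissa payload carried over.
    bits = sign | 0x7F800000u | (mantissa << 13);
  } else if (exponent != 0) {
    // Normal number: rebias the exponent from 15 to 127 (difference 112).
    bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
  } else if (mantissa == 0) {
    // Signed zero.
    bits = sign;
  } else {
    // Subnormal half: shift until the implicit leading bit appears,
    // then drop it and adjust the exponent accordingly.
    uint32_t shift = 0;
    while ((mantissa & 0x400u) == 0) {
      mantissa <<= 1;
      ++shift;
    }
    mantissa &= 0x3FFu;
    bits = sign | ((113u - shift) << 23) | (mantissa << 13);
  }

  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Hypothetical helper (not from the PR): widen a buffer of FP16 scales
// to FP32, element by element.
inline std::vector<float> CastScalesToFloat32(const std::vector<uint16_t>& fp16_scales) {
  std::vector<float> fp32_scales;
  fp32_scales.reserve(fp16_scales.size());
  for (uint16_t h : fp16_scales) {
    fp32_scales.push_back(HalfBitsToFloat(h));
  }
  return fp32_scales;
}
```

Widening FP16 to FP32 is exact, so a cast of this kind changes only the storage type of the scales, not their values.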

minfhong-qti

A gentle heads-up: I'm working on supporting MatMulNBits for HTP in PR #288, which will involve a slight refactor. Feel free to review that PR.

minfhong-qti · 2026-05-06T01:26:49Z

+      const OrtTypeInfo* type_info = nullptr;
+      const auto& ort_api = qnn_model_wrapper.GetOrtApi();
+      ORT_CXX_RETURN_ON_API_FAIL(ort_api.GetValueInfoTypeInfo(scale_tensor_proto, &type_info));
+      const OrtTensorTypeAndShapeInfo* tensor_type_and_shape_info = nullptr;
+      ORT_CXX_RETURN_ON_API_FAIL(ort_api.CastTypeInfoToTensorInfo(type_info, &tensor_type_and_shape_info));
+      ONNXTensorElementDataType onnx_data_type = ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED;
+      ORT_CXX_RETURN_ON_API_FAIL(ort_api.GetTensorElementType(tensor_type_and_shape_info, &onnx_data_type));
+
+      RETURN_IF(onnx_data_type != ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT &&
+                    onnx_data_type != ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16,
+                "Unsupported scales datatype");


Suggested change

-      const OrtTypeInfo* type_info = nullptr;
-      const auto& ort_api = qnn_model_wrapper.GetOrtApi();
-      ORT_CXX_RETURN_ON_API_FAIL(ort_api.GetValueInfoTypeInfo(scale_tensor_proto, &type_info));
-      const OrtTensorTypeAndShapeInfo* tensor_type_and_shape_info = nullptr;
-      ORT_CXX_RETURN_ON_API_FAIL(ort_api.CastTypeInfoToTensorInfo(type_info, &tensor_type_and_shape_info));
-      ONNXTensorElementDataType onnx_data_type = ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED;
-      ORT_CXX_RETURN_ON_API_FAIL(ort_api.GetTensorElementType(tensor_type_and_shape_info, &onnx_data_type));
-      RETURN_IF(onnx_data_type != ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT &&
-                    onnx_data_type != ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16,
-                "Unsupported scales datatype");
+      RETURN_IF(scales_tensor.type != ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT &&
+                    scale_tensor.type != ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16,
+                "Unsupported scales datatype");

qti-mattsinc · 2026-05-18T21:58:45Z

Closing; this change was brought into a related PR: #288.

minfhong-qti reviewed May 6, 2026

qti-mattsinc closed this May 18, 2026

qti-mattsinc wants to merge 1 commit into main from dev/mattsinc/matmulnbits-fp16
