Add missing MatMulInteger int8 x uint8 support #26744
base: main
Conversation
@yuslepukhin, @yufenglee: could you please review the PR? Thanks.
Pull request overview
This PR adds support for the MatMulInteger operator when input A is int8 and input B is uint8, completing compliance with the ONNX specification. The ONNX spec allows all four combinations of (int8, uint8) types for the two inputs, but the (int8, uint8) combination was previously missing, causing runtime errors for models that use it.
- Updated the kernel registration to allow `uint8` or `int8` for T2 when T1 is `int8`
- Added comprehensive unit tests covering both 2D and N-D cases with per-column zero points
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/cpu/quantization/matmul_integer.cc | Modified the int8_t typed kernel registration to accept both uint8_t and int8_t for the T2 type constraint, enabling the missing int8 x uint8 input combination |
| onnxruntime/test/providers/cpu/math/matmul_integer_test.cc | Added two new test cases (MatMulInteger_int8_uint8_2D and MatMulInteger_int8_uint8_PerColumn_ND) to verify the int8 x uint8 combination works correctly in both simple 2D and complex N-D scenarios with per-column zero points |
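The per-column zero-point case exercised by the new N-D test can be illustrated with a small reference sketch. This is a hedged, pure-Python illustration of the ONNX MatMulInteger semantics with one zero point per column of B (the function name and list-of-lists representation are my own, not the ONNX Runtime kernel or its API):

```python
def matmul_integer_per_col(a, b, a_zero_point, b_zero_points):
    """Reference sketch (not the ORT kernel): 2D MatMulInteger with a
    scalar zero point for A and per-column zero points for B, as the
    ONNX spec allows. Accumulation is in int32 in the real kernel."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                # Each column j of B gets its own zero point.
                acc += (a[i][k] - a_zero_point) * (b[k][j] - b_zero_points[j])
            out[i][j] = acc
    return out

# int8 A with zero point 1, uint8 B with per-column zero points [10, 20]
print(matmul_integer_per_col([[1, 2]], [[10, 20], [30, 40]], 1, [10, 20]))
# → [[20, 20]]
```

The point of the per-column variant is that the zero point subtracted from B depends on the output column, which is why the test needs N-D/per-column coverage in addition to the simple 2D case.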
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows OpenVINO CI Pipeline, Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Well, there is a test failure:
@yuslepukhin, I updated the new int8 x uint8 per-column test to be skipped when the DmlExecutionProvider is present, since it hits the same existing DirectML AbiCustomRegistry issue as the existing MatMulInteger_PerColumn_ND test (see the TODO comment there). The test still runs on other execution providers, so it continues to validate the new int8 x uint8 path without being blocked by the known DML issue.
Description
This PR adds support for the `MatMulInteger` operator when input `A` is `int8` and input `B` is `uint8`, and adds unit tests to cover this type combination.
According to the ONNX specification for `MatMulInteger`, the type constraints are:

- T1 ∈ {int8, uint8}
- T2 ∈ {int8, uint8}
- T3 = int32

This means all four combinations (T1, T2) = (int8, int8), (int8, uint8), (uint8, int8), (uint8, uint8) are valid. However, the implementation was missing the (int8, uint8) registration, which caused a NOT_IMPLEMENTED error at runtime for such models. This PR aligns the kernel registration and tests with the ONNX spec.
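To make the mixed-signedness case concrete, here is a hedged pure-Python sketch of what the operator computes per the ONNX spec (zero points subtracted from each input, accumulation in int32). The function name and plain-list representation are illustrative only, not the ONNX Runtime kernel:

```python
def matmul_integer(a, b, a_zero_point=0, b_zero_point=0):
    """Reference sketch of ONNX MatMulInteger for 2D inputs.

    a holds int8 values (may be negative), b holds uint8 values (0..255).
    The spec subtracts each input's zero point before multiplying, and the
    real kernel accumulates in int32, so int8 x uint8 is well defined.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0  # int32 accumulator in the actual kernel
            for k in range(inner):
                acc += (a[i][k] - a_zero_point) * (b[k][j] - b_zero_point)
            out[i][j] = acc
    return out

A = [[-1, 2], [3, -4]]   # int8-range values
B = [[5, 6], [7, 8]]     # uint8-range values
print(matmul_integer(A, B))
# → [[9, 10], [-13, -14]]
```

Since the arithmetic is identical for all four (T1, T2) signedness combinations, the missing (int8, uint8) case was purely a kernel-registration gap rather than a new computation.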
Motivation and Context
Fixes #26743
Testing
Added unit tests covering the A=int8, B=uint8 combination:

- MatmulIntegerOpTest.MatMulInteger_int8_uint8_2D
- MatmulIntegerOpTest.MatMulInteger_int8_uint8_PerColumn_ND