@rivkastroh
Contributor

@rivkastroh rivkastroh commented Dec 7, 2025

Description

This PR adds support for the MatMulInteger operator when
input A is int8 and input B is uint8, and adds unit tests
to cover this type combination.

According to the ONNX specification for MatMulInteger, the type
constraints are:

  • T1 ∈ {int8, uint8}
  • T2 ∈ {int8, uint8}
  • T3 = int32

This means all four (T1, T2) combinations are valid: (int8, int8), (int8, uint8), (uint8, int8), and (uint8, uint8).
However, the CPU kernel registration did not cover (int8, uint8), which caused a
NOT_IMPLEMENTED error at runtime for models that use it.

This PR aligns the kernel registration and tests with the ONNX spec.
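
For reviewers who want the arithmetic spelled out, here is a minimal reference sketch of what MatMulInteger computes for a 2-D case, including the A=int8, B=uint8 combination. It is not the ONNX Runtime kernel: the function name `MatMulIntegerRef`, the row-major layout, and the "one or N" zero-point convention for B are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference (unoptimized) MatMulInteger for 2-D row-major inputs.
// Hypothetical helper for illustration; not part of ONNX Runtime.
// TA and TB may each be int8_t or uint8_t; the output is always int32_t.
// b_zero_points holds either one value (per-tensor) or N values (per-column).
template <typename TA, typename TB>
std::vector<int32_t> MatMulIntegerRef(const std::vector<TA>& a,
                                      const std::vector<TB>& b,
                                      std::size_t M, std::size_t K, std::size_t N,
                                      TA a_zero_point,
                                      const std::vector<TB>& b_zero_points) {
  assert(a.size() == M * K);
  assert(b.size() == K * N);
  assert(b_zero_points.size() == 1 || b_zero_points.size() == N);

  std::vector<int32_t> y(M * N, 0);
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      // Pick the zero point for column j (or the single per-tensor value).
      const int32_t b_zp = static_cast<int32_t>(
          b_zero_points.size() == 1 ? b_zero_points[0] : b_zero_points[j]);
      int32_t acc = 0;
      for (std::size_t k = 0; k < K; ++k) {
        // Widen to int32 before subtracting zero points so that mixed
        // signed/unsigned inputs cannot wrap around.
        const int32_t a_val = static_cast<int32_t>(a[i * K + k]) -
                              static_cast<int32_t>(a_zero_point);
        const int32_t b_val = static_cast<int32_t>(b[k * N + j]) - b_zp;
        acc += a_val * b_val;
      }
      y[i * N + j] = acc;
    }
  }
  return y;
}
```

With A = [[-1, 2], [3, -4]] as int8 (zero point 0), B = [[1, 2], [3, 4]] as uint8, and per-column B zero points [1, 0], calling `MatMulIntegerRef<int8_t, uint8_t>` yields [[4, 6], [-8, -10]].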

Motivation and Context

Fixes #26743

Testing

  • Added unit tests for the A=int8, B=uint8 combination:
    • MatmulIntegerOpTest.MatMulInteger_int8_uint8_2D
    • MatmulIntegerOpTest.MatMulInteger_int8_uint8_PerColumn_ND
  • All tests pass locally.

@rivkastroh
Contributor Author

@yuslepukhin, @yufenglee : could you please review the PR? Thanks.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for the MatMulInteger operator when input A is int8 and input B is uint8, completing the ONNX specification compliance. The ONNX spec allows all four combinations of (int8, uint8) types for the two inputs, but the (int8, uint8) combination was previously missing, causing runtime errors.

  • Updated the kernel registration to allow uint8 or int8 for T2 when T1 is int8
  • Added comprehensive unit tests covering both 2D and N-D cases with per-column zero points
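
The effect of widening the T2 constraint can be illustrated with a toy model of type-constraint matching. This is only a sketch under stated assumptions: `ElemType`, `Registry`, `RegistryBefore`, `RegistryAfter`, and `Lookup` are hypothetical names, and the real change uses ONNX Runtime's kernel registration mechanism rather than a map like this.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical element-type tag; not an ONNX Runtime type.
enum class ElemType { kInt8, kUInt8 };

// Hypothetical registry: for each T1, the set of T2 types a kernel accepts.
using Registry = std::map<ElemType, std::set<ElemType>>;

// State described in the PR before the change: the int8 kernel
// accepted only int8 for T2, so (int8, uint8) had no match.
inline Registry RegistryBefore() {
  return {
      {ElemType::kUInt8, {ElemType::kUInt8, ElemType::kInt8}},
      {ElemType::kInt8, {ElemType::kInt8}},
  };
}

// State after the change: the int8 kernel also accepts uint8 for T2.
inline Registry RegistryAfter() {
  Registry r = RegistryBefore();
  r[ElemType::kInt8].insert(ElemType::kUInt8);
  return r;
}

// Returns "OK" when some registered kernel matches (t1, t2),
// and "NOT_IMPLEMENTED" otherwise.
inline std::string Lookup(const Registry& r, ElemType t1, ElemType t2) {
  const auto it = r.find(t1);
  if (it == r.end() || it->second.count(t2) == 0) return "NOT_IMPLEMENTED";
  return "OK";
}
```

In this model, looking up (int8, uint8) fails against `RegistryBefore()` and succeeds against `RegistryAfter()`, while the other three combinations match in both.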

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| onnxruntime/core/providers/cpu/quantization/matmul_integer.cc | Modified the int8_t typed kernel registration to accept both uint8_t and int8_t for the T2 type constraint, enabling the missing int8 x uint8 input combination |
| onnxruntime/test/providers/cpu/math/matmul_integer_test.cc | Added two new test cases (MatMulInteger_int8_uint8_2D and MatMulInteger_int8_uint8_PerColumn_ND) to verify the int8 x uint8 combination works correctly in both simple 2D and complex N-D scenarios with per-column zero points |

@yuslepukhin
Member

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows OpenVINO CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@yuslepukhin
Member

Well, there is a test failure:

[ RUN ] MatmulIntegerOpTest.MatMulInteger_int8_uint8_PerColumn_ND
2: 2025-12-08 22:02:43.0817483 [E:onnxruntime:MatMulInteger, inference_session.cc:2542 onnxruntime::InferenceSession::Initialize::<lambda_fe9fb62842ddccf7ffc0bf349066f2d4>::operator ()] Exception during initialization: D:\a_work\onnxruntime\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\AbiCustomRegistry.cpp(519)\onnxruntime_provider_test.exe!00007FF6F5AF1D09: (caller: 00007FF6F5A65620) Exception(38) tid(2510) 80070057 The parameter is incorrect.
2:
2: D:\a_work\onnxruntime\onnxruntime\onnxruntime\test\unittest_util\base_tester.cc(319): error: Expected equality of these values:
2: expect_result
2: Which is: 4-byte object <00-00 00-00>
2: ExpectResult::kExpectFailure
2: Which is: 4-byte object <01-00 00-00>
2: Initialize failed but expected success: Exception during initialization: D:\a_work\onnxruntime\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\AbiCustomRegistry.cpp(519)\onnxruntime_provider_test.exe!00007FF6F5AF1D09: (caller: 00007FF6F5A65620) Exception(38) tid(2510) 80070057 The parameter is incorrect.
2:
2: Google Test trace:
2: D:\a_work\onnxruntime\onnxruntime\onnxruntime\test\unittest_util\base_tester.cc(877): registered execution providers: DmlExecutionProvider
2:
2: [ FAILED ] MatmulIntegerOpTest.MatMulInteger_int8_uint8_PerColumn_ND (237 ms)

@rivkastroh
Contributor Author

@yuslepukhin, I updated the new int8 x uint8 per-column test so that it is skipped when the DmlExecutionProvider is present, because it hits the same DirectML AbiCustomRegistry issue as the existing MatMulInteger_PerColumn_ND test (see the TODO comment there).

The test still runs on other execution providers, so it continues to validate the new int8 x uint8 path without being blocked by the known DML issue.

Development

Successfully merging this pull request may close these issues.

MatMulInteger NOT_IMPLEMENTED for INT8 x UINT8 (A=int8, B=uint8)
