ONNX Runtime QNN EP Quantization Support #422
brady-cherish asked this question in Q&A
Hello @brady-cherish, thank you for starting the thread. Apologies, the documentation is probably not very clear about this; we will update it. The QNN EP can ingest any valid ONNX model with QDQ (QuantizeLinear/DequantizeLinear) nodes inserted in it. Do you already have a quantized model that you would like to run through the QNN EP, or are you looking to quantize a model with our toolchain?
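For readers new to the term: "QDQ" refers to QuantizeLinear/DequantizeLinear node pairs placed around tensors in an otherwise float ONNX graph. As a rough, stdlib-only illustration (not ONNX Runtime or QNN EP code), the per-element arithmetic of a uint8 pair looks like this, assuming a single per-tensor scale and zero point. The function names are hypothetical.

```python
def quantize_linear(x: float, scale: float, zero_point: int,
                    qmin: int = 0, qmax: int = 255) -> int:
    """Per-element QuantizeLinear for a uint8 target (hypothetical helper)."""
    # Python's round() rounds half to even, matching QuantizeLinear's rounding mode.
    q = round(x / scale) + zero_point
    # Saturate to the integer range of the target type.
    return max(qmin, min(qmax, q))


def dequantize_linear(q: int, scale: float, zero_point: int) -> float:
    """Per-element DequantizeLinear (hypothetical helper)."""
    return (q - zero_point) * scale


def fake_quantize(x: float, scale: float, zero_point: int) -> float:
    """A Q -> DQ pair: the float value the next op actually receives."""
    return dequantize_linear(quantize_linear(x, scale, zero_point), scale, zero_point)


if __name__ == "__main__":
    scale, zp = 0.02, 128  # example parameters, not taken from any real model
    for x in (0.0, 0.5, -1.0, 3.0):
        # 3.0 saturates at 255, so it comes back as roughly 2.54
        print(x, quantize_linear(x, scale, zp), fake_quantize(x, scale, zp))
```

A backend such as the QNN EP can recognize these pairs and execute the surrounding ops in integer arithmetic, which is why a QDQ model is the expected input.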
I wanted to clarify the purpose of this repository after finding it referenced in the ONNX Runtime documentation on quantization for the QNN EP: https://onnxruntime.ai/docs/execution-providers/QNN-ExecutionProvider.html#generating-a-quantized-model-x64-only
It mentions the following:
"""
Install the ONNX Runtime x64 python package. (please note, you must use x64 package for quantizing the model. use the arm64 package for inferencing and utilizing the HTP/NPU)
python -m pip install onnxruntime-qnn
"""
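The split described in that quote (x64 host for quantizing, ARM64 host for HTP/NPU inference) can be sketched as a small, stdlib-only helper. This is a hypothetical illustration, not part of onnxruntime-qnn: `qnn_role` and `qnn_session_kwargs` are invented names, and the architecture strings are the values `platform.machine()` commonly reports. It does not settle which package versions ship x64 wheels, which is the open question here.

```python
import platform
from typing import Optional

# Values platform.machine() commonly reports (Windows uses "AMD64"/"ARM64").
_X64 = {"x86_64", "amd64"}
_ARM64 = {"arm64", "aarch64"}


def qnn_role(machine: Optional[str] = None) -> str:
    """Hypothetical: map a host architecture to the step the docs assign to it."""
    m = (machine or platform.machine()).lower()
    if m in _X64:
        return "quantize"    # docs: use the x64 package for quantizing
    if m in _ARM64:
        return "inference"   # docs: use the arm64 package for the HTP/NPU
    return "unsupported"


def qnn_session_kwargs(backend_path: str = "QnnHtp.dll") -> dict:
    """Hypothetical: keyword arguments for onnxruntime.InferenceSession on the QNN HTP backend."""
    return {
        "providers": ["QNNExecutionProvider"],
        "provider_options": [{"backend_path": backend_path}],
    }


if __name__ == "__main__":
    print(qnn_role(), qnn_session_kwargs())
```

On an ARM64 machine the second helper would be used as `onnxruntime.InferenceSession("model.qdq.onnx", **qnn_session_kwargs())`, where the model path is a placeholder.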
Given that the most recent onnxruntime-qnn releases do not include x86_64 builds, which versions should be used to quantize models for the QNN Execution Provider?
The unavailability of x86_64 packages seems to conflict with the ONNX Runtime quantization documentation, which states:
"""
The quantization utilities are currently only supported on x86_64 due to issues installing the onnx package on ARM64
"""
Would it be possible to make the documentation for both ONNX Runtime QNN EP quantization and this repository clearer about which operating systems and package versions support quantization?
Thank you!