Description
Describe the feature request
Currently, the QNN Execution Provider exposes only the cpu, gpu, htp, and saver backends through the backend_type config option. Other backends can be loaded through the backend_path option (e.g. the QNN DSP backend), but during testing I found that the QNN HTA backend does not seem to work, even with the CPU EP fallback enabled (tested with onnxruntime 1.24.0 and QNN SDK 2.38.0.250901 / 2.41.0.251128).
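For illustration, the difference between the two options can be sketched as a small helper that builds the QNN EP provider options. This is a minimal sketch: `qnn_provider_options` is a hypothetical helper, and the library file names `libQnnDsp.so` and `libQnnHta.so` are assumptions about how the QNN SDK names these backends on Linux/Android.

```python
import posixpath

# Backends the issue lists as selectable via backend_type.
BACKEND_TYPES = {"cpu", "gpu", "htp", "saver"}

# Assumed library names for backends reachable only through backend_path.
BACKEND_LIBS = {"dsp": "libQnnDsp.so", "hta": "libQnnHta.so"}


def qnn_provider_options(backend: str, lib_dir: str = "") -> dict:
    """Hypothetical helper: build QNN EP provider options for a backend name."""
    name = backend.lower()
    if name in BACKEND_TYPES:
        return {"backend_type": name}
    if name in BACKEND_LIBS:
        lib = BACKEND_LIBS[name]
        # Join with the SDK library directory when one is given.
        return {"backend_path": posixpath.join(lib_dir, lib) if lib_dir else lib}
    raise ValueError(f"unknown QNN backend: {backend!r}")


# Usage with onnxruntime (not executed here; "model.onnx" is a placeholder):
#   import onnxruntime as ort
#   session = ort.InferenceSession(
#       "model.onnx",
#       providers=[
#           ("QNNExecutionProvider", qnn_provider_options("hta")),
#           "CPUExecutionProvider",  # CPU EP fallback
#       ],
#   )
```

With the change requested in this issue, the `dsp` and `hta` entries would move from the `backend_path` branch to the `backend_type` branch.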
Digging a bit deeper, the issue appears to be caused by incorrect operation validation in the QNN BaseOpBuilder. By default, onnxruntime automatically inserts Transpose nodes to compensate for the difference in preferred data layout (NCHW vs NHWC), and Transpose nodes are currently unsupported on the HTA backend. Normally this would not be a problem with the CPU EP fallback enabled, since the unsupported nodes would simply fall back to the CPU EP. However, the IsOpSupported validation incorrectly marks the unsupported operations as supported, so the fallback does not take place.
Simply fixing the IsOpSupported check to correctly mark unsupported operations as unsupported resolves the issue and allows users to use the HTA backend.
The feature request here is, therefore, to:
- Add better operation validation for the QNN EP (e.g. the QNN SDK offers a `backendGetSupportedOperations`, which can be used to verify whether the current operation is supported on the loaded backend)
- Similarly to the `htp` backend, expose the `dsp` and `hta` backends through the `backend_type` config option
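The intended validation behaviour can be sketched in a few lines. This is a simplified model, not the actual onnxruntime partitioning code: `Node` and `partition` are hypothetical names, and `HTA_SUPPORTED` is an invented stand-in for whatever `backendGetSupportedOperations` would report for the loaded backend (the real call returns QNN operation names, which would still need to be mapped from ONNX op types).

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Node(NamedTuple):
    name: str
    op_type: str


def partition(nodes: Iterable[Node], supported_ops: Set[str]) -> Tuple[List[str], List[str]]:
    """Assign each node to the QNN EP if its op is supported, else to the CPU EP."""
    qnn_nodes: List[str] = []
    cpu_nodes: List[str] = []
    for node in nodes:
        if node.op_type in supported_ops:
            qnn_nodes.append(node.name)
        else:
            cpu_nodes.append(node.name)  # falls back to the CPU EP
    return qnn_nodes, cpu_nodes


# Assumption for illustration only: a backend without Transpose support.
HTA_SUPPORTED = {"Conv", "Relu"}

graph = [
    Node("transpose_in", "Transpose"),   # inserted for NCHW -> NHWC
    Node("conv0", "Conv"),
    Node("relu0", "Relu"),
    Node("transpose_out", "Transpose"),  # inserted for NHWC -> NCHW
]

qnn_nodes, cpu_nodes = partition(graph, HTA_SUPPORTED)
print("QNN EP:", qnn_nodes)
print("CPU EP:", cpu_nodes)
```

In this model the two Transpose nodes are handed to the CPU EP, which is the fallback behaviour the issue describes as currently missing for the HTA backend.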
Describe scenario use case
Anywhere the user would like to use the QNN HTA backend (especially relevant for older chipsets that lack the hardware required for the HTP backend).