[QNN EP] documentation updates for the GPU backend. (#26508)
### Description
Updating QNN EP documentation to include the GPU backend.
### Motivation and Context
The GPU backend differs from the HTP backend in specific areas.
`docs/execution-providers/QNN-ExecutionProvider.md` (17 additions, 0 deletions)
@@ -64,6 +64,7 @@ The QNN Execution Provider supports a number of configuration options.
|---|-----|
|'libQnnCpu.so' or 'QnnCpu.dll'|Enable CPU backend. See `backend_type` 'cpu'.|
|'libQnnHtp.so' or 'QnnHtp.dll'|Enable HTP backend. See `backend_type` 'htp'.|
+|'libQnnGpu.so' or 'QnnGpu.dll'|Enable GPU backend. See `backend_type` 'gpu'.|
**Note:** `backend_path` is an alternative to `backend_type`. At most one of the two should be specified.
`backend_path` requires a platform-specific path (e.g., `libQnnCpu.so` vs. `QnnCpu.dll`) but also allows one to specify an arbitrary path.
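For illustration, a minimal sketch of passing either option through the ONNX Runtime Python API; the model path is a placeholder, and only one of the two keys should be supplied at a time:

```python
# Minimal sketch: select the QNN backend either by type or by library path (not both).
import onnxruntime as ort

# Option 1: let QNN EP resolve the backend library from a backend type.
opts_by_type = [{"backend_type": "gpu"}]

# Option 2: point at the backend library explicitly (platform-specific name or full path).
opts_by_path = [{"backend_path": "QnnGpu.dll"}]

session = ort.InferenceSession(
    "model.onnx",                       # placeholder model path
    providers=["QNNExecutionProvider"],
    provider_options=opts_by_type,      # or opts_by_path, but not both keys together
)
```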
@@ -392,6 +393,22 @@ Available session configurations include:
The above snippet only specifies the `backend_path` provider option. Refer to the [Configuration options section](./QNN-ExecutionProvider.md#configuration-options) for a list of all available QNN EP provider options.
+
+## Running a model with QNN EP's GPU backend
+
+The QNN GPU backend can run models with 32-bit or 16-bit floating-point activations and weights as-is, without prior quantization. A 16-bit floating-point model can generally run inference faster on the GPU than its 32-bit version. To help reduce the size of large models, quantizing the weights to `uint8` while keeping the activations in float is also supported.
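One way to obtain the 16-bit floating-point variant mentioned above is an offline conversion before the model is handed to QNN EP. A minimal sketch, assuming the `onnxconverter-common` package (not part of this documentation) and placeholder file names:

```python
# Minimal sketch: convert an FP32 ONNX model to FP16 before running it on the QNN GPU backend.
# Assumes the onnxconverter-common package is installed; file names are placeholders.
import onnx
from onnxconverter_common import float16

model_fp32 = onnx.load("model.onnx")
model_fp16 = float16.convert_float_to_float16(model_fp32)  # cast weights/activations to float16
onnx.save(model_fp16, "model_fp16.onnx")
```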
+
+Other than the quantized model requirement mentioned in the HTP backend section above, all other requirements also apply to the GPU backend, and so does the model inference sample code, except for the portion where the backend is specified:
+
+    provider_options=[{"backend_path": "QnnGpu.dll"}])  # Provide path to the GPU DLL in the QNN SDK
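Putting the pieces together, a minimal end-to-end sketch of running a float32 model on the GPU backend; the model path, input name, and input shape are placeholders:

```python
# Minimal sketch: run a float32 model on the QNN GPU backend.
# "model.onnx" and the input shape are placeholders for an actual model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["QNNExecutionProvider"],
    provider_options=[{"backend_path": "QnnGpu.dll"}],  # GPU backend library from the QNN SDK
)

input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)
```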
## QNN context binary cache feature
There is a QNN context which contains the QNN graphs after converting, compiling, and finalizing the model. QNN can serialize this context into a binary file so that the user can run further inference directly from it (without the QDQ model), reducing model loading cost.
The QNN Execution Provider supports a number of session options to configure this.
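For illustration, a minimal sketch of enabling the context binary cache from Python, assuming the `ep.context_enable` and `ep.context_file_path` session configuration keys used by QNN EP; file names are placeholders:

```python
# Minimal sketch: enable the QNN context binary cache via session options.
# The config keys are assumed from QNN EP's session options; file names are placeholders.
import onnxruntime as ort

so = ort.SessionOptions()
so.add_session_config_entry("ep.context_enable", "1")                   # dump/reuse a compiled QNN context binary
so.add_session_config_entry("ep.context_file_path", "model_ctx.onnx")   # where to write/read the cached context

session = ort.InferenceSession(
    "model.onnx",
    sess_options=so,
    providers=["QNNExecutionProvider"],
    provider_options=[{"backend_path": "QnnHtp.dll"}],
)
```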