[QNN EP] Add prepare_and_load session option for single-session AOT inference by qti-mbadnara · Pull Request #517 · onnxruntime/onnxruntime-qnn

qti-mbadnara · 2026-06-11T03:00:02Z

Summary

Add a new session config flag ep.qnnexecutionprovider.enable_htp_prepare_and_load=1 that performs AOT compilation and context loading within a single ORT session. This allows large models (e.g., LLMs) to bypass the single process-domain (PD) memory limit imposed by the QNN JIT flow without requiring two separate sessions.

Motivation / Context

QNN EP's JIT flow places all graph splits in a single QNN process domain (PD), which exhausts memory for large models. The existing AOT workaround requires two sessions:

1. Session 1: prepare_only=1, which compiles the model and emits _ctx.onnx
2. Session 2: load _ctx.onnx for inference (QNN spreads splits across multiple PDs)

This two-session flow is cumbersome for embedded customers who want a single API call. The new prepare_and_load option performs both steps internally:

1. Compile the model (JIT path)
2. Extract the QNN context binary
3. Release the compile-time single-PD context
4. Reload the binary via the AOT path (multi-PD)
5. Session is immediately ready for inference
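
For callers using the Python API instead of onnxruntime_perf_test, the flag would presumably be passed as an ordinary session config entry. This is an assumption: the PR only shows the key being set with -C. In the sketch below, `apply_session_config` and `RecordingOptions` are hypothetical names; `add_session_config_entry` is the existing `onnxruntime.SessionOptions` method, and a stand-in object is used so the snippet runs without ONNX Runtime or a QNN device.

```python
from typing import Dict, Protocol


class SupportsConfigEntries(Protocol):
    """Anything exposing SessionOptions' add_session_config_entry method."""

    def add_session_config_entry(self, key: str, value: str) -> None: ...


# Key taken from the PR summary.
PREPARE_AND_LOAD_KEY = "ep.qnnexecutionprovider.enable_htp_prepare_and_load"


def apply_session_config(options: SupportsConfigEntries, entries: Dict[str, str]) -> None:
    """Copy each key/value pair into the session options as a config entry."""
    for key, value in entries.items():
        options.add_session_config_entry(key, str(value))


class RecordingOptions:
    """Stand-in for onnxruntime.SessionOptions that records what was set."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def add_session_config_entry(self, key: str, value: str) -> None:
        self.entries[key] = value


# Path A: only the new flag is set; no context file is requested.
path_a = RecordingOptions()
apply_session_config(path_a, {PREPARE_AND_LOAD_KEY: "1"})
print(path_a.entries)
```

With a real `onnxruntime.SessionOptions` in place of `RecordingOptions`, the same helper would apply the entries before the session is created.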

Behavior matrix

| `ep.context_enable` | `prepare_and_load` | Behavior |
|---|---|---|
| 0 | 0 | Existing JIT flow (unchanged) |
| 1 | 0 | Existing AOT flow (unchanged) |
| 0 | 1 | New Path A: compile → reload from memory → infer. No persisted artifact. |
| 1 | 1 | New Path B: compile → write `_ctx.onnx` + `.bin` → reload → infer. Artifact persists. |
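
The matrix can be read as a two-flag decision. The following is a minimal sketch of that reading, not the EP's implementation; `describe_flow` is a hypothetical name and the returned strings simply paraphrase the table.

```python
def describe_flow(context_enable: bool, prepare_and_load: bool) -> str:
    """Return the flow the behavior matrix assigns to a flag combination."""
    if prepare_and_load:
        if context_enable:
            return "Path B: compile, write _ctx.onnx + .bin, reload, infer (artifact persists)"
        return "Path A: compile, reload from memory, infer (no persisted artifact)"
    if context_enable:
        return "Existing AOT flow (unchanged)"
    return "Existing JIT flow (unchanged)"


# Print all four rows in the same order as the table.
for ctx, pal in ((False, False), (True, False), (False, True), (True, True)):
    print(int(ctx), int(pal), describe_flow(ctx, pal))
```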

Test Plan: prepare_and_load

Unit Tests

Run new prepare_and_load tests

./onnxruntime_provider_test.exe --gtest_filter="*PrepareAndLoad*"

Expected: 5 tests pass (PathA, PathB, MutuallyExclusive, ContextDisabledWithFilePath, EmbedModeOverridden)

Manual Validation with onnxruntime_perf_test

Test 1: Path A — Prepare and load, no artifact

.\onnxruntime_perf_test.exe --plugin_ep_libs "QNNExecutionProvider|onnxruntime_providers_qnn.dll" --plugin_eps "QNNExecutionProvider" -i "backend_path|QnnHTP.dll" -m times -r 1 -p burst -C "ep.qnnexecutionprovider.enable_htp_prepare_and_load|1" <model.onnx>

Verify:

Inference completes successfully
No _ctx.onnx or _qnn.bin file is created next to the model
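
The manual tests differ only in their -C entries and the model argument, so the command line can be assembled from a dictionary. This is a convenience sketch: `build_perf_test_args` is a hypothetical helper, the fixed arguments are copied from the Test 1 command, and `model.onnx` stands in for the `<model.onnx>` placeholder.

```python
from typing import Dict, List


def build_perf_test_args(model: str, session_config: Dict[str, str]) -> List[str]:
    """Build the onnxruntime_perf_test argument list used in the manual tests."""
    args = [
        r".\onnxruntime_perf_test.exe",
        "--plugin_ep_libs", "QNNExecutionProvider|onnxruntime_providers_qnn.dll",
        "--plugin_eps", "QNNExecutionProvider",
        "-i", "backend_path|QnnHTP.dll",
        "-m", "times", "-r", "1", "-p", "burst",
    ]
    # Each session config entry becomes one -C "key|value" pair.
    for key, value in session_config.items():
        args += ["-C", f"{key}|{value}"]
    args.append(model)
    return args


# Test 1 (Path A): only the new flag is passed.
path_a_args = build_perf_test_args(
    "model.onnx",
    {"ep.qnnexecutionprovider.enable_htp_prepare_and_load": "1"},
)
print(" ".join(path_a_args))
```

The resulting list could be handed to `subprocess.run(path_a_args, check=True)` on a machine that has the binaries and a QNN HTP backend.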

Test 2: Path B — Prepare and load + persist artifact

.\onnxruntime_perf_test.exe --plugin_ep_libs "QNNExecutionProvider|onnxruntime_providers_qnn.dll" --plugin_eps "QNNExecutionProvider" -i "backend_path|QnnHTP.dll" -m times -r 1 -p burst -C "ep.context_enable|1" -C "ep.context_file_path|model_ctx.onnx" -C "ep.context_embed_mode|0" -C "ep.qnnexecutionprovider.enable_htp_prepare_and_load|1" <model.onnx>

Verify:

Inference completes successfully
model_ctx.onnx exists on disk
model_ctx_qnn.bin exists on disk (external binary)
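
The on-disk check for Path B can be scripted. The sketch below assumes the external binary is named by appending `_qnn.bin` to the context file's stem; that rule is inferred only from the two examples in this test plan (model_ctx.onnx and embed_test_ctx.onnx), and both function names are hypothetical.

```python
from pathlib import Path
from typing import List


def expected_artifacts(context_file_path: str) -> List[Path]:
    """Return the context model and the external binary expected beside it."""
    ctx = Path(context_file_path)
    # Assumed naming rule: <stem>_qnn.bin in the same directory.
    return [ctx, ctx.with_name(ctx.stem + "_qnn.bin")]


def missing_artifacts(context_file_path: str) -> List[Path]:
    """Return the expected artifacts that are not present on disk."""
    return [p for p in expected_artifacts(context_file_path) if not p.is_file()]


print([p.name for p in expected_artifacts("model_ctx.onnx")])
```

An empty list from `missing_artifacts("model_ctx.onnx")` after running Test 2 would correspond to both Verify items passing; for Test 1 the opposite is expected.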

Test 3: Load saved artifact from Test 2 (existing AOT flow)

.\onnxruntime_perf_test.exe --plugin_ep_libs "QNNExecutionProvider|onnxruntime_providers_qnn.dll" --plugin_eps "QNNExecutionProvider" -i "backend_path|QnnHTP.dll" -m times -r 1 -p burst model_ctx.onnx

Verify:

Loads and runs without recompilation
Startup is faster than Test 2 (no compile step)
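
To compare startup between Test 2 and Test 3 with numbers rather than by eye, session creation can be wrapped in a timer. This is a generic sketch; `timed` is a hypothetical helper and the workload shown is a placeholder for the real session-creation call.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Call fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Placeholder workload; in practice fn would create the inference session.
value, elapsed = timed(lambda: sum(range(1000)))
print(f"result={value} elapsed={elapsed:.6f}s")
```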

Test 4: Regression — JIT flow (default, no new flags)

.\onnxruntime_perf_test.exe --plugin_ep_libs "QNNExecutionProvider|onnxruntime_providers_qnn.dll" --plugin_eps "QNNExecutionProvider" -i "backend_path|QnnHTP.dll" -m times -r 1 -p burst <model.onnx>

Verify:

Works as before, no behavior change

Test 5: Regression — Prepare-only flow (existing AOT step 1)

.\onnxruntime_perf_test.exe --plugin_ep_libs "QNNExecutionProvider|onnxruntime_providers_qnn.dll" --plugin_eps "QNNExecutionProvider" -i "backend_path|QnnHTP.dll" -m times -r 1 -p burst -C "ep.context_enable|1" -C "ep.context_file_path|prep_only_ctx.onnx" -C "ep.qnnexecutionprovider.enable_htp_prepare_only|1" <model.onnx>

Verify:

prep_only_ctx.onnx is created on disk
Inference does NOT run (the run returns EP_FAIL or an error about prepare_only mode)

Test 6: Error — Both prepare_and_load + prepare_only (mutually exclusive)

.\onnxruntime_perf_test.exe --plugin_ep_libs "QNNExecutionProvider|onnxruntime_providers_qnn.dll" --plugin_eps "QNNExecutionProvider" -i "backend_path|QnnHTP.dll" -m times -r 1 -p burst -C "ep.context_enable|1" -C "ep.qnnexecutionprovider.enable_htp_prepare_only|1" -C "ep.qnnexecutionprovider.enable_htp_prepare_and_load|1" <model.onnx>

Verify:

Session creation fails with error containing "mutually exclusive"

Test 7: Error — Contradictory options (no persist + file path)

.\onnxruntime_perf_test.exe --plugin_ep_libs "QNNExecutionProvider|onnxruntime_providers_qnn.dll" --plugin_eps "QNNExecutionProvider" -i "backend_path|QnnHTP.dll" -m times -r 1 -p burst -C "ep.context_file_path|some_path.onnx" -C "ep.qnnexecutionprovider.enable_htp_prepare_and_load|1" <model.onnx>

Verify:

Session creation fails with error containing "Contradictory"

Test 8: Embed mode override (warning + forced to external)

.\onnxruntime_perf_test.exe --plugin_ep_libs "QNNExecutionProvider|onnxruntime_providers_qnn.dll" --plugin_eps "QNNExecutionProvider" -i "backend_path|QnnHTP.dll" -m times -r 1 -p burst -C "ep.context_enable|1" -C "ep.context_file_path|embed_test_ctx.onnx" -C "ep.context_embed_mode|1" -C "ep.qnnexecutionprovider.enable_htp_prepare_and_load|1" <model.onnx>

Verify:

Warning in logs: "Overriding ep.context_embed_mode to 0"
Inference completes successfully
embed_test_ctx.onnx exists
embed_test_ctx_qnn.bin exists (external binary, NOT embedded)
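
Tests 6 to 8 describe three option checks. The sketch below models them as described in this test plan; it is not the EP's code. The function name, the order of the checks, and the full message texts are assumptions, and only the quoted substrings ("mutually exclusive", "Contradictory", "Overriding ep.context_embed_mode to 0") come from the PR.

```python
from typing import Dict, List, Tuple

PREPARE_ONLY = "ep.qnnexecutionprovider.enable_htp_prepare_only"
PREPARE_AND_LOAD = "ep.qnnexecutionprovider.enable_htp_prepare_and_load"


def check_options(cfg: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Model the documented checks; returns (effective config, warnings)."""
    effective = dict(cfg)
    warnings: List[str] = []
    if effective.get(PREPARE_AND_LOAD, "0") != "1":
        return effective, warnings  # existing flows are left untouched

    # Test 6: the two prepare modes cannot be combined.
    if effective.get(PREPARE_ONLY, "0") == "1":
        raise ValueError("prepare_only and prepare_and_load are mutually exclusive")

    # Test 7: a file path without ep.context_enable=1 asks for no artifact and an artifact.
    context_enabled = effective.get("ep.context_enable", "0") == "1"
    if not context_enabled and effective.get("ep.context_file_path"):
        raise ValueError("Contradictory options: ep.context_file_path without ep.context_enable=1")

    # Test 8: embed mode 1 is overridden so the binary stays external.
    if context_enabled and effective.get("ep.context_embed_mode") == "1":
        warnings.append("Overriding ep.context_embed_mode to 0")
        effective["ep.context_embed_mode"] = "0"
    return effective, warnings


effective, warnings = check_options({
    "ep.context_enable": "1",
    "ep.context_file_path": "embed_test_ctx.onnx",
    "ep.context_embed_mode": "1",
    PREPARE_AND_LOAD: "1",
})
print(effective["ep.context_embed_mode"], warnings)
```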

Test 9: Prepare and load on already-compiled context model (warning, no-op)

.\onnxruntime_perf_test.exe --plugin_ep_libs "QNNExecutionProvider|onnxruntime_providers_qnn.dll" --plugin_eps "QNNExecutionProvider" -i "backend_path|QnnHTP.dll" -m times -r 1 -p burst -C "ep.qnnexecutionprovider.enable_htp_prepare_and_load|1" model_ctx.onnx

(Uses model_ctx.onnx from Test 2)

Verify:

Warning in logs: "prepare_and_load=1 is ignored because the input model is already a pre-compiled context model"
Loads and runs normally via existing AOT path

Summary Checklist

| # | Test | Expected Result |
|---|---|---|
| 1 | Path A (no artifact) | Infer OK, no files written |
| 2 | Path B (persist) | Infer OK, `.onnx` + `.bin` created |
| 3 | Reload artifact | Infer OK, fast startup |
| 4 | JIT regression | Unchanged behavior |
| 5 | Prepare-only regression | Writes ctx, no inference |
| 6 | Mutually exclusive error | Fails: "mutually exclusive" |
| 7 | Contradictory error | Fails: "Contradictory" |
| 8 | Embed mode override | Warning + external `.bin` created |
| 9 | Flag on pre-compiled model | Warning + loads normally |

qti-mbadnara · 2026-06-11T03:11:37Z

/ci

github-actions · 2026-06-11T03:49:56Z

🔄 CI triggered on dev/qti-mbadnara/enable_prepare_and_load (draft PR — CI was skipped on push) by @qti-mbadnara. Check the Actions tab for progress.

onnxruntime deleted a comment from github-actions Bot Jun 18, 2026

qti-mbadnara closed this Jun 18, 2026

qti-mbadnara force-pushed the dev/qti-mbadnara/enable_prepare_and_load branch from a33e6a7 to 3ccf1fe June 18, 2026 21:31

[QNN EP] Add Prepare and Load EP Provider Option

2281abd

qti-mbadnara changed the title from "[QNN EP] Add support for htp_prepare_and_load EP Provider Option" to "[QNN EP] Add prepare_and_load session option for single-session AOT inference" Jun 18, 2026

[QNN EP] Fix lint issues

4ece351

qti-mbadnara reopened this Jun 18, 2026

Merge remote-tracking branch 'origin/main' into dev/qti-mbadnara/enable_prepare_and_load

b2baccb

qti-mbadnara marked this pull request as ready for review June 18, 2026 23:30

qti-mbadnara requested review from qti-ashwshan, qti-chuteng, qti-jkilpatrick, qti-kromero, qti-yuduo, tirupath-qti and yath1 as code owners June 18, 2026 23:30

qti-mbadnara marked this pull request as draft June 19, 2026 01:12

qti-mbadnara wants to merge 3 commits into main from dev/qti-mbadnara/enable_prepare_and_load