selective_mixed_precision: QKV-aware overrides, AUTO memory mode, MULTI_GPU dispatch by jambayk · Pull Request #2475 · microsoft/Olive

jambayk · 2026-05-27T20:05:42Z

Describe your changes

Based on #2473

Checklist before requesting a review

Add unit tests for this change.
Make sure all tests can pass.
Update documents if necessary.
Lint and apply fixes to your code by running lintrunner -a
Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

…TI_GPU dispatch

- Normalize per-layer quant config overrides so Q/K/V projections in the same attention block share precision, required by ModelBuilder for GQA fusion.
- Add AUTO setting for kld_memory_mode that picks among FULL, MULTI_GPU, LOW_MEMORY, OFFLOAD based on available GPU memory and model size.
- Add MULTI_GPU mode that uses Accelerate's dispatch_model with _no_split_modules honored, plus a coalescing pass that pins every model.layers.N.* entry to a single device and falls back to LOW_MEMORY if a decoder layer still spans devices.
- Tests: 24 unit tests covering QKV grouping, AUTO selection thresholds, and the MULTI_GPU device-map coalescing path.
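The coalescing pass described in the MULTI_GPU item can be illustrated with a small sketch. This is not the code from the PR: the function names `coalesce_device_map` and `layers_spanning_devices` are hypothetical, the device map is modeled as a plain dict of module name to device (the shape Accelerate's `dispatch_model` accepts), and the "first device seen wins" rule is an assumption made only for illustration.

```python
import re

# Matches "model.layers.<N>" optionally followed by a submodule path.
_LAYER_RE = re.compile(r"^(model\.layers\.\d+)(?:\.|$)")


def layers_spanning_devices(device_map):
    """Return the decoder layers whose entries are placed on more than one device."""
    devices_per_layer = {}
    for name, device in device_map.items():
        match = _LAYER_RE.match(name)
        if match:
            devices_per_layer.setdefault(match.group(1), set()).add(device)
    return sorted(layer for layer, devs in devices_per_layer.items() if len(devs) > 1)


def coalesce_device_map(device_map):
    """Collapse every model.layers.N.* entry into a single model.layers.N entry.

    Assumption (hypothetical rule): the first device seen for a layer wins.
    """
    coalesced = {}
    for name, device in device_map.items():
        match = _LAYER_RE.match(name)
        key = match.group(1) if match else name
        coalesced.setdefault(key, device)
    return coalesced


raw_map = {
    "model.embed_tokens": 0,
    "model.layers.0.self_attn": 0,
    "model.layers.0.mlp": 1,  # layer 0 is split across devices 0 and 1
    "model.layers.1": 1,
    "lm_head": 1,
}
pinned_map = coalesce_device_map(raw_map)
```

In this sketch a caller would check `layers_spanning_devices` on the final placement and switch to the LOW_MEMORY path if the list is non-empty, mirroring the fallback the description mentions.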

Copilot

Pull request overview

This PR extends SelectiveMixedPrecision with QKV-aware scoring/overrides and more flexible KLD memory execution modes for large HF models.

Changes:

- Refactors scored mixed-precision selection into strategy classes with QKV group aggregation.
- Adds kld_memory_mode with auto, multi_gpu, low_memory, and offload behavior.
- Adds QKV quantization-config normalization and tests for SMP and quant-utils behavior.
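To make the idea of an `auto` memory mode concrete, here is a minimal sketch of a resolver that picks a mode from model size and free GPU memory. Everything in it is an assumption: the PR text does not state the thresholds or the decision order, so `resolve_auto_mode`, the `headroom` factor, and the rules below are hypothetical and only show the general shape of such a selection.

```python
from enum import Enum


class KldMemoryMode(str, Enum):
    # Mode names come from the PR description; the string values are assumed.
    FULL = "full"
    MULTI_GPU = "multi_gpu"
    LOW_MEMORY = "low_memory"
    OFFLOAD = "offload"


def resolve_auto_mode(model_bytes, gpu_free_bytes, headroom=2.0):
    """Pick a memory mode. Hypothetical rules, not the PR's actual thresholds.

    model_bytes:    estimated size of the model weights.
    gpu_free_bytes: list with the free memory of each visible GPU.
    headroom:       assumed safety multiplier on top of the weights.
    """
    if not gpu_free_bytes:
        return KldMemoryMode.OFFLOAD
    needed = model_bytes * headroom
    largest = max(gpu_free_bytes)
    if needed <= largest:
        return KldMemoryMode.FULL
    if len(gpu_free_bytes) > 1 and needed <= sum(gpu_free_bytes):
        return KldMemoryMode.MULTI_GPU
    if model_bytes <= largest:
        return KldMemoryMode.LOW_MEMORY
    return KldMemoryMode.OFFLOAD


GIB = 1024**3
mode = resolve_auto_mode(30 * GIB, [40 * GIB, 40 * GIB])
```

Keeping the resolver a pure function of two numeric inputs makes the selection thresholds easy to unit test without any GPU present.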

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `olive/passes/pytorch/selective_mixed_precision.py` | Adds scoring strategies, QKV grouping, KLD memory-mode resolution, and multi-GPU dispatch. |
| `olive/passes/pytorch/quant_utils.py` | Adds QKV grouping/normalization helpers and rewrites quant config merging in `prepare_model`. |
| `test/passes/pytorch/test_selective_mixed_precision.py` | Adds extensive tests for scoring strategies, QKV grouping, and KLD memory modes. |
| `test/passes/pytorch/test_quant_utils.py` | Adds tests for QKV normalization and `prepare_model` quant-config handling. |

- Refactor scoring algorithms into per-algorithm strategy classes that own module stats collection, group aggregation, and scoring.
- Make QKV-aware grouping the default behavior (no opt-in config).
- Rewrite prepare_model / normalize_qkv_quant_config in quant_utils with defensive deepcopy of existing quantization configs and locked-override semantics for already-quantized members.
- Drop overrides for modules that won't be quantized; preserve pre-existing on-disk overrides verbatim.
- Split tests: move quant_utils tests into test_quant_utils.py; keep SMP behavior tests in test_selective_mixed_precision.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
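The QKV grouping idea can be sketched as follows: if any of the Q, K, or V projections in an attention block carries a precision override, all three end up with the same one. This is an illustration only, not the PR's `normalize_qkv_quant_config`. The helper name `share_qkv_precision`, the `q_proj`/`k_proj`/`v_proj` suffixes, the `bits` key, and the "highest bit width wins" rule are all assumptions, and the locked-override handling for already-quantized members is not modeled.

```python
import copy

# Assumed projection names; real models may use different suffixes.
QKV_SUFFIXES = ("q_proj", "k_proj", "v_proj")


def share_qkv_precision(overrides, default_bits=4):
    """Return a copy of overrides where Q/K/V siblings share one bit width.

    Hypothetical rule: the group takes the highest bit width among its members.
    """
    # Defensive deep copy so the caller's config is never mutated.
    normalized = copy.deepcopy(overrides)

    # Group overridden Q/K/V modules by their parent attention block.
    groups = {}
    for name in overrides:
        parent, _, leaf = name.rpartition(".")
        if leaf in QKV_SUFFIXES:
            groups.setdefault(parent, []).append(name)

    for parent, members in groups.items():
        bits = max(overrides[m].get("bits", default_bits) for m in members)
        for suffix in QKV_SUFFIXES:
            # Create an entry for siblings that had no override of their own.
            normalized.setdefault(f"{parent}.{suffix}", {})["bits"] = bits
    return normalized


overrides = {
    "model.layers.0.self_attn.q_proj": {"bits": 8},
    "model.layers.0.mlp.down_proj": {"bits": 8},
}
shared = share_qkv_precision(overrides)
```

Here only `q_proj` was overridden, yet `k_proj` and `v_proj` in the same block receive the same bit width, while the unrelated `down_proj` entry is left untouched.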

hanbitmyths and others added 4 commits May 22, 2026 16:44

docs: surface KLD memory modes and QKV grouping in pass docstring

cc52a7e

Merge branch 'main' into smp-qkv-aware-multi-gpu

a4a2b2a

Address SMP review feedback

8e98a92

jambayk mentioned this pull request May 27, 2026

selective_mixed_precision: QKV-aware overrides, AUTO memory mode, MULTI_GPU dispatch #2473

Closed

5 tasks

github-advanced-security AI found potential problems May 27, 2026

jambayk force-pushed the jambayk/mp-qkv branch from 6cbabc8 to 8f722f3 May 27, 2026 20:17

jambayk assigned hanbitmyths May 27, 2026

jambayk marked this pull request as ready for review May 27, 2026 21:32

Copilot AI review requested due to automatic review settings May 27, 2026 21:32

Copilot started reviewing on behalf of jambayk May 27, 2026 21:32

Copilot AI reviewed May 27, 2026

Comment thread olive/passes/pytorch/quant_utils.py

Comment thread olive/passes/pytorch/selective_mixed_precision.py

Comment thread olive/passes/pytorch/selective_mixed_precision.py

jambayk force-pushed the jambayk/mp-qkv branch from 8f722f3 to 085e522 May 27, 2026 21:59

Merge branch 'main' into jambayk/mp-qkv

065aae2

hanbitmyths approved these changes May 27, 2026

hanbitmyths enabled auto-merge (squash) May 27, 2026 23:33

hanbitmyths merged commit 04ef7d2 into main May 27, 2026
11 checks passed

hanbitmyths deleted the jambayk/mp-qkv branch May 27, 2026 23:45

hanbitmyths merged 6 commits into main from jambayk/mp-qkv

4 participants
