selective_mixed_precision: QKV-aware overrides, AUTO memory mode, MULTI_GPU dispatch #2475
Merged
…TI_GPU dispatch

- Normalize per-layer quant config overrides so Q/K/V projections in the same attention block share precision, required by ModelBuilder for GQA fusion.
- Add AUTO setting for kld_memory_mode that picks among FULL, MULTI_GPU, LOW_MEMORY, OFFLOAD based on available GPU memory and model size.
- Add MULTI_GPU mode that uses Accelerate's dispatch_model with _no_split_modules honored, plus a coalescing pass that pins every model.layers.N.* entry to a single device and falls back to LOW_MEMORY if a decoder layer still spans devices.
- Tests: 24 unit tests covering QKV grouping, AUTO selection thresholds, and the MULTI_GPU device-map coalescing path.
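The coalescing pass in the MULTI_GPU bullet can be pictured with a small, framework-free sketch. This is not the code from the PR: the function names `coalesce_device_map` and `layers_spanning_devices` are hypothetical, the device map is a plain dict of module name to device index, and the "majority device wins" rule is an assumption made for illustration.

```python
import re
from collections import Counter

# Matches "model.layers.N" and anything nested below it.
LAYER_RE = re.compile(r"^(model\.layers\.\d+)(?:\.|$)")


def layers_spanning_devices(device_map):
    """Return the decoder layers whose entries sit on more than one device."""
    seen = {}
    for name, device in device_map.items():
        match = LAYER_RE.match(name)
        if match:
            seen.setdefault(match.group(1), set()).add(device)
    return sorted(layer for layer, devices in seen.items() if len(devices) > 1)


def coalesce_device_map(device_map):
    """Collapse every model.layers.N.* entry into one model.layers.N entry.

    Assumption: the device holding most of a layer's entries wins.
    Entries outside model.layers.N are copied through unchanged.
    """
    votes = {}
    for name, device in device_map.items():
        match = LAYER_RE.match(name)
        if match:
            votes.setdefault(match.group(1), Counter())[device] += 1

    coalesced = {}
    for name, device in device_map.items():
        match = LAYER_RE.match(name)
        if match is None:
            coalesced[name] = device
        else:
            layer = match.group(1)
            coalesced[layer] = votes[layer].most_common(1)[0][0]
    return coalesced


device_map = {
    "model.embed_tokens": 0,
    "model.layers.0.self_attn": 0,
    "model.layers.0.mlp": 0,
    "model.layers.0.input_layernorm": 1,  # stray entry on another GPU
    "model.layers.1": 1,
    "lm_head": 1,
}
coalesced = coalesce_device_map(device_map)
```

A check like `layers_spanning_devices` could then be run against the placement that actually results from dispatch; a non-empty result would be the signal to fall back to LOW_MEMORY.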
Pull request overview
This PR extends SelectiveMixedPrecision with QKV-aware scoring/overrides and more flexible KLD memory execution modes for large HF models.
Changes:
- Refactors scored mixed-precision selection into strategy classes with QKV group aggregation.
- Adds `kld_memory_mode` with `auto`, `multi_gpu`, `low_memory`, and `offload` behavior.
- Adds QKV quantization-config normalization and tests for SMP and quant-utils behavior.
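To make the `auto` setting concrete, here is a minimal sketch of how a mode could be picked from model size and free GPU memory. Everything in it is an assumption: the enum `KldMemoryMode`, the function `resolve_auto_mode`, the 1.2x headroom factor, and the rule that LOW_MEMORY only needs the largest single layer to fit on one GPU. The PR's actual thresholds and the exact semantics of each mode are not shown on this page.

```python
from enum import Enum


class KldMemoryMode(str, Enum):
    """Hypothetical enum mirroring the mode names used in the PR."""

    FULL = "full"
    MULTI_GPU = "multi_gpu"
    LOW_MEMORY = "low_memory"
    OFFLOAD = "offload"


def resolve_auto_mode(model_bytes, largest_layer_bytes, gpu_free_bytes, headroom=1.2):
    """Pick a mode in the order FULL, MULTI_GPU, LOW_MEMORY, OFFLOAD.

    gpu_free_bytes is a list with one entry per visible GPU.
    The headroom factor and every threshold here are illustrative guesses.
    """
    if not gpu_free_bytes:
        return KldMemoryMode.OFFLOAD
    needed = model_bytes * headroom
    if max(gpu_free_bytes) >= needed:
        return KldMemoryMode.FULL  # whole model fits on one GPU
    if len(gpu_free_bytes) > 1 and sum(gpu_free_bytes) >= needed:
        return KldMemoryMode.MULTI_GPU  # fits only when sharded across GPUs
    if max(gpu_free_bytes) >= largest_layer_bytes * headroom:
        return KldMemoryMode.LOW_MEMORY  # assumed: one layer at a time on GPU
    return KldMemoryMode.OFFLOAD


GIB = 1024**3
mode_single = resolve_auto_mode(10 * GIB, 1 * GIB, [24 * GIB])
mode_sharded = resolve_auto_mode(30 * GIB, 1 * GIB, [24 * GIB, 24 * GIB])
```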
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `olive/passes/pytorch/selective_mixed_precision.py` | Adds scoring strategies, QKV grouping, KLD memory-mode resolution, and multi-GPU dispatch. |
| `olive/passes/pytorch/quant_utils.py` | Adds QKV grouping/normalization helpers and rewrites quant config merging in `prepare_model`. |
| `test/passes/pytorch/test_selective_mixed_precision.py` | Adds extensive tests for scoring strategies, QKV grouping, and KLD memory modes. |
| `test/passes/pytorch/test_quant_utils.py` | Adds tests for QKV normalization and `prepare_model` quant-config handling. |
- Refactor scoring algorithms into per-algorithm strategy classes that own module stats collection, group aggregation, and scoring.
- Make QKV-aware grouping the default behavior (no opt-in config).
- Rewrite prepare_model / normalize_qkv_quant_config in quant_utils with defensive deepcopy of existing quantization configs and locked-override semantics for already-quantized members.
- Drop overrides for modules that won't be quantized; preserve pre-existing on-disk overrides verbatim.
- Split tests: move quant_utils tests into test_quant_utils.py; keep SMP behavior tests in test_selective_mixed_precision.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
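The QKV normalization and locked-override behavior described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the `normalize_qkv_quant_config` from the PR: the function name `normalize_qkv_overrides`, the `{"bits": N}` config shape, the `q_proj`/`k_proj`/`v_proj` naming, and the "promote to the highest bit width unless a locked member dictates otherwise" rule are all hypothetical.

```python
import copy

QKV_SUFFIXES = ("q_proj", "k_proj", "v_proj")


def qkv_group(name):
    """Return the attention-block prefix for a Q/K/V projection, else None."""
    parent, _, leaf = name.rpartition(".")
    return parent if parent and leaf in QKV_SUFFIXES else None


def normalize_qkv_overrides(overrides, locked=frozenset()):
    """Give Q, K and V of each attention block the same override.

    overrides: module name -> config dict such as {"bits": 4} (assumed shape).
    locked: names of already-quantized modules whose config must not change.
    The input is deep-copied so the caller's dict is never mutated.
    """
    result = copy.deepcopy(overrides)
    groups = {}
    for name in overrides:
        parent = qkv_group(name)
        if parent is not None:
            groups.setdefault(parent, []).append(name)

    for parent, members in groups.items():
        locked_members = [m for m in members if m in locked]
        if locked_members:
            # A locked member dictates the precision of its siblings.
            chosen = overrides[locked_members[0]]
        else:
            # Assumption: promote the group to its highest bit width.
            chosen = max((overrides[m] for m in members), key=lambda c: c["bits"])
        for suffix in QKV_SUFFIXES:
            name = f"{parent}.{suffix}"
            if name not in locked:
                result[name] = copy.deepcopy(chosen)
    return result


overrides = {
    "model.layers.0.self_attn.q_proj": {"bits": 8},
    "model.layers.0.self_attn.k_proj": {"bits": 4},
    "model.layers.0.mlp.up_proj": {"bits": 8},
}
normalized = normalize_qkv_overrides(overrides)
```

Passing `locked={"model.layers.0.self_attn.k_proj"}` in this sketch would instead pull `q_proj` and `v_proj` down to the locked member's 4-bit config.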
hanbitmyths
approved these changes
May 27, 2026
Describe your changes
Based on #2473
Checklist before requesting a review
lintrunner -a
(Optional) Issue link