selective_mixed_precision: QKV-aware overrides, AUTO memory mode, MULTI_GPU dispatch #2475

Merged: hanbitmyths merged 6 commits into main from jambayk/mp-qkv on May 27, 2026

@jambayk jambayk commented May 27, 2026

Describe your changes

Based on #2473

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

hanbitmyths and others added 4 commits May 22, 2026 16:44
…TI_GPU dispatch

- Normalize per-layer quant config overrides so Q/K/V projections in the same
  attention block share precision, required by ModelBuilder for GQA fusion.
- Add AUTO setting for kld_memory_mode that picks among FULL, MULTI_GPU,
  LOW_MEMORY, OFFLOAD based on available GPU memory and model size.
- Add MULTI_GPU mode that uses Accelerate's dispatch_model with
  _no_split_modules honored, plus a coalescing pass that pins every
  model.layers.N.* entry to a single device and falls back to LOW_MEMORY if a
  decoder layer still spans devices.
- Tests: 24 unit tests covering QKV grouping, AUTO selection thresholds, and
  the MULTI_GPU device-map coalescing path.
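
The coalescing pass described in the commit message above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the function names `coalesce_layer_devices` and `layers_spanning_devices` are hypothetical, the device map is assumed to be a plain `dict` of module name to device (the shape Accelerate uses for `device_map`), and the choice of pinning a split layer to the device that holds most of its entries is an assumption.

```python
import re
from collections import Counter, defaultdict

# Matches "model.layers.N" and anything nested below it.
_LAYER_RE = re.compile(r"^(model\.layers\.\d+)(?:\.|$)")


def layers_spanning_devices(device_map):
    """Return the decoder layers whose entries sit on more than one device."""
    seen = defaultdict(set)
    for name, device in device_map.items():
        match = _LAYER_RE.match(name)
        if match:
            seen[match.group(1)].add(device)
    return sorted(layer for layer, devices in seen.items() if len(devices) > 1)


def coalesce_layer_devices(device_map):
    """Return a new device map with exactly one entry per decoder layer."""
    votes = defaultdict(Counter)
    for name, device in device_map.items():
        match = _LAYER_RE.match(name)
        if match:
            votes[match.group(1)][device] += 1

    coalesced = {}
    for name, device in device_map.items():
        match = _LAYER_RE.match(name)
        if match is None:
            coalesced[name] = device  # non-layer entries pass through unchanged
            continue
        layer = match.group(1)
        if layer not in coalesced:
            # Assumption: pin the layer to the device holding most of its entries.
            coalesced[layer] = votes[layer].most_common(1)[0][0]
    return coalesced


raw_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1.self_attn": 0,
    "model.layers.1.mlp": 1,
    "model.layers.1.input_layernorm": 1,
    "lm_head": 1,
}
pinned_map = coalesce_layer_devices(raw_map)
```

A caller could run `layers_spanning_devices` on the map it finally dispatches with and switch to LOW_MEMORY whenever the result is non-empty, which mirrors the fallback the commit message mentions.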
7 comment threads on test/passes/pytorch/test_selective_mixed_precision.py, all marked Fixed
@jambayk jambayk marked this pull request as ready for review May 27, 2026 21:32
Copilot AI review requested due to automatic review settings May 27, 2026 21:32
Copilot AI left a comment

Pull request overview

This PR extends SelectiveMixedPrecision with QKV-aware scoring/overrides and more flexible KLD memory execution modes for large HF models.

Changes:

  • Refactors scored mixed-precision selection into strategy classes with QKV group aggregation.
  • Adds kld_memory_mode with auto, multi_gpu, low_memory, and offload behavior.
  • Adds QKV quantization-config normalization and tests for SMP and quant-utils behavior.
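
For readers unfamiliar with this kind of mode resolution, here is a minimal sketch of how an `auto` setting might pick a mode from model size and free GPU memory. Everything in it is an assumption made for illustration: the enum `KldMemoryMode`, the function `resolve_auto_mode`, the `headroom` factor, and the decision order are all hypothetical, and the thresholds this PR actually uses are not stated on this page.

```python
from enum import Enum


class KldMemoryMode(str, Enum):
    # Hypothetical enum; the values follow the mode names listed in the review summary.
    FULL = "full"
    MULTI_GPU = "multi_gpu"
    LOW_MEMORY = "low_memory"
    OFFLOAD = "offload"


def resolve_auto_mode(model_bytes, free_bytes_per_gpu, headroom=1.5):
    """Pick a memory mode from the model size and the free memory on each GPU.

    `headroom` is an assumed multiplier that leaves room for activations.
    """
    needed = model_bytes * headroom
    if not free_bytes_per_gpu:
        # Assumption: with no usable GPU, fall back to offloading.
        return KldMemoryMode.OFFLOAD
    if max(free_bytes_per_gpu) >= needed:
        return KldMemoryMode.FULL  # fits on a single device
    if len(free_bytes_per_gpu) > 1 and sum(free_bytes_per_gpu) >= needed:
        return KldMemoryMode.MULTI_GPU  # fits only when sharded across devices
    return KldMemoryMode.LOW_MEMORY


GIB = 1024**3
single_gpu_choice = resolve_auto_mode(10 * GIB, [24 * GIB])
two_gpu_choice = resolve_auto_mode(20 * GIB, [24 * GIB, 24 * GIB])
```

Keeping the resolution as a pure function of two numbers makes the thresholds easy to unit test without real hardware, which is presumably why the PR lists AUTO selection thresholds among its tests.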

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| olive/passes/pytorch/selective_mixed_precision.py | Adds scoring strategies, QKV grouping, KLD memory-mode resolution, and multi-GPU dispatch. |
| olive/passes/pytorch/quant_utils.py | Adds QKV grouping/normalization helpers and rewrites quant config merging in prepare_model. |
| test/passes/pytorch/test_selective_mixed_precision.py | Adds extensive tests for scoring strategies, QKV grouping, and KLD memory modes. |
| test/passes/pytorch/test_quant_utils.py | Adds tests for QKV normalization and prepare_model quant-config handling. |

Comment thread olive/passes/pytorch/quant_utils.py
Comment thread olive/passes/pytorch/selective_mixed_precision.py
Comment thread olive/passes/pytorch/selective_mixed_precision.py
- Refactor scoring algorithms into per-algorithm strategy classes that own
  module stats collection, group aggregation, and scoring.
- Make QKV-aware grouping the default behavior (no opt-in config).
- Rewrite prepare_model / normalize_qkv_quant_config in quant_utils with
  defensive deepcopy of existing quantization configs and locked-override
  semantics for already-quantized members.
- Drop overrides for modules that won't be quantized; preserve pre-existing
  on-disk overrides verbatim.
- Split tests: move quant_utils tests into test_quant_utils.py; keep SMP
  behavior tests in test_selective_mixed_precision.py.
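
The QKV normalization and locked-override semantics described above might look roughly like the sketch below. It is not the PR's `normalize_qkv_quant_config`: the function name `normalize_qkv_overrides`, the `{"bits": ...}` override shape, the `q_proj`/`k_proj`/`v_proj` suffixes, and the rule of taking the highest precision when no member is locked are all assumptions made for illustration.

```python
from copy import deepcopy

# Assumed projection names; real models may name these differently.
QKV_SUFFIXES = ("q_proj", "k_proj", "v_proj")


def normalize_qkv_overrides(overrides, locked=frozenset()):
    """Make Q/K/V projections of the same attention block share one precision.

    `overrides` maps module names to configs such as {"bits": 8}.
    `locked` names already-quantized modules (assumed to appear in
    `overrides`) whose config must not change.
    """
    result = deepcopy(overrides)  # defensive copy; the caller's dict is untouched

    # Group overridden Q/K/V modules by their parent attention block.
    groups = {}
    for name in overrides:
        parent, _, leaf = name.rpartition(".")
        if parent and leaf in QKV_SUFFIXES:
            groups.setdefault(parent, []).append(name)

    for parent, members in groups.items():
        locked_members = [m for m in members if m in locked]
        if locked_members:
            # A locked member dictates the precision for the whole group.
            shared_bits = overrides[locked_members[0]]["bits"]
        else:
            # Assumption: otherwise keep the highest precision requested.
            shared_bits = max(overrides[m]["bits"] for m in members)
        for leaf in QKV_SUFFIXES:
            name = f"{parent}.{leaf}"
            if name in locked:
                continue  # never rewrite an already-quantized module
            result.setdefault(name, {})["bits"] = shared_bits
    return result


example_overrides = {
    "model.layers.0.self_attn.q_proj": {"bits": 8},
    "model.layers.0.self_attn.k_proj": {"bits": 4},
    "model.layers.0.mlp.down_proj": {"bits": 8},
}
unlocked_result = normalize_qkv_overrides(example_overrides)
locked_result = normalize_qkv_overrides(
    example_overrides, locked={"model.layers.0.self_attn.k_proj"}
)
```

In the unlocked case all three projections of layer 0 end up at 8 bits; when `k_proj` is locked at 4 bits, `q_proj` and `v_proj` follow it instead, and the MLP override is left alone in both cases.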

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@hanbitmyths hanbitmyths enabled auto-merge (squash) May 27, 2026 23:33
@hanbitmyths hanbitmyths merged commit 04ef7d2 into main May 27, 2026
11 checks passed
@hanbitmyths hanbitmyths deleted the jambayk/mp-qkv branch May 27, 2026 23:45
4 participants