[CPU] Enable BF16 dynamic quantization path for compressed FullyConnected #35726

Merged
maxnick merged 5 commits into openvinotoolkit:master from liubo-intel:liubo/dynamic_quant_bf16_support on May 27, 2026

Conversation

@liubo-intel
Contributor

@liubo-intel liubo-intel commented May 8, 2026

Details:

  • Extends the CPU plugin's weight-decompression FC path so that BF16 activations can go through the oneDNN dynamic-quantization kernel, in addition to F32.

oneDNN fork PR: openvinotoolkit/oneDNN#310

Tickets:

@liubo-intel liubo-intel marked this pull request as ready for review May 8, 2026 05:39
@liubo-intel liubo-intel requested review from a team as code owners May 8, 2026 05:39
@github-actions github-actions Bot added the category: CPU OpenVINO CPU plugin label May 8, 2026
@yuxu42 yuxu42 requested a review from Copilot May 8, 2026 05:58
Contributor

Copilot AI left a comment

Pull request overview

Extends the Intel CPU plugin’s compressed FullyConnected (weights decompression) path to allow BF16 activations to use the oneDNN dynamic quantization implementation (previously limited to F32), and adds functional coverage for the new BF16 scenario.
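For readers unfamiliar with the term, the following is a minimal sketch of what dynamic quantization of activations generally means: each row of activations is split into groups, and every group is converted to int8 at runtime with its own scale, so that the matrix multiply can run on int8 instructions such as VNNI. This is an illustration only, not the oneDNN kernel. The names `QuantizedRow` and `quantizePerGroup` are hypothetical, the symmetric max-abs scheme is an assumption, and `float` stands in for the activation type (in the BF16 scenario the inputs would arrive as BF16).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: int8 values plus one scale per group.
// Dequantized value i is values[i] * scales[i / groupSize].
struct QuantizedRow {
    std::vector<std::int8_t> values;
    std::vector<float> scales;
};

// Symmetric per-group int8 quantization (illustrative; not the oneDNN kernel).
// Assumes groupSize > 0 and src.size() is a multiple of groupSize.
QuantizedRow quantizePerGroup(const std::vector<float>& src, std::size_t groupSize) {
    assert(groupSize > 0 && src.size() % groupSize == 0);
    QuantizedRow out;
    out.values.resize(src.size());
    out.scales.resize(src.size() / groupSize);
    for (std::size_t g = 0; g < out.scales.size(); ++g) {
        const std::size_t begin = g * groupSize;
        // Find the largest magnitude in this group.
        float maxAbs = 0.0f;
        for (std::size_t i = begin; i < begin + groupSize; ++i) {
            maxAbs = std::max(maxAbs, std::fabs(src[i]));
        }
        // Map [-maxAbs, maxAbs] onto [-127, 127]; an all-zero group keeps scale 1.
        const float scale = maxAbs > 0.0f ? maxAbs / 127.0f : 1.0f;
        out.scales[g] = scale;
        for (std::size_t i = begin; i < begin + groupSize; ++i) {
            out.values[i] = static_cast<std::int8_t>(std::lround(src[i] / scale));
        }
    }
    return out;
}
```

Because the scales are computed from the data at inference time, no calibration step is needed; the cost is the extra quantization pass, which is why eligibility depends on the available ISA.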

Changes:

  • Enable BF16 as a supported activation type for compressed FullyConnected on x86_64.
  • Extend dynamic-quantization eligibility checks in the oneDNN FC primitive to accept BF16 sources (with ISA gating).
  • Add a BF16-specific MatMul-weights-decompression test fixture and instantiate new dyn-quant BF16 test cases.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/plugins/intel_cpu/tests/functional/custom/subgraph_tests/src/x64/matmul_weights_decompression.cpp | Adds BF16 dyn-quant test instantiation and a BF16-specific additional-config filter. |
| src/plugins/intel_cpu/tests/functional/custom/subgraph_tests/src/classes/matmul_weights_decompression.hpp | Introduces a BF16-derived test class and a shared setup helper taking data precision. |
| src/plugins/intel_cpu/tests/functional/custom/subgraph_tests/src/classes/matmul_weights_decompression.cpp | Refactors setup to parameterize network precision and adds BF16 test execution. |
| src/plugins/intel_cpu/src/nodes/fullyconnected.cpp | Enables BF16 in getSupportedCompressedActivationsTypes() for x86_64. |
| src/plugins/intel_cpu/src/nodes/executors/dnnl/dnnl_fullyconnected_primitive.cpp | Updates dynamic-quantization gating to allow BF16 sources with additional ISA checks. |

Comment thread src/plugins/intel_cpu/src/nodes/fullyconnected.cpp
Comment thread src/plugins/intel_cpu/src/nodes/executors/dnnl/dnnl_fullyconnected_primitive.cpp Outdated
Collaborator

@rkazants rkazants left a comment

@liubo-intel, @yuxu42, is it needed for Qwen3.5?

@liubo-intel
Contributor Author

> @liubo-intel, @yuxu42, is it needed for Qwen3.5?

Hi @rkazants, this PR is mainly intended to let NVL benefit from the more efficient avx512_core_vnni instruction set on the BF16 dynamic-quant path. As far as I know, it is outside the scope of the Qwen3.5 enablement effort.

@yuxu42 yuxu42 requested a review from maxnick May 11, 2026 06:34
@maxnick maxnick added this to the 2026.3 milestone May 12, 2026
@maxnick
Contributor

maxnick commented May 18, 2026

@liubo-intel, could you please create a dedicated oneDNN fork PR to facilitate the review process?

Comment thread src/plugins/intel_cpu/src/nodes/fullyconnected.cpp
* Gate BF16 dyn-quant entry on AMX-capable HW (two layers: node-level
  getSupportedCompressedActivationsTypes + primitive-level
  useDynamicQuantizationImpl), since AMX BF16 TMUL handles long
  prompts (prefill) more efficiently than VNNI int8 dyn-quant.

* Drive the BF16 dyn-quant test through the inference_precision hint
  on an f32 IR; remove the MatmulWeightsDecompressionBF16 subclass and
  decompression_precisions_bf16.
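The two-layer gating described in this commit message can be sketched as follows. This is one reading of the message, not the merged code: it assumes that BF16 dynamic quantization is kept for VNNI-capable hardware without AMX and is skipped where AMX is available, since the message says AMX BF16 TMUL handles prefill more efficiently. All names here (`Precision`, `CpuFeatures`, `supportedCompressedActivations`, `useDynamicQuantization`) are hypothetical stand-ins for the plugin's getSupportedCompressedActivationsTypes and useDynamicQuantizationImpl, and the existing F32 conditions are not modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; not the plugin's actual types.
enum class Precision { f32, bf16 };

struct CpuFeatures {
    bool avx512_core_vnni;  // int8 VNNI instructions available
    bool avx512_core_amx;   // AMX tiles (BF16 TMUL) available
};

// Assumed shared condition: BF16 dyn-quant only pays off with VNNI and without AMX.
bool bf16DynQuantAllowed(const CpuFeatures& cpu) {
    return cpu.avx512_core_vnni && !cpu.avx512_core_amx;
}

// Layer 1 (node level): which activation precisions the compressed FC accepts.
std::vector<Precision> supportedCompressedActivations(const CpuFeatures& cpu) {
    std::vector<Precision> types{Precision::f32};
    if (bf16DynQuantAllowed(cpu)) {
        types.push_back(Precision::bf16);
    }
    return types;
}

// Layer 2 (primitive level): whether the dyn-quant implementation is selected.
bool useDynamicQuantization(Precision src, const CpuFeatures& cpu) {
    if (src == Precision::f32) {
        return true;  // simplification: the real F32 checks are omitted here
    }
    return src == Precision::bf16 && bf16DynQuantAllowed(cpu);
}
```

Checking the same condition at both layers means a BF16 node never reaches the primitive with a precision the node itself would not have advertised, and the primitive still refuses the path if it is constructed some other way.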
@liubo-intel liubo-intel force-pushed the liubo/dynamic_quant_bf16_support branch from 54f8ed1 to ebdc717 Compare May 19, 2026 08:00
@liubo-intel
Contributor Author

> @liubo-intel, could you please create a dedicated oneDNN fork PR to facilitate the review process?

oneDNN fork PR: openvinotoolkit/oneDNN#310

Contributor

@maxnick maxnick left a comment

In general LGTM. Please apply the comment in the corresponding oneDNN PR.

@yuxu42 yuxu42 requested a review from maxnick May 26, 2026 05:28
@maxnick maxnick enabled auto-merge May 27, 2026 17:02
@maxnick maxnick added this pull request to the merge queue May 27, 2026
Merged via the queue into openvinotoolkit:master with commit bf0228a May 27, 2026
195 checks passed