
[CPU] GDN uses native bf16/f16 instruction. #36160

Open
zhangYiIntel wants to merge 8 commits into openvinotoolkit:master from zhangYiIntel:yi3/enable_jit_gdn


@zhangYiIntel
Contributor

Details:

  • Enable a JIT kernel that uses native f16/bf16 instructions; it only supports head_size values that are multiples of 32
  • For other head sizes, fuse the reorder into the intrinsic kernel
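The head-size constraint above boils down to a dispatch rule. The sketch below is hypothetical: `GdnPath`, `select_gdn_path` and the `has_native_xf16_isa` flag are illustrative names, not OpenVINO APIs. It assumes the JIT path needs both native AVX-512 bf16/fp16 support and a head size divisible by 32. A plausible reason for the multiple of 32 is that one 512-bit register holds exactly 32 16-bit elements, so such head sizes presumably need no tail handling.

```cpp
#include <cstddef>

// Hypothetical names for the two xf16 execution paths described in the PR.
enum class GdnPath {
    JitNativeXf16,             // AVX-512 JIT kernel using native bf16/fp16 instructions
    IntrinsicWithFusedReorder  // intrinsic kernel with the reorder fused in
};

// Assumption: 512-bit register / 16-bit element = 32 lanes per register.
constexpr std::size_t kXf16LanesPerZmm = 512 / 16;

// Picks the JIT path only when the ISA is available and head_size is a
// non-zero multiple of 32; every other case falls back to the intrinsic path.
GdnPath select_gdn_path(std::size_t head_size, bool has_native_xf16_isa) {
    const bool aligned = head_size != 0 && head_size % kXf16LanesPerZmm == 0;
    return (has_native_xf16_isa && aligned) ? GdnPath::JitNativeXf16
                                            : GdnPath::IntrinsicWithFusedReorder;
}
```

Keeping the check in one predicate makes the fallback explicit: a head size of 48, or a machine without the native instructions, takes the intrinsic path.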

Tickets:

AI Assistance:

  • AI assistance used: yes
  • AI was used to simplify code

@github-actions (bot) added the category: CPU (OpenVINO CPU plugin) label Jun 1, 2026
@zhangYiIntel marked this pull request as ready for review June 3, 2026 05:21
@zhangYiIntel requested review from a team as code owners June 3, 2026 05:21
@zhangYiIntel added this to the 2026.3 milestone Jun 3, 2026
@maxnick
Contributor

maxnick commented Jun 3, 2026

@xuchen-intel, could you please review?

Contributor

Copilot AI left a comment


Pull request overview

This PR extends the CPU plugin’s GatedDeltaNet (GDN) execution to better support f16/bf16 by introducing an AVX-512 JIT kernel leveraging native bf16/fp16 instructions (for head sizes divisible by 32) and by generalizing the reference recurrent linear attention path to handle xf16 precisions. It also adds functional test coverage for f16/bf16 GDN configurations.

Changes:

  • Add an x64 AVX-512 JIT kernel for GDN that uses native bf16/fp16 instructions and is restricted to head sizes divisible by 32.
  • Templatize the recurrent linear attention kernel to support f16/bf16 in addition to f32.
  • Add functional test instances for GDN f16/bf16 cases.
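The templatized reference kernel can be illustrated with a minimal sketch. It assumes the standard gated delta rule recurrence, $S_t = \alpha_t S_{t-1}(I - \beta_t k_t k_t^\top) + \beta_t v_t k_t^\top$ and $o_t = S_t q_t$ with $\alpha_t = e^{g_t}$ (query scaling omitted). It also assumes the recurrent state stays in f32 while the template parameter `T` covers only the q/k/v/output storage. `bf16`, `to_f32`, `from_f32` and `gdn_step` are hypothetical names, not the functions in `recurrent_linear_attn.cpp`. An f16 converter would slot in the same way and is left out for brevity.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical bf16 storage type: the upper 16 bits of an IEEE-754 f32.
struct bf16 {
    std::uint16_t bits;
};

inline float to_f32(float x) { return x; }
inline float to_f32(bf16 x) {
    const std::uint32_t u = static_cast<std::uint32_t>(x.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

template <typename T> T from_f32(float x);
template <> inline float from_f32<float>(float x) { return x; }
template <> inline bf16 from_f32<bf16>(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    u += 0x7FFFu + ((u >> 16) & 1u);  // round to nearest even; NaN handling omitted
    return bf16{static_cast<std::uint16_t>(u >> 16)};
}

// One recurrence step for one head. `state` is row-major [dv][dk] and stays in f32;
// q and k (length dk), v and out (length dv) use the storage type T.
template <typename T>
void gdn_step(const T* q, const T* k, const T* v, float g, float beta,
              float* state, T* out, std::size_t dk, std::size_t dv) {
    const float alpha = std::exp(g);  // gate decay
    for (std::size_t i = 0; i < dv; ++i) {
        float* row = state + i * dk;
        float sk = 0.0f;  // (alpha * S k)_i: the value currently stored for key k
        for (std::size_t j = 0; j < dk; ++j) {
            row[j] *= alpha;
            sk += row[j] * to_f32(k[j]);
        }
        const float delta = beta * (to_f32(v[i]) - sk);  // delta-rule correction
        float o = 0.0f;
        for (std::size_t j = 0; j < dk; ++j) {
            row[j] += delta * to_f32(k[j]);  // rank-1 update delta * k^T
            o += row[j] * to_f32(q[j]);      // output o = S q
        }
        out[i] = from_f32<T>(o);
    }
}
```

Converting only at load and store while accumulating in f32 mirrors what native bf16 dot-product instructions such as VDPBF16PS do, since they accumulate into f32. That is one reason to keep the state in f32 rather than in `T`.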

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/plugins/intel_cpu/tests/functional/shared_tests_instances/subgraph_tests/gated_delta_net.cpp | Adds f16/bf16 parameterized test cases for GDN. |
| src/plugins/intel_cpu/src/nodes/kernels/x64/gdn_jit_kernel.hpp | Introduces the JIT kernel interface, call args, and compile params for xf16 GDN. |
| src/plugins/intel_cpu/src/nodes/kernels/x64/gdn_jit_kernel.cpp | Implements the AVX-512 bf16/fp16 JIT codegen and factory creation logic. |
| src/plugins/intel_cpu/src/nodes/kernels/linear_attn/recurrent_linear_attn.cpp | Generalizes the reference kernel to xf16 and optimizes some loads/stores. |
| src/plugins/intel_cpu/src/nodes/gated_delta_net.h | Adds x64-only members for caching/using the GDN JIT kernel. |
| src/plugins/intel_cpu/src/nodes/gated_delta_net.cpp | Enables f16/bf16 execution, adds JIT dispatch, and adjusts scratchpad sizing/typing. |

Two outdated comment threads on src/plugins/intel_cpu/src/nodes/kernels/linear_attn/recurrent_linear_attn.cpp

Labels

category: CPU (OpenVINO CPU plugin)

4 participants