Add independent K scale support in FP8 prefill attention #40

Open
xueyangcs wants to merge 1 commit into Tencent:main from xueyangcs:feature/add_k_scale_for_prefill_attn

Conversation

@xueyangcs (Contributor)

Summary

Split the combined qkscale parameter into separate qscale and kscale parameters in the FP8 prefill attention kernel, so that Q and K can use different quantization granularities.

Breaking change: the single input

- qkscale: [num_batch, num_head_q, max_seqlens_q_pad], float32

is replaced by:

- qscale: [num_batch, num_head_q, max_seqlens_q_pad], float32
- kscale: [1], float32
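The split above can be illustrated with a minimal NumPy sketch (not the actual kernel): Q gets a per-(batch, head, query-row) scale matching the qscale layout, K gets a single tensor-wide scale matching kscale: [1], and the attention scores dequantize with the product of the two. The FP8 E4M3 maximum of 448 and the round-and-clip stand-in for a real float8 cast are assumptions for illustration.

```python
import numpy as np

num_batch, num_head_q, seqlen_q, seqlen_k, head_dim = 1, 2, 4, 6, 8

rng = np.random.default_rng(0)
q = rng.standard_normal((num_batch, num_head_q, seqlen_q, head_dim)).astype(np.float32)
k = rng.standard_normal((num_batch, num_head_q, seqlen_k, head_dim)).astype(np.float32)

# Per-(batch, head, query-row) scale for Q, mirroring the new qscale layout
# [num_batch, num_head_q, max_seqlens_q_pad]; 448 is the FP8 E4M3 max.
qscale = np.abs(q).max(axis=-1) / 448.0
# Single tensor-wide scale for K, mirroring kscale: [1].
kscale = np.array([np.abs(k).max() / 448.0], dtype=np.float32)

# Stand-in for FP8 quantization: scale, round, clip (a real kernel casts to float8).
q_fp8 = np.clip(np.rint(q / qscale[..., None]), -448, 448)
k_fp8 = np.clip(np.rint(k / kscale), -448, 448)

# Scores dequantize with the product of the two independent scales:
# S = (q_fp8 @ k_fp8^T) * qscale * kscale
scores = (q_fp8 @ k_fp8.transpose(0, 1, 3, 2)) * qscale[..., None] * kscale
ref = q @ k.transpose(0, 1, 3, 2)
print(np.max(np.abs(scores - ref)))  # quantization error only
```

With the previous combined qkscale, both factors had to share one granularity; keeping kscale as a single scalar lets K be quantized tensor-wide while Q keeps its finer per-row scales.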
