Skip to content

perf(dsv4 prefill): avoid indexer pool token scan#618

Open
zhaozhaozz wants to merge 1 commit into
hw-native-sys:mainfrom
zhaozhaozz:perf/dsv4-indexer-pool-scatter
Open

perf(dsv4 prefill): avoid indexer pool token scan#618
zhaozhaozz wants to merge 1 commit into
hw-native-sys:mainfrom
zhaozhaozz:perf/dsv4-indexer-pool-scatter

Conversation

@zhaozhaozz

Copy link
Copy Markdown
Contributor

Summary

  • Scatter current indexer compressor projection results into overlap state before pooling.
  • Read ratio-4 pool windows directly from state and remove the repeated per-pool token overlay scan.
  • Document the required invariant that valid prefill tokens must have an inner state row.

Validation

  • python3 -m py_compile models/deepseek/v4/prefill_indexer_compressor.py
  • git diff --check origin/main...HEAD -- models/deepseek/v4/prefill_indexer_compressor.py
  • Remote NPU correctness: python models/deepseek/v4/prefill_indexer_compressor.py -p a2a3
  • Remote NPU PMU A/B on device 0 with --enable-pmu 2:
    • runtime: 3.82s -> 3.16s
    • prefill_idx_c4_softmax_pool PMU sum cycles: 2829455 -> 1540161 (-45.6%)
    • pool + state update/scatter PMU sum cycles: 6701356 -> 4444453 (-33.7%)
  • Remote NPU start position coverage: start_pos=0,1,2,3,4,5,63,64,127,128 all pass after rerunning one transient device-side 507018 on start_pos=0.

Notes

This follows the DeepSeek V4 decode-state ordering: current tokens are written into state before a compression boundary pools from the overlap window. If a future scheduler allows a valid current token to participate in pooling without a valid inner_state_slot_mapping, this path would need an overlay fallback.

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c03c33ba-6962-4519-9e45-4a3ed30b6532

📥 Commits

Reviewing files that changed from the base of the PR and between ba91c88 and 532b40d.

📒 Files selected for processing (1)
  • models/deepseek/v4/prefill_indexer_compressor.py

📝 Walkthrough

Walkthrough

The compressor now scatters KV and score projection outputs into flattened inner state before pooling, the pool stage reads those stored values instead of scanning tokens, and the later state-update loop is removed.

Changes

Inner state scatter and pool reuse

Layer / File(s) Summary
Pre-pool scatter
models/deepseek/v4/prefill_indexer_compressor.py
inner_state_slot_mapping and position_ids drive writes from kv_proj_scratch and score_proj_scratch + ape[...] into kv_state_flat and score_state_flat.
Pool reads state
models/deepseek/v4/prefill_indexer_compressor.py
The pool-stage token scan is removed, and pool window assembly reads kv_state_flat and score_state_flat instead of copying from projection scratch buffers.
Remove later writeback
models/deepseek/v4/prefill_indexer_compressor.py
The later state_update loop that wrote kv_state_flat and score_state_flat from kv_proj_scratch, ape, and pool_dep is deleted.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

A hop, a scatter, a whisk of the paw 🐇
KV and score land neatly by law.
I nibble the pool, no old scan in sight,
Burrowed state gleams tidy and bright.
Hoppity hooray for the change tonight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly matches the main performance change: removing the token scan in the prefill indexer pool path.
Description check ✅ Passed The description is directly related to the changes and validation in the pull request.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the prefill_indexer_compressor for DeepSeek V4. It introduces a new pre-pass SPMD loop (prefill_idx_c4_state_scatter_pre) to handle state updates before the softmax pooling step, and removes the inline pooling loop over tokens as well as the post-pooling state update loop (prefill_idx_c4_state_update). I have no feedback to provide as there are no review comments.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant