perf(dsv4 prefill): avoid indexer pool token scan by zhaozhaozz · Pull Request #618 · hw-native-sys/pypto-lib

zhaozhaozz · 2026-06-25T15:19:33Z

Summary

Scatter current indexer compressor projection results into overlap state before pooling.
Read ratio-4 pool windows directly from state and remove the repeated per-pool token overlay scan.
Document the required invariant that valid prefill tokens must have an inner state row.

Validation

python3 -m py_compile models/deepseek/v4/prefill_indexer_compressor.py
git diff --check origin/main...HEAD -- models/deepseek/v4/prefill_indexer_compressor.py
Remote NPU correctness: python models/deepseek/v4/prefill_indexer_compressor.py -p a2a3
Remote NPU PMU A/B on device 0 with --enable-pmu 2:
- runtime: 3.82s -> 3.16s
- prefill_idx_c4_softmax_pool PMU sum cycles: 2829455 -> 1540161 (-45.6%)
- pool + state update/scatter PMU sum cycles: 6701356 -> 4444453 (-33.7%)
Remote NPU start position coverage: start_pos=0,1,2,3,4,5,63,64,127,128 all pass after rerunning one transient device-side 507018 on start_pos=0.

Notes

This follows the DeepSeek V4 decode-state ordering: current tokens are written into state before a compression boundary pools from the overlap window. If a future scheduler allows a valid current token to participate in pooling without a valid inner_state_slot_mapping, this path would need an overlay fallback.

coderabbitai · 2026-06-25T15:19:49Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c03c33ba-6962-4519-9e45-4a3ed30b6532

📥 Commits

Reviewing files that changed from the base of the PR and between ba91c88 and 532b40d.

📒 Files selected for processing (1)

models/deepseek/v4/prefill_indexer_compressor.py

📝 Walkthrough

Walkthrough

The compressor now scatters KV and score projection outputs into flattened inner state before pooling, the pool stage reads those stored values instead of scanning tokens, and the later state-update loop is removed.

Changes

Inner state scatter and pool reuse

Layer / File(s)	Summary
Pre-pool scatter `models/deepseek/v4/prefill_indexer_compressor.py`	`inner_state_slot_mapping` and `position_ids` drive writes from `kv_proj_scratch` and `score_proj_scratch + ape[...]` into `kv_state_flat` and `score_state_flat`.
Pool reads state `models/deepseek/v4/prefill_indexer_compressor.py`	The pool-stage token scan is removed, and pool window assembly reads `kv_state_flat` and `score_state_flat` instead of copying from projection scratch buffers.
Remove later writeback `models/deepseek/v4/prefill_indexer_compressor.py`	The later `state_update` loop that wrote `kv_state_flat` and `score_state_flat` from `kv_proj_scratch`, `ape`, and `pool_dep` is deleted.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

hw-native-sys/pypto-lib#523: Rewrites the same compressor state-write path around slot mapping and flattened inner KV/score state.
hw-native-sys/pypto-lib#540: Also changes how kv_state_flat and score_state_flat are updated from projection and APE inputs.
hw-native-sys/pypto-lib#580: Modifies the same prefill_indexer_compressor.py state writeback flow and related scan/update ordering.

Poem

A hop, a scatter, a whisk of the paw 🐇
KV and score land neatly by law.
I nibble the pool, no old scan in sight,
Burrowed state gleams tidy and bright.
Hoppity hooray for the change tonight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly matches the main performance change: removing the token scan in the prefill indexer pool path.
Description check	✅ Passed	The description is directly related to the changes and validation in the pull request.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

gemini-code-assist

Code Review

This pull request refactors the prefill_indexer_compressor for DeepSeek V4. It introduces a new pre-pass SPMD loop (prefill_idx_c4_state_scatter_pre) to handle state updates before the softmax pooling step, and removes the inline pooling loop over tokens as well as the post-pooling state update loop (prefill_idx_c4_state_update). I have no feedback to provide as there are no review comments.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

perf(dsv4 prefill): avoid indexer pool token scan

532b40d

gemini-code-assist Bot reviewed Jun 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(dsv4 prefill): avoid indexer pool token scan#618

perf(dsv4 prefill): avoid indexer pool token scan#618
zhaozhaozz wants to merge 1 commit into
hw-native-sys:mainfrom
zhaozhaozz:perf/dsv4-indexer-pool-scatter

zhaozhaozz commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

zhaozhaozz commented Jun 25, 2026

Summary

Validation

Notes

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading