test(simt): unify scatter kernel on templated MSCATTER, drop __CPU_SIM fork #1160

Open
ChaoZheng109 wants to merge 1 commit into hw-native-sys:main from ChaoZheng109:fix/simt-scatter-unify-mscatter

@ChaoZheng109

Collaborator

What

Remove the #ifdef __CPU_SIM fork in the SIMT element-scatter ST kernel
(kernel_simt_scatter.cpp)
so both the CPU simulator and a5 onboard issue the same instruction:

MSCATTER<Coalesce::Elem, ScatterAtomicOp::None, ScatterOOB::Skip>(outGlobal, srcTile, idxTile);
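
For readers unfamiliar with the instruction, the following host-side sketch shows what this call is assumed to do. It is not taken from pto-isa: the function name `scatter_elem_skip_oob` is hypothetical, and the semantics are assumptions read off the template argument names (Coalesce::Elem as one independent write per element, ScatterAtomicOp::None as a plain store, ScatterOOB::Skip as dropping out-of-range indices). The real tile-level behavior may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model, NOT the pto-isa implementation.
// Assumed semantics: dst[idx[i]] = src[i] for every i, one element at a time,
// with no atomic combine, and out-of-bounds indices silently skipped.
inline void scatter_elem_skip_oob(std::vector<float>& dst,
                                  const std::vector<float>& src,
                                  const std::vector<std::int32_t>& idx) {
    assert(src.size() == idx.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const std::int32_t j = idx[i];
        // Assumed ScatterOOB::Skip behavior: ignore indices outside dst.
        if (j < 0 || static_cast<std::size_t>(j) >= dst.size()) {
            continue;
        }
        // Assumed ScatterAtomicOp::None behavior: plain overwrite.
        dst[static_cast<std::size_t>(j)] = src[i];
    }
}
```

With a four-element destination, sources {1, 2, 3} and indices {2, -1, 7}, only the first element lands (at position 2) under these assumptions; the other two are skipped.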

Why

The fork existed because pto-isa previously gated the templated MSCATTER
overloads behind PTO_NPU_ARCH_A5 only — they weren't visible to the CPU
simulator, so the sim path had to fall back to the non-templated form
(pto-isa#164). The pinned pto-isa (bumped to 016396b5 in #1156, the
pto-isa#166 mechanism) now opens the templated overloads to __CPU_SIM as
well as PTO_NPU_ARCH_A5, so the explicit templated call compiles and runs
identically on both backends.
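
The gating described above can be illustrated with a small mock. Everything here is a stand-in: `MOCK_ARCH_A5`, `MOCK_CPU_SIM`, `mock_scatter` and `kernel_call` are hypothetical names, and the gate condition is only a guess at the shape of the pto-isa header, not a copy of it.

```cpp
#include <cassert>
#include <string>

// Pretend this translation unit is a CPU-simulator build (hypothetical macro).
#define MOCK_CPU_SIM 1

enum class Coalesce { Elem, Row };

// Non-templated form: visible on every backend in this mock.
inline std::string mock_scatter() { return "non-templated"; }

// Templated form. The assumed old gate admitted only MOCK_ARCH_A5;
// the assumed new gate also admits the simulator build.
#if defined(MOCK_ARCH_A5) || defined(MOCK_CPU_SIM)
template <Coalesce C>
std::string mock_scatter() {
    return C == Coalesce::Elem ? "templated<Elem>" : "templated<Row>";
}
#endif

// The call site no longer needs a preprocessor fork: the explicit
// templated call is visible in both kinds of build.
inline std::string kernel_call() { return mock_scatter<Coalesce::Elem>(); }
```

Dropping the `|| defined(MOCK_CPU_SIM)` half of the gate makes `kernel_call` stop compiling in the simulator build, which is the situation the old fork worked around.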

Caveat — non-templated form is still not portable

Only the templated overload was unified. The non-templated
MSCATTER(dst, src, idx) still picks each backend's own default:
Coalesce::Elem on CPU sim (pto/cpu/MScatter.hpp) vs Coalesce::Row on
a5 onboard (pto/npu/a5/MScatter.hpp). So the explicit
MSCATTER<Coalesce::Elem, ...> is the only instruction that behaves the same
on both — hence keeping the template arguments rather than relying on the
non-templated default.
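
A minimal sketch of why the non-templated form diverges, assuming each backend's non-templated overload simply forwards to the templated one with its own default. The namespaces `mock_cpu_sim` and `mock_a5` and the function `scatter_mode` are hypothetical; only the two defaults (Elem on CPU sim, Row on a5) come from the description above.

```cpp
#include <cassert>

enum class Coalesce { Elem, Row };

namespace mock_cpu_sim {
// Templated form: the caller picks the mode explicitly.
template <Coalesce C>
constexpr Coalesce scatter_mode() { return C; }
// Non-templated form: this backend's default is Elem.
constexpr Coalesce scatter_mode() { return scatter_mode<Coalesce::Elem>(); }
}  // namespace mock_cpu_sim

namespace mock_a5 {
template <Coalesce C>
constexpr Coalesce scatter_mode() { return C; }
// Non-templated form: this backend's default is Row.
constexpr Coalesce scatter_mode() { return scatter_mode<Coalesce::Row>(); }
}  // namespace mock_a5

// Explicit template arguments select the same mode on both backends.
static_assert(mock_cpu_sim::scatter_mode<Coalesce::Elem>() ==
                  mock_a5::scatter_mode<Coalesce::Elem>(),
              "templated form is portable");

// The non-templated form silently selects different modes.
static_assert(mock_cpu_sim::scatter_mode() != mock_a5::scatter_mode(),
              "non-templated defaults differ per backend");
```

Both `static_assert`s hold, so the same source line would request different coalescing modes depending on the backend unless the template arguments are spelled out.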

Testing

  • --platform a5sim: 1 passed.
  • Onboard a5 call site is unchanged by this cleanup (it already used this
    exact instruction), so a5 behavior is unaffected. An a5 onboard rerun is
    still warranted to close the loop — not available on the current a2a3 dev
    box.

Closes #1159

… MSCATTER

The simt_basic scatter kernel forked the scatter call on __CPU_SIM
because pto-isa previously gated the templated MSCATTER overloads behind
PTO_NPU_ARCH_A5 only, so the CPU simulator had to use the non-templated
form. The pinned pto-isa (bumped to 016396b5 in hw-native-sys#1156, the pto-isa#166
mechanism) now exposes the templated overloads to __CPU_SIM as well, so
both backends can call the same explicit
MSCATTER<Coalesce::Elem, ScatterAtomicOp::None, ScatterOOB::Skip>.

Note the non-templated MSCATTER is still not portable: CPU sim defaults
to Coalesce::Elem while a5 onboard defaults to Coalesce::Row, so the
explicit templated form is the only instruction unified across both.

Verified on --platform a5sim. Onboard call site is unchanged.

Closes hw-native-sys#1159
@coderabbitai

coderabbitai Bot commented Jun 25, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d4c1150b-4673-40bb-a138-b5cac5fbb8d5

📥 Commits

Reviewing files that changed from the base of the PR and between abc62d8 and 340352a.

📒 Files selected for processing (1)
  • tests/st/a5/tensormap_and_ringbuffer/simt_basic/kernels/aiv/kernel_simt_scatter.cpp

📝 Walkthrough

The SIMT scatter kernel now calls MSCATTER<Coalesce::Elem, ScatterAtomicOp::None, ScatterOOB::Skip> unconditionally. The prior __CPU_SIM branch was removed, and the nearby comment was updated.

Changes

SIMT scatter cleanup

| Layer / File(s) | Summary |
|---|---|
| Unify MSCATTER invocation: tests/st/a5/tensormap_and_ringbuffer/simt_basic/kernels/aiv/kernel_simt_scatter.cpp | The kernel replaces the backend-specific scatter call with one templated MSCATTER invocation and updates the accompanying comment. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

  • hw-native-sys/simpler#764: Also changes the same SIMT scatter kernel’s MSCATTER call site to use the templated form consistently.

Poem

A rabbit hops by, ears held high,
One scatter call beneath the sky.
No forked path now, just one bright hop,
Templated steps that never stop.
🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: unifying the scatter kernel on templated MSCATTER and removing the __CPU_SIM fork. |
| Description check | ✅ Passed | The description directly matches the implemented cleanup and explains the sim/onboard templated MSCATTER unification. |
| Linked Issues check | ✅ Passed | The change fulfills #1159 by removing the __CPU_SIM branch and using the explicit templated MSCATTER on the kernel call site. |
| Out of Scope Changes check | ✅ Passed | The patch stays focused on the scatter kernel cleanup and adds only explanatory comments, with no unrelated code changes. |


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request simplifies the MSCATTER implementation in kernel_simt_scatter.cpp by removing the #ifdef __CPU_SIM conditional compilation fork. This is now possible due to updates in the pto-isa backend that allow the templated MSCATTER call to function consistently across both CPU simulation and NPU hardware backends. I have no feedback to provide as there were no review comments.


Development

Successfully merging this pull request may close these issues.

[Code Health] simt_basic scatter kernel: drop the __CPU_SIM fork — use one templated MSCATTER across sim and onboard
