test(simt): unify scatter kernel on templated MSCATTER, drop __CPU_SIM fork#1160
test(simt): unify scatter kernel on templated MSCATTER, drop __CPU_SIM fork#1160ChaoZheng109 wants to merge 1 commit into
Conversation
… MSCATTER The simt_basic scatter kernel forked the scatter call on __CPU_SIM because pto-isa previously gated the templated MSCATTER overloads behind PTO_NPU_ARCH_A5 only, so the CPU simulator had to use the non-templated form. The pinned pto-isa (bumped to 016396b5 in hw-native-sys#1156, the pto-isa#166 mechanism) now exposes the templated overloads to __CPU_SIM as well, so both backends can call the same explicit MSCATTER<Coalesce::Elem, ScatterAtomicOp::None, ScatterOOB::Skip>. Note the non-templated MSCATTER is still not portable: CPU sim defaults to Coalesce::Elem while a5 onboard defaults to Coalesce::Row, so the explicit templated form is the only instruction unified across both. Verified on --platform a5sim. Onboard call site is unchanged. Closes hw-native-sys#1159
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe SIMT scatter kernel now calls ChangesSIMT scatter cleanup
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request simplifies the MSCATTER implementation in kernel_simt_scatter.cpp by removing the #ifdef __CPU_SIM conditional compilation fork. This is now possible due to updates in the pto-isa backend that allow the templated MSCATTER call to function consistently across both CPU simulation and NPU hardware backends. I have no feedback to provide as there were no review comments.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
What
Remove the
#ifdef __CPU_SIMfork in the SIMT element-scatter ST kernel(
kernel_simt_scatter.cpp)so both the CPU simulator and a5 onboard issue the same instruction:
MSCATTER<Coalesce::Elem, ScatterAtomicOp::None, ScatterOOB::Skip>(outGlobal, srcTile, idxTile);Why
The fork existed because pto-isa previously gated the templated
MSCATTERoverloads behind
PTO_NPU_ARCH_A5only — they weren't visible to the CPUsimulator, so the sim path had to fall back to the non-templated form
(pto-isa#164). The pinned pto-isa (bumped to
016396b5in #1156, thepto-isa#166 mechanism) now opens the templated overloads to
__CPU_SIMaswell as
PTO_NPU_ARCH_A5, so the explicit templated call compiles and runsidentically on both backends.
Caveat — non-templated form is still not portable
Only the templated overload was unified. The non-templated
MSCATTER(dst, src, idx)still picks each backend's own default:Coalesce::Elemon CPU sim (pto/cpu/MScatter.hpp) vsCoalesce::Rowona5 onboard (
pto/npu/a5/MScatter.hpp). So the explicitMSCATTER<Coalesce::Elem, ...>is the only instruction that behaves the sameon both — hence keeping the template arguments rather than relying on the
non-templated default.
Testing
--platform a5sim:1 passed.exact instruction), so a5 behavior is unaffected. An a5 onboard rerun is
still warranted to close the loop — not available on the current a2a3 dev
box.
Closes #1159