[Code Health] simt_basic scatter kernel: drop the __CPU_SIM fork — use one templated MSCATTER across sim and onboard #1159

Description

@ChaoZheng109

Category

Technical Debt (cleanup, refactor)

Component

Tests

Description

kernel_simt_scatter.cpp (the SIMT element-scatter ST kernel) currently
forks the scatter call on __CPU_SIM:

#ifdef __CPU_SIM
    MSCATTER(outGlobal, srcTile, idxTile);                                   // non-templated
#else
    MSCATTER<Coalesce::Elem, ScatterAtomicOp::None, ScatterOOB::Skip>(...);  // templated
#endif

The fork existed because pto-isa previously gated the templated MSCATTER
overloads behind PTO_NPU_ARCH_A5 only — they were not visible to the CPU
simulator, so the sim path had to fall back to the non-templated form (whose
CPU-sim default happens to be Coalesce::Elem, matching our element-scatter
golden), while onboard selected Coalesce::Elem explicitly. See
pto-isa#164.

The pinned pto-isa (bumped to 016396b5 in #1156, the pto-isa#166 mechanism)
now opens the templated overloads to __CPU_SIM as well as PTO_NPU_ARCH_A5:

// build/pto-isa/include/pto/common/pto_instr.hpp:2049
#if defined(PTO_NPU_ARCH_A5) || defined(__CPU_SIM)
template <Coalesce Mode, ScatterAtomicOp Atomic, ScatterOOB Oob, ...>
PTO_INST RecordEvent MSCATTER(...) { ... MSCATTER_IMPL<Mode, Atomic, Oob>(...); }
#endif

So the same explicit templated call now compiles and runs identically on both
backends, and the #ifdef __CPU_SIM fork can be removed.

Important caveat: the non-templated form is still NOT portable. Only the
templated overload was unified. The non-templated MSCATTER(dst, src, idx)
still dispatches to each backend's own default MSCATTER_IMPL:

  • CPU sim default → Coalesce::Elem (pto/cpu/MScatter.hpp:139)
  • a5 onboard default → Coalesce::Row (pto/npu/a5/MScatter.hpp:456)

That is, the original pto-isa#164 divergence persists for the non-templated
form. The single portable instruction must therefore be the explicit
MSCATTER<Coalesce::Elem, ScatterAtomicOp::None, ScatterOOB::Skip>.
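
The dispatch difference can be modelled in a few lines of standalone C++. This is only a sketch of the pattern described above: CpuSimBackend, A5Backend, kDefault, scatter_mode_explicit and scatter_mode_default are hypothetical names invented for illustration, and this Coalesce enum is a local stand-in rather than the pto-isa declaration. It shows why a mode fixed at the call site is identical on both backends while a backend-supplied default is not.

```cpp
#include <cassert>

// Local stand-in for illustration only; not the real pto-isa enum.
enum class Coalesce { Elem, Row };

// Hypothetical backends, each carrying its own default mode, mirroring the
// defaults the issue reports for the non-templated MSCATTER.
struct CpuSimBackend { static constexpr Coalesce kDefault = Coalesce::Elem; };
struct A5Backend     { static constexpr Coalesce kDefault = Coalesce::Row; };

// Explicit form: the caller fixes the mode, so the backend cannot change it.
template <Coalesce Mode, typename Backend>
constexpr Coalesce scatter_mode_explicit() { return Mode; }

// Implicit form: the mode silently comes from the backend default.
template <typename Backend>
constexpr Coalesce scatter_mode_default() { return Backend::kDefault; }

static_assert(scatter_mode_explicit<Coalesce::Elem, CpuSimBackend>() ==
                  scatter_mode_explicit<Coalesce::Elem, A5Backend>(),
              "explicit mode is the same on both backends");
static_assert(scatter_mode_default<CpuSimBackend>() !=
                  scatter_mode_default<A5Backend>(),
              "backend defaults diverge");
```

Both static_asserts hold at compile time, which is the property the cleanup relies on: only the explicit form removes the backend from the decision.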

Location

  • tests/st/a5/tensormap_and_ringbuffer/simt_basic/kernels/aiv/kernel_simt_scatter.cpp:85-89

Proposed Fix

Drop the #ifdef __CPU_SIM branch and call the explicit templated form
unconditionally on both sim and onboard:

MSCATTER<Coalesce::Elem, ScatterAtomicOp::None, ScatterOOB::Skip>(outGlobal, srcTile, idxTile);

Verified on --platform a5sim (1 passed). This cleanup leaves the onboard call
site unchanged (it already used this exact instruction), so a5 behavior is
unaffected. An a5 onboard rerun is still warranted to close the loop; it has
not been run because a5 is not available on the current a2a3 dev box.
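
For reviewers comparing sim and onboard output, a host-side reference of the element scatter may help. The sketch below rests on assumptions the issue does not spell out: that Coalesce::Elem means one destination index per source element, that ScatterOOB::Skip drops writes whose index falls outside the destination, and that ScatterAtomicOp::None means a plain store. scatter_elem_skip_oob is a hypothetical helper, not part of pto-isa or the test's golden code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference under the assumed semantics:
// for each i, dst[idx[i]] = src[i] when idx[i] is in range, otherwise skip.
template <typename T>
void scatter_elem_skip_oob(std::vector<T>& dst,
                           const std::vector<T>& src,
                           const std::vector<std::int32_t>& idx) {
    assert(src.size() == idx.size());  // one index per source element
    for (std::size_t i = 0; i < src.size(); ++i) {
        const std::int32_t target = idx[i];
        // Out-of-bounds indices are ignored rather than clamped or trapped.
        if (target < 0 || static_cast<std::size_t>(target) >= dst.size()) {
            continue;
        }
        dst[static_cast<std::size_t>(target)] = src[i];  // plain store, no atomics
    }
}
```

With a four-element zeroed destination, source {1, 2, 3} and indices {2, 7, 0}, this reference writes 1 to slot 2, skips the out-of-range index 7, and writes 3 to slot 0. Duplicate indices resolve to the last write here, which the real instruction may not guarantee.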

Priority

Low (no impact today, good to fix eventually)

Labels

code health (Technical debt, robustness, code quality)
