Problem
Large PipeNet schedules currently lower by duplicating data-movement send/receive bodies behind coordinate-specific predicates. On Wormhole, the issue #625 8x7 reproducer generates an NCRISC binary that exceeds the 16 KiB instruction-region limit before kernel launch:
ncrisc.elf: segment[0] [0xffc00000,+0x147f0) overflows region:0 limit of 0x4000 bytes
Observed size details from the reproducer:
- NCRISC text segment: 0x147f0 bytes (83,952 bytes), about 5.1 times the limit.
- Wormhole NCRISC instruction region: 0x4000 bytes (16,384 bytes).
- Generated post_receives_and_send.cpp: about 9,616 lines / 498 KB.
- Generated code contains hundreds of coordinate predicates and noc_async_* calls.
This is not a PipeNet correctness failure by itself. The kernel fails during ELF load, before launch and before result verification can run.
Cause
PipeNet.if_src(...) / PipeNet.if_dst(...) expansion currently clones callback bodies per concrete pipe/role case. Lowering then emits coordinate-specific branches for the launch grid, so a schedule with many senders, receivers, and collective ranges scales generated data-movement code with the number of participating coordinates instead of with the logical transfer body.
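The scaling described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: `PipeShape`, `unrolled_body_count`, and `compact_body_count` are hypothetical names, not part of PipeNet, and the model assumes one cloned body per sender and per receiver of each pipe. It does not reproduce the real lowering or the byte counts from the reproducer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical summary of one pipe: a single sender plus some receivers.
// This is an illustrative model, not a PipeNet data structure.
struct PipeShape {
    std::size_t receivers;
};

// Coordinate-unrolled model (assumption): one cloned callback body per
// concrete pipe/role case, so the count grows with participating coordinates.
inline std::size_t unrolled_body_count(const std::vector<PipeShape>& pipes) {
    std::size_t bodies = 0;
    for (const PipeShape& p : pipes) {
        bodies += 1 + p.receivers;  // one send clone plus one clone per receiver
    }
    return bodies;
}

// Compact model (assumption): one shared send body and one shared receive
// body, independent of how many pipes or coordinates participate.
inline std::size_t compact_body_count(const std::vector<PipeShape>& pipes) {
    return pipes.empty() ? 0 : 2;
}
```

Under this model, adding pipes or receivers grows the unrolled count linearly while the compact count stays fixed, which is the property the issue asks the lowering to have.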
Desired direction
Replace coordinate-unrolled PipeNet data-movement lowering with a compact representation, for example one of:
- table-driven lowering where each core iterates over pipe records relevant to its coordinate;
- per-node program specialization so each NCRISC binary contains only that node's send/receive work;
- another representation that keeps generated code bounded by transfer logic rather than by full-grid coordinate expansion.
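As a rough illustration of the first option, the sketch below shows one possible shape for table-driven lowering. Everything in it is an assumption made for illustration: `PipeRecord`, `Role`, `Issued`, and `run_pipes` are hypothetical names, the record layout is invented, and the logged entries stand in for the real send/receive bodies (the `noc_async_*` calls). It is written as host-runnable C++ so the control flow can be checked; a real kernel would not use `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record describing one logical transfer. Field names are
// illustrative and not taken from the PipeNet implementation.
struct PipeRecord {
    std::uint8_t src_x, src_y;  // sending core coordinate
    std::uint8_t dst_x, dst_y;  // receiving core coordinate
    std::uint32_t bytes;        // payload size
};

enum class Role { Send, Receive };

// Log entry standing in for an issued transfer, used only by this sketch.
struct Issued {
    Role role;
    std::size_t pipe_index;
};

// One shared loop: each core scans the table and acts only on records that
// name its own coordinate. Code size depends on this loop body, while the
// per-coordinate variation lives in the data table.
inline std::vector<Issued> run_pipes(const PipeRecord* table, std::size_t count,
                                     std::uint8_t my_x, std::uint8_t my_y) {
    assert(table != nullptr || count == 0);
    std::vector<Issued> log;
    for (std::size_t i = 0; i < count; ++i) {
        const PipeRecord& p = table[i];
        if (p.src_x == my_x && p.src_y == my_y) {
            log.push_back({Role::Send, i});     // stand-in for the send body
        }
        if (p.dst_x == my_x && p.dst_y == my_y) {
            log.push_back({Role::Receive, i});  // stand-in for the receive body
        }
    }
    return log;
}
```

The trade-off in this sketch is that the table itself occupies data memory and every core scans all records. A real design might filter the table per core on the host, which moves it toward the per-node specialization option.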
Splitting work across BRISC/NCRISC may reduce size by a constant factor, but does not solve the scaling problem by itself because Wormhole also has a 16 KiB BRISC region.
Validation
After compact lowering exists, rerun the full issue #625 reproducer on Wormhole with GRID_DIM=7 and exact output verification enabled. PR #622 keeps compile-only coverage for the 8x7 resource plan and uses the smaller original issue reproducer for Wormhole runtime coverage until this is fixed.
Related: #625, #622.