Compact PipeNet data-movement lowering to avoid NCRISC code-size overflow #628

@brnorris03

Problem

Large PipeNet schedules are currently lowered by duplicating the data-movement send/receive bodies behind coordinate-specific predicates. On Wormhole, the 8x7 reproducer from issue #625 generates an NCRISC binary that exceeds the 16 KiB instruction-region limit, so ELF loading fails before kernel launch:

ncrisc.elf: segment[0] [0xffc00000,+0x147f0) overflows region:0 limit of 0x4000 bytes

Observed size details from the reproducer:

  • NCRISC text segment: 0x147f0 bytes (83,952 bytes), about 5.1 times the limit.
  • Wormhole NCRISC instruction region: 0x4000 bytes (16,384 bytes).
  • Generated post_receives_and_send.cpp: about 9,616 lines / 498 KB.
  • Generated code contains hundreds of coordinate predicates and noc_async_* calls.
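
As a quick sanity check of the numbers above, the overflow can be worked out at compile time. This is only an arithmetic sketch: the two constants are copied from the loader message, and `overflow_bytes` is a hypothetical helper name, not part of any toolchain.

```cpp
#include <cstdint>

// Values copied from the loader message and the list above.
constexpr std::uint32_t kNcriscTextBytes = 0x147f0;   // 83,952 bytes
constexpr std::uint32_t kNcriscRegionBytes = 0x4000;  // 16,384 bytes (16 KiB)

// Hypothetical helper: bytes by which a segment exceeds its region (0 if it fits).
constexpr std::uint32_t overflow_bytes(std::uint32_t segment, std::uint32_t region) {
    return segment > region ? segment - region : 0;
}

static_assert(kNcriscTextBytes == 83952, "hex/decimal mismatch for text size");
static_assert(kNcriscRegionBytes == 16384, "hex/decimal mismatch for region size");
// The text segment is 67,568 bytes over the limit.
static_assert(overflow_bytes(kNcriscTextBytes, kNcriscRegionBytes) == 67568,
              "unexpected overflow amount");
// The segment is more than 5 times the region size.
static_assert(kNcriscTextBytes / kNcriscRegionBytes == 5, "unexpected size ratio");
```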

This is not a PipeNet correctness failure by itself. The kernel fails during ELF load, before launch and before result verification can run.

Cause

PipeNet.if_src(...) / PipeNet.if_dst(...) expansion currently clones the callback body for each concrete pipe/role case. Lowering then emits coordinate-specific branches for the launch grid. As a result, for a schedule with many senders, receivers, and collective ranges, the generated data-movement code grows with the number of participating coordinates rather than with the size of the logical transfer body.
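
The scaling can be illustrated with a rough counting model. This is not the actual lowering: `PipeCase`, `unrolled_body_copies`, and `compact_body_copies` are hypothetical names, and the model simply assumes one emitted body copy per participating coordinate in the unrolled form versus one copy per distinct logical body in a compact form.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical description of one concrete pipe/role case in a schedule.
struct PipeCase {
    std::size_t body_id;       // which logical callback body this case uses
    std::size_t participants;  // number of coordinates taking part in this case
};

// Assumed model of coordinate-unrolled lowering:
// one guarded copy of the body per participating coordinate.
std::size_t unrolled_body_copies(const std::vector<PipeCase>& cases) {
    std::size_t copies = 0;
    for (const PipeCase& c : cases) {
        copies += c.participants;
    }
    return copies;
}

// Assumed model of a compact lowering:
// one copy per distinct logical body, regardless of grid size.
std::size_t compact_body_copies(const std::vector<PipeCase>& cases) {
    std::set<std::size_t> bodies;
    for (const PipeCase& c : cases) {
        bodies.insert(c.body_id);
    }
    return bodies.size();
}
```

Under this model, growing the grid increases only the unrolled count; the compact count changes only when a new logical body is introduced.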

Desired direction

Replace coordinate-unrolled PipeNet data-movement lowering with a compact representation, for example one of:

  • table-driven lowering where each core iterates over pipe records relevant to its coordinate;
  • per-node program specialization so each NCRISC binary contains only that node's send/receive work;
  • another representation that keeps generated code bounded by transfer logic rather than by full-grid coordinate expansion.
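
A minimal host-side sketch of the first option (table-driven lowering) might look like the following. Everything here is an assumption for illustration: `PipeRecord`, `Event`, and `run_core` are hypothetical names, the record layout is invented, and the real send/receive bodies (which would issue noc_async_* calls) are replaced by entries in a log so the sketch can run anywhere.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pipe record; a real lowering may use a different layout.
struct PipeRecord {
    std::uint8_t src_x, src_y;  // sender coordinate
    std::uint8_t dst_x, dst_y;  // receiver coordinate
    std::uint32_t num_bytes;    // transfer size
};

enum class Op { Send, Receive };

// Stand-in for the work a core would perform for one record.
struct Event {
    Op op;
    std::uint8_t peer_x, peer_y;
    std::uint32_t num_bytes;
};

// One shared loop body for every core: the code size is fixed,
// and only the table contents depend on the schedule.
std::vector<Event> run_core(std::uint8_t x, std::uint8_t y,
                            const std::vector<PipeRecord>& table) {
    std::vector<Event> log;
    for (const PipeRecord& r : table) {
        if (r.src_x == x && r.src_y == y) {
            // A real kernel would run the send body here.
            log.push_back(Event{Op::Send, r.dst_x, r.dst_y, r.num_bytes});
        }
        if (r.dst_x == x && r.dst_y == y) {
            // A real kernel would run the receive body here.
            log.push_back(Event{Op::Receive, r.src_x, r.src_y, r.num_bytes});
        }
    }
    return log;
}
```

In this shape the coordinate predicates become data comparisons inside a single loop, so adding pipes grows the table rather than the instruction footprint. Whether the table would fit in the available data memory is a separate question this sketch does not address.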

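Splitting work across BRISC/NCRISC may reduce size by a constant factor, but does not solve the scaling problem by itself because Wormhole also has a 16 KiB BRISC region. For the reproducer, even a perfectly even two-way split of the 83,952-byte text segment would leave 41,976 bytes per processor, still well over 16,384 bytes.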

Validation

After compact lowering exists, rerun the full issue #625 reproducer on Wormhole with GRID_DIM=7 and exact output verification enabled. PR #622 keeps compile-only coverage for the 8x7 resource plan and uses the smaller original issue reproducer for Wormhole runtime coverage until this is fixed.

Related: #625, #622.
