[ttl] Pipe verification; implement recv post -> send post -> send wait -> recv wait protocol (#608) #614

Open
brnorris03 wants to merge 32 commits into main from bnorris/fix-pipe-unicast-608


brnorris03 commented May 21, 2026

Problem description

The PipeNet spec currently describes pipe transfer as if data is held in a logical in-transit buffer between the sender and receiver. The runtime protocol does not have such a separate buffer: the sender must write directly into a receiver-owned DFB slot. That means the receiver has to reserve a destination slot and publish its address before the sender can perform the transfer.
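The receiver-posted rendezvous for a single pipe can be modeled sequentially. This is an illustrative sketch, not the tt-lang runtime, and it makes the following assumptions:

- The class names PipeRendezvous and ReceiverSlot are hypothetical.
- A Python list stands in for a DFB slot.
- send_wait both drains the sender's write and signals the receiver's completion.

Calling a step out of order raises instead of hanging.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReceiverSlot:
    """Hypothetical stand-in for one receiver-owned DFB slot."""
    address: int
    data: Optional[List[int]] = None


@dataclass
class PipeRendezvous:
    """Sequential model of one pipe: recv post -> send post -> send wait -> recv wait.

    There is no in-transit buffer: send_post writes straight into the slot
    that the receiver published in recv_post.
    """
    posted: Optional[ReceiverSlot] = None
    sent: bool = False
    completed: bool = False

    def recv_post(self, slot: ReceiverSlot) -> None:
        # Receiver has reserved `slot` and publishes its address to the sender.
        if self.posted is not None:
            raise RuntimeError("receive already posted for this transfer")
        self.posted = slot

    def send_post(self, payload: List[int]) -> None:
        # Without a posted address the sender has nowhere to write.
        if self.posted is None:
            raise RuntimeError("send issued before the receiver posted a destination")
        self.posted.data = list(payload)
        self.sent = True

    def send_wait(self) -> None:
        # Assumed to cover both draining the write and signalling the receiver.
        if not self.sent:
            raise RuntimeError("send wait without a matching send")
        self.completed = True

    def recv_wait(self) -> List[int]:
        # A real schedule would block here; the model reports the would-be hang.
        if not self.completed or self.posted is None or self.posted.data is None:
            raise RuntimeError("receive wait before the send that completes it")
        data = self.posted.data
        # Reset so the next loop iteration can post a fresh slot.
        self.posted, self.sent, self.completed = None, False, False
        return data


pipe = PipeRendezvous()
for step in range(2):  # repeated transfers, as in a loop
    pipe.recv_post(ReceiverSlot(address=0x1000))
    pipe.send_post([step, step + 1])
    pipe.send_wait()
    print(pipe.recv_wait())
```

Calling send_post before recv_post raises in this model. That mirrors the runtime situation where the sender has no destination address to write to.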

The previous lowering did not make that synchronization contract explicit. It treated pipe receives as a sender-side implementation detail, which let the compiler accept syntactically valid same-thread schedules that could deadlock at runtime. The previous implementation also regressed true unicast loops by inserting synthetic DFB producer operations that were not balanced with the user's reserve/wait structure. This broke the tt-lang contract that users are fully responsible for the data movement associated with pipes.

What's changed

  • Pipe receives were previously modeled as a sender-side implementation detail. They now use the user's reserved destination DFB slot: ttl.copy(pipe, dst) lowers to the receive post/address publication, and waiting on that copy waits for transfer completion.
  • Pipe sends previously assumed the transfer target could be derived without an explicit receiver post. They now respect the receiver-posted protocol for both unicast and multicast, including repeated transfers in loops.
  • Pipe synchronization state was previously shared too broadly across pipes and data-movement threads. It is now scoped so independent pipes and concurrent data-movement threads do not interfere with each other. The detailed lowering protocol is documented in docs/development/PipeNets.md.
  • Receive completion previously used state that did not advance correctly for repeated receives. It now uses per-PipeNet runtime counters, so repeated unicast and multicast receives in loops advance correctly across iterations.
  • The PipeNet verifier previously accepted same-thread pipe schedules that could deadlock. It now rejects those schedules and emits detailed diagnostics for common mistakes:
    • waiting for a receive before the send that completes it;
    • sending before the receiver has posted ttl.copy(pipe, dst).
  • Pipe receive waits were previously omitted from the wait-for graph when their guard could not be analyzed. They now have the same static guard requirement as the receive post, so ttl.wait on a pipe receive handle under an unanalyzable coordinate-dependent guard is rejected.
  • Internal receive lowering and verification previously allowed invalid pipe receive IR to fail later with less specific diagnostics. They now reject invalid receive IR earlier with clearer diagnostics.
  • Per-destination multicast receive addresses are tracked as future work in issue #617 (Support per-destination pipe receive addresses for multicast lowering).
  • Pipe tests previously encoded schedules that did not consistently publish receiver-owned destinations before sends. They now express valid receiver-posted schedules. Existing loopback multicast cases that did not need loopback were changed to exclude the source from the destination range.
  • Test coverage previously missed invalid same-thread schedules, unanalyzable receive-wait guards, internal pipe receive op verifier errors, true unicast loops, multicast loops, loopback multicast, unicast forwarding-chain patterns, and mesh-SPMD pipe execution. The PR adds negative pytest/lit coverage for the invalid cases and positive coverage for the valid cases, including the reproducer from issue #608 ([bug] PipeLowering sender-lockstep reserve_back on pipe dst CB hangs under iteration).
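The verifier checks for the two common same-thread mistakes listed above can be sketched as cycle detection on a wait-for graph. This is a simplified model, not the actual PipeNet verifier, and it assumes:

- Each thread is a list of (kind, pipe) operations in program order.
- Each pipe carries a single transfer, with no loops or guards.
- The op kinds and function names are hypothetical.

In the graph, an operation waits on its predecessor in the same thread. A send waits on the matching receive post, and a receive wait waits on the matching send. Any cycle is a schedule that cannot make progress.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str]    # (kind, pipe); kind is "recv_post", "send" or "recv_wait"
Node = Tuple[str, int]  # (thread name, index in that thread's program order)


def build_wait_for_graph(threads: Dict[str, List[Op]]) -> Dict[Node, List[Node]]:
    """Edge u -> v means op u cannot complete until op v has happened."""
    graph: Dict[Node, List[Node]] = {}
    where: Dict[Op, Node] = {}
    for thread, ops in threads.items():
        for i, op in enumerate(ops):
            graph[(thread, i)] = [(thread, i - 1)] if i > 0 else []  # program order
            where[op] = (thread, i)
    for (thread, i), deps in graph.items():
        kind, pipe = threads[thread][i]
        if kind == "send" and ("recv_post", pipe) in where:
            deps.append(where[("recv_post", pipe)])  # needs a posted destination
        if kind == "recv_wait" and ("send", pipe) in where:
            deps.append(where[("send", pipe)])  # needs the completing send
    return graph


def find_deadlock(threads: Dict[str, List[Op]]) -> Optional[List[Node]]:
    """Return one cycle in the wait-for graph, or None if the schedule is acyclic."""
    graph = build_wait_for_graph(threads)
    state = {node: "new" for node in graph}
    path: List[Node] = []

    def visit(node: Node) -> Optional[List[Node]]:
        state[node] = "active"
        path.append(node)
        for dep in graph[node]:
            if state[dep] == "active":
                return path[path.index(dep):] + [dep]
            if state[dep] == "new":
                cycle = visit(dep)
                if cycle is not None:
                    return cycle
        path.pop()
        state[node] = "done"
        return None

    for node in graph:
        if state[node] == "new":
            cycle = visit(node)
            if cycle is not None:
                return cycle
    return None


ok = {"dm0": [("recv_post", "p"), ("send", "p"), ("recv_wait", "p")]}
wait_first = {"dm0": [("recv_post", "p"), ("recv_wait", "p"), ("send", "p")]}
send_first = {"dm0": [("send", "p"), ("recv_post", "p")]}
print(find_deadlock(ok), find_deadlock(wait_first), find_deadlock(send_first))
```

The returned cycle names the operations involved. That is roughly the information a diagnostic needs to point at both the early wait (or send) and the operation it is blocked on.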

Tests

+--------------------------------------+----------------------+------------------+-----------------------------+
| Pattern / test area                  | Pipe type            | Size shape       | Control-flow covered        |
+--------------------------------------+----------------------+------------------+-----------------------------+
| true unicast split loop              | unicast 1 -> 1       | 1 tile           | 20-iter loop, same DM thread|
| gather                               | N unicast -> 1 dst   | 1 tile           | per-core if, receiver sum   |
| gather multiblock                    | N unicast -> 1 dst   | 1 x 2 tiles      | multi-tile DFB slots        |
| forward ring                         | unicast ring         | 1 tile           | separate DM send/recv       |
| row rings, grid=full                 | many unicast rings   | 1 tile           | full-grid guards            |
| pipe conv chain                      | unicast chain        | 1 x 2 tiles      | conditional send, pipeline  |
| row/col forwarding chains            | unicast chains       | 1 x 2 tiles      | 5-iter loops, row+col chain |
| scatter                              | multicast 1 -> N     | 1 tile           | non-loopback multicast      |
| broadcast_2d                         | multicast 1 -> 2D    | 1 tile           | loopback, same DM ordered   |
| all-to-all 1D/2D                     | multicast N -> N     | 1 tile           | loopback, reductions        |
| overlapping multicast                | multicast overlap    | 1 tile           | multiple senders per recv   |
| partial overlap                      | multicast overlap    | 1 x 2 tiles      | asymmetric recv counts      |
| gather + broadcast loop              | unicast + multicast  | multi-row block  | stripe loops, two PipeNets  |
| rendezvous state                     | unicast/multicast    | 1 tile           | same receiver/source cases  |
| cross-DFB loopback multicast         | multicast loopback   | 1 tile           | 2-iter loop, src/dst DFBs   |
| mcast matmul                         | row/col multicast    | 8 x 8 blocks     | K loops, large grid         |
| minimal matmul mirror                | row/col multicast    | 8 x 8 blocks     | K loops, tt-metal shape     |
| mesh SPMD unicast                    | unicast 1 -> 1       | per-device shard | multi-device mesh tensors   |
| captured/module/mixed-scope nets     | mixed PipeNet usage  | 1 tile           | Python scope variants       |
| invalid schedule tests               | unicast/multicast    | 1 tile           | wait-before-send errors     |
| invalid receive-wait guard           | unicast              | 1 tile           | unknown coord guard reject  |
+--------------------------------------+----------------------+------------------+-----------------------------+

Fixes #608.

phizalev-TT and others added 10 commits May 15, 2026 19:16
- Update tt-metal/tt-mlir versions and version checks.
- Create and validate the toolchain Python venv before tt-metal configure.
- Build tt-metal with source-local runtime roots and explicit firmware precompile.
- Install uplifted tt-metal runtime artifacts, descriptors, ttnn extensions, and precompiled firmware.
- Add build/install regression coverage.
brnorris03 changed the title "Bnorris/fix pipe unicast 608" to "[ttl] Pipe verification; refactor lowering" May 21, 2026
brnorris03 force-pushed the bnorris/fix-pipe-unicast-608 branch from 16cf0bf to 40f7aeb May 21, 2026 22:33
brnorris03 force-pushed the bnorris/fix-pipe-unicast-608 branch 2 times, most recently from 6d26d3e to 490fcb7 May 22, 2026 03:58
brnorris03 force-pushed the bnorris/fix-pipe-unicast-608 branch from 490fcb7 to 2b6df80 May 22, 2026 03:59
Allocate pipe synchronization state from a shared runtime layout instead of deriving semaphore IDs directly from the PipeNet ID.

Receiver completion semaphores remain per PipeNet. Sender-ready and mailbox semaphores are allocated per pipe on each source core, so distinct pipes from the same source do not alias. Receive-post mailbox staging is allocated per NOC data-movement thread to avoid BRISC/NCRISC races before the remote SRAM write consumes the staged address.
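The layout described above can be sketched as a per-core allocator. This is a hypothetical model, not the actual runtime layout. The following are all assumptions for illustration:

- the Pipe fields and key names;
- two DM threads named brisc and ncrisc;
- the ordering of the regions.

The shared region comes first, so those ids agree on every core. It holds the per-PipeNet completion ids and the per-thread receive-post staging. Each source core then gets a sender-ready id and a mailbox id per outgoing pipe, so two pipes from the same source never alias.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple

Core = Tuple[int, int]

# Assumed names for the two NOC data-movement threads.
DM_THREADS = ("brisc", "ncrisc")


@dataclass(frozen=True, order=True)
class Pipe:
    net_id: int  # PipeNet the pipe belongs to
    src: Core
    dst: Core


def allocate_layout(pipes: List[Pipe]) -> Dict[Core, Dict[tuple, int]]:
    """Assign per-core synchronization ids (hypothetical layout)."""
    net_ids = sorted({p.net_id for p in pipes})
    cores = sorted({p.src for p in pipes} | {p.dst for p in pipes})
    layout: Dict[Core, Dict[tuple, int]] = {}
    for core in cores:
        next_id = itertools.count()
        ids: Dict[tuple, int] = {}
        # Shared region: same ids on every core, so remote signals agree.
        for net in net_ids:
            ids[("recv_done", net)] = next(next_id)  # one completion id per PipeNet
        for thread in DM_THREADS:
            ids[("post_staging", thread)] = next(next_id)  # BRISC/NCRISC never share
        # Per-source region: one ready + mailbox pair per outgoing pipe.
        for pipe in sorted(p for p in pipes if p.src == core):
            ids[("sender_ready", pipe)] = next(next_id)
            ids[("mailbox", pipe)] = next(next_id)
        layout[core] = ids
    return layout


pipes = [Pipe(0, (0, 0), (0, 1)), Pipe(0, (0, 0), (0, 2)), Pipe(1, (0, 1), (0, 2))]
for core, ids in allocate_layout(pipes).items():
    print(core, len(ids))
```

A receiver that needs to signal a particular sender would look up that pipe's mailbox id in layout[pipe.src]. In this model, that lookup is why the per-source ids can differ between cores.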

Update lowering tests and PipeNet documentation to match the posted-receive protocol and the implemented semaphore layout.
Make pipe receive expansion validate before mutating IR, remove stale post-expansion receive-copy handling, and tighten internal pipe receive op verification.
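The validate-before-mutate change follows a common two-phase rewrite pattern. Every candidate is checked first, and the IR is touched only once all checks pass, so a failing receive cannot leave the IR half-expanded. The sketch below is minimal. RecvCopy and expand_receives are hypothetical stand-ins for the real ops and pass. Using "destination is a reserved DFB slot" as the validity check is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecvCopy:
    """Hypothetical stand-in for a ttl.copy(pipe, dst) receive in the IR."""
    pipe: str
    dst_reserved: bool  # was dst obtained from a reserve on the receiver's DFB?
    expanded: bool = False


def expand_receives(copies: List[RecvCopy]) -> List[str]:
    """Expand all receives, or none of them; return diagnostics on failure."""
    # Phase 1: validate everything without mutating anything.
    errors = [
        f"pipe {c.pipe}: receive destination is not a reserved DFB slot"
        for c in copies
        if not c.dst_reserved
    ]
    if errors:
        return errors  # IR untouched, so diagnostics refer to the original ops
    # Phase 2: every check passed, so expansion cannot fail part-way.
    for c in copies:
        c.expanded = True
    return []


ops = [RecvCopy("p0", dst_reserved=True), RecvCopy("p1", dst_reserved=False)]
print(expand_receives(ops), [c.expanded for c in ops])
```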

Stabilize PipeGraph slot assignment by sorting on the complete pipe key, improve duplicate receiver diagnostics, and stop PipeNet guard verification after unknown PipeNet references.
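Sorting on the complete key matters because a partial key leaves ties unresolved. Tied pipes keep whatever order they were discovered in, or an arbitrary order under an unstable sort, so slot numbers can change between otherwise identical compilations. The sketch below is hypothetical. The key fields (PipeNet id, source core, destination range start and end) are assumptions, not the real PipeGraph key.

```python
from typing import Dict, List, Tuple

Core = Tuple[int, int]
# Hypothetical complete key: (PipeNet id, source core, dst range start, dst range end).
PipeKey = Tuple[int, Core, Core, Core]


def assign_slots_partial(pipes: List[PipeKey]) -> Dict[PipeKey, int]:
    # Sorts on the source core only: ties keep discovery order.
    return {p: i for i, p in enumerate(sorted(pipes, key=lambda p: p[1]))}


def assign_slots(pipes: List[PipeKey]) -> Dict[PipeKey, int]:
    # Sorts on the whole key: a total order, independent of discovery order.
    return {p: i for i, p in enumerate(sorted(set(pipes)))}


pipes = [(0, (0, 0), (1, 0), (1, 0)), (0, (0, 0), (2, 0), (2, 0))]
print(assign_slots_partial(pipes) == assign_slots_partial(pipes[::-1]))  # False
print(assign_slots(pipes) == assign_slots(pipes[::-1]))  # True
```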

Add regression coverage for internal receive op verifier diagnostics and update invalid guard expectations for the cleaner unknown-PipeNet diagnostic.
brnorris03 marked this pull request as ready for review May 22, 2026 06:27
brnorris03 changed the title "[ttl] Pipe verification; refactor lowering" to "[ttl] Pipe verification; implement recv post -> send post -> send wait -> recv wait protocol." May 22, 2026

zoecarver left a comment

Happy to see the compiler-inserted reserves going away :)

brnorris03 changed the title "[ttl] Pipe verification; implement recv post -> send post -> send wait -> recv wait protocol." to "[ttl] Pipe verification; implement recv post -> send post -> send wait -> recv wait protocol (#608)" May 22, 2026

```cpp
LogicalResult
matchAndRewrite(PipeRecvPostOp op, OpAdaptor adaptor,
                ConversionPatternRewriter &rewriter) const override {
  return lowerPipeRecvPost(op, adaptor.getPipe(), op.getDst(),
```

Contributor:

Should this be adaptor.getDst()?

brnorris03 (author):

Good catch! Updated both PipeRecvPostLowering and PipeRecvWaitLowering.

brnorris03 (author):

Actually, we do need op here: the adaptor no longer carries the attributes I still need (ttl.cb_reserve, ttl.attach_cb, and the slice offset).

```cpp
  int64_t cbIndex;
};

CopyOp findDefiningCopy(Value value) {
```

Contributor:

Filter pipe copies like findPipeReceiveCopy does?

brnorris03 (author):

Agreed! Extracted the filter and reused it, and also made the helper return an optional.

- Reject non-uniform multicast receive addresses until per-destination addresses are supported
- Diagnose semaphore id over-allocation before TTKernel emission
- Add tests
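The two new diagnostics can be sketched as pre-emission checks. Both functions are hypothetical, and so are their inputs: the address map, the per-core id counts, and the max_ids budget. The real limit is whatever semaphore count the target provides. The multicast check rests on a property of multicast writes: one write lands at the same local address on every destination core. Until per-destination addresses are supported (#617), the destinations must therefore all post the same slot address.

```python
from typing import Dict, List, Optional, Tuple

Core = Tuple[int, int]


def check_uniform_recv_address(addrs: Dict[Core, int]) -> Optional[str]:
    """Reject a multicast whose destinations posted different slot addresses."""
    distinct = sorted(set(addrs.values()))
    if len(distinct) > 1:
        listed = ", ".join(hex(a) for a in distinct)
        return f"multicast receive addresses differ across destinations: {listed}"
    return None


def check_id_budget(ids_per_core: Dict[Core, int], max_ids: int) -> List[str]:
    """Report every core whose layout needs more ids than the target provides."""
    return [
        f"core {core}: needs {n} semaphore ids, but only {max_ids} are available"
        for core, n in sorted(ids_per_core.items())
        if n > max_ids
    ]


print(check_uniform_recv_address({(0, 1): 0x2000, (0, 2): 0x2000}))
print(check_uniform_recv_address({(0, 1): 0x2000, (0, 2): 0x3000}))
print(check_id_budget({(0, 0): 8, (0, 1): 6}, max_ids=7))
```

These checks run before any TTKernel code is emitted. That way an over-allocated layout or a non-uniform multicast is reported as a compile-time diagnostic instead of surfacing as corrupted data or a hang on device.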

Labels: None yet
Projects: None yet

Development: Successfully merging this pull request may close these issues:
[bug] PipeLowering sender-lockstep reserve_back on pipe dst CB hangs under iteration

3 participants