[ttl] Allocate pipe resources by transfer liveness #627

Draft
brnorris03 wants to merge 11 commits into bnorris/pipe-transfer-ir from bnorris/pipe-transfer-liveness

brnorris03 commented May 24, 2026

Problem description

PR #624 introduces explicit Pipe Transfer IR, but physical synchronization resources are still allocated mostly by logical pipe or PipeNet identity. That means two same-source transfers consume distinct sender-ready counters and address-table slots even when their transfer lifetimes do not overlap.

In this PR, a transfer is live from the receive post (ttl.copy from pipe to destination DFB) that publishes the receiver-owned destination address until the send (ttl.copy from source DFB to pipe) that consumes that address and the corresponding sender-ready count. After that send, the source-core address-table slot and sender-ready counter can be reused even if the public transfer handle remains live until ttl.wait. Allocating by logical pipe keeps resource use tied to the number of logical pipes instead of the number of concurrently live transfers, and prevents the generated resource plan from expressing reuse when same-source transfers are serialized.
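
The liveness definition above can be sketched as a closed interval over program-order positions. This is only an illustrative model, not the compiler's data structure: `TransferInterval`, its field names, and the integer positions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferInterval:
    """Hypothetical record of one transfer's lifetime on a source core."""

    name: str
    source_core: int
    post: int  # program-order position of the receive post
    send: int  # program-order position of the send that consumes the address

    def overlaps(self, other: "TransferInterval") -> bool:
        # Only transfers on the same source core compete for that core's
        # address-table slots and sender-ready counters.
        if self.source_core != other.source_core:
            return False
        # Closed intervals [post, send]: the resource is held through the send.
        return self.post <= other.send and other.post <= self.send


# Two serialized same-source transfers and one that spans both.
t0 = TransferInterval("t0", source_core=0, post=0, send=2)
t1 = TransferInterval("t1", source_core=0, post=3, send=4)
t2 = TransferInterval("t2", source_core=0, post=1, send=5)
t3 = TransferInterval("t3", source_core=1, post=0, send=5)
```

In this model `t0` and `t1` do not overlap, so they could share a slot and a counter; `t2` overlaps both and would need its own. The interval ends at the send, not at ttl.wait, which matches the statement that the public transfer handle may outlive the physical resources.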

What's changed

This PR uses explicit Pipe Transfer IR lifetimes to allocate the source-core resources whose values are needed only between the receive post (ttl.copy from pipe to destination DFB) and the send (ttl.copy from source DFB to pipe). For example, on the issue #625 reproducer this reduces compiler-managed pipe SRAM scratch from 160 bytes to 32 bytes by reusing non-overlapping address-table entries.

  • Allocates sender-ready counters from transfer live intervals instead of from logical pipe count.
  • Allocates receiver-authored source-core address-table slots from the same transfer live intervals.
  • Extracts shared live-interval allocation utilities so future resource allocators can reuse the same deterministic interval coloring logic.
  • Keeps receiver-completion counters per PipeNet in this PR.
  • Keeps user-facing ttl.create_pipe, ttl.copy, and ttl.wait behavior unchanged.
  • Preserves the aggregate-ready protocol from PR #622 ([ttl] Use receiver-authored SRAM address tables for scalable PipeNet collectives (#620)) and the internal Pipe Transfer IR contract from PR #624 ([ttl] Introduce pipe transfer IR).
  • Adds a deterministic diagnostic for the same-block case where a later receive post would require more than one live receive post for the same logical pipe.
  • Clarifies that TT-Metal NoC multicast supports one destination L1 (SRAM) address for all receivers, so non-uniform collective destination addresses are rejected rather than tracked as future per-receiver multicast support.
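
As a rough illustration of deterministic interval coloring over transfer live intervals, the sketch below assigns each transfer the lowest-numbered slot on its source core that is free at its receive post. It is a generic greedy allocator under assumed inputs, not the shared utility this PR extracts; `allocate_slots` and the `(name, source_core, post, send)` tuple layout are hypothetical.

```python
def allocate_slots(intervals):
    """Greedy interval coloring per source core.

    intervals: iterable of (name, source_core, post, send) tuples, where
    post and send are program-order positions (hypothetical encoding).
    Returns (assignment, slots_per_core).
    """
    assignment = {}
    # last_send[core][slot] = send position of that slot's latest occupant.
    last_send = {}
    # Sorting by (post, name) makes the result independent of input order.
    for name, core, post, send in sorted(intervals, key=lambda t: (t[2], t[0])):
        slots = last_send.setdefault(core, [])
        for slot, busy_until in enumerate(slots):
            if busy_until < post:  # previous occupant already sent
                slots[slot] = send
                assignment[name] = slot
                break
        else:
            slots.append(send)  # no free slot: open a new one
            assignment[name] = len(slots) - 1
    slots_per_core = {core: len(slots) for core, slots in last_send.items()}
    return assignment, slots_per_core


serialized = [("a", 0, 0, 1), ("b", 0, 2, 3)]
overlapping = [("a", 0, 0, 3), ("b", 0, 1, 2), ("c", 0, 4, 5), ("d", 1, 0, 1)]
```

With `serialized`, both transfers land in slot 0, so the core needs one slot instead of one per logical pipe. With `overlapping`, core 0 needs two slots because `a` and `b` are live at the same time, while `c` reuses slot 0 after `a` has sent.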

This PR does not add phased lowering, receive-ahead transfer posting, DFB batching, or transfer grouping. Those are planned for follow-up PRs (see the stacked PR sequence below).

Tests

New tests cover same-source overlap and reuse in MLIR lowering, the queue-depth diagnostic for a second receive post before an intervening send, the updated non-uniform multicast destination diagnostic, and runtime Python lit regression coverage based on the dual-route ksplit PipeNet schedule in #625.
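
The queue-depth condition exercised by the diagnostic test can be pictured as a single pass over the events of one block. This is a simplified sketch under assumptions: the `("post" | "send", pipe)` event encoding, the function name, and the message text are hypothetical and do not reproduce the compiler's actual diagnostic.

```python
def check_single_live_post(events):
    """Return an error string for the first pipe that would need two live
    receive posts, or None if every post is followed by a send before the
    next post on the same logical pipe.

    events: list of (kind, pipe) in block order, kind in {"post", "send"}.
    """
    live = set()  # pipes with a receive post that has not been sent yet
    for index, (kind, pipe) in enumerate(events):
        if kind == "post":
            if pipe in live:
                return (
                    f"event {index}: pipe '{pipe}' already has a live "
                    "receive post; a send must intervene"
                )
            live.add(pipe)
        elif kind == "send":
            live.discard(pipe)
    return None


ok_schedule = [("post", "p"), ("send", "p"), ("post", "p"), ("send", "p")]
bad_schedule = [("post", "p"), ("post", "p"), ("send", "p")]
```

Here `ok_schedule` passes because each post is consumed before the next one, while `bad_schedule` is rejected at its second post. Posts on different logical pipes do not conflict in this model.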

Stacked PR sequence

Pipe compilation work required to match Blaze's scalable communication model: GlobalSemaphore-backed counting, explicit L1 address/state, live-interval allocation, and batching where storage is limiting.

| # | PR | Status | Main problem solved |
|---|----|--------|---------------------|
| 1 | #614 | Base PR | Makes PipeNet transfers correct by switching to receiver-posted destination DFB addresses. This removes sender inference of receiver DFB write-pointer state, but not semaphore pressure. |
| 2 | #622 | Base PR | Removes collective receiver-count and source-local pipe-count semaphore growth by using receiver-authored SRAM address tables, compiler-emitted pipe resource plans, and GlobalSemaphore-backed ready counters. This eliminates the local semaphore-id scaling limit for large uniform collective transfers. |
| 3 | #624 | Base PR | Represents transfer phase, receiver-authored address publication, ready counting, send, and receive-token wait explicitly before control-flow-general lowering. This removes the need to infer transfer phase and queue depth from ttl.copy placement. |
| 4 | #627 | Current PR | Assigns source-core address-table slots and sender-ready counters from explicit pipe transfer lifetimes. This reduces those resources from logical pipe count to concurrently live same-source transfer count while keeping receiver completion per PipeNet. |
| 5 | TBD | Planned follow-up | Supports phased pipe transfer lowering for receive-ahead and pipelined loops with monotonic ready counts, finite address-table depths, and deterministic diagnostics when safe pipe transfer state cannot be allocated. This extends the current one-live-post protocol to programs where later receive posts may be live before earlier sends complete. Issue #623. |
| 6 | TBD | Planned follow-up | Lowers large all-to-all into communication batches when receiver DFB capacity is smaller than the number of incoming pipes. This eliminates the simultaneous receiver DFB slot scaling limit when the program can consume batches incrementally. |
| 7 | TBD | Planned follow-up | Groups post/wait state when pipe tokens prove identical lifetime and completion behavior. This reduces duplicate pipe synchronization resources after liveness allocation exists. |

brnorris03 changed the title from "Allocate pipe resources by transfer liveness" to "[ttl] Allocate pipe resources by transfer liveness" on May 24, 2026
brnorris03 force-pushed the bnorris/pipe-transfer-ir branch from 40f7c7d to c5c59fe on May 25, 2026 18:49
brnorris03 force-pushed the bnorris/pipe-transfer-liveness branch from cc7254f to a24473f on May 25, 2026 18:55