[ttl] Allocate pipe resources by transfer liveness by brnorris03 · Pull Request #626 · tenstorrent/tt-lang

brnorris03 · 2026-05-24T06:10:03Z

No description provided.

- Update tt-metal/tt-mlir versions and version checks.
- Create and validate the toolchain Python venv before tt-metal configure.
- Build tt-metal with source-local runtime roots and explicit firmware precompile.
- Install uplifted tt-metal runtime artifacts, descriptors, ttnn extensions, and precompiled firmware.
- Add build/install regression coverage.

Allocate pipe synchronization state from a shared runtime layout instead of deriving semaphore IDs directly from the PipeNet ID. Receiver completion semaphores remain per PipeNet. Sender-ready and mailbox semaphores are allocated per pipe on each source core, so distinct pipes from the same source do not alias. Receive-post mailbox staging is allocated per NOC data-movement thread to avoid BRISC/NCRISC races before the remote SRAM write consumes the staged address. Update lowering tests and PipeNet documentation to match the posted-receive protocol and the implemented semaphore layout.

Make pipe receive expansion validate before mutating IR, remove stale post-expansion receive-copy handling, and tighten internal pipe receive op verification. Stabilize PipeGraph slot assignment by sorting on the complete pipe key, improve duplicate receiver diagnostics, and stop PipeNet guard verification after unknown PipeNet references. Add regression coverage for internal receive op verifier diagnostics and update invalid guard expectations for the cleaner unknown-PipeNet diagnostic.
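The slot-assignment stabilization can be pictured with a small sketch. The key fields below (`pipenet_id`, `src`, `dst_start`, `dst_end`) are assumptions made for illustration, not the actual PipeGraph key; the point is only that sorting on every field of the key makes slot numbers independent of discovery order.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical pipe key: (pipenet_id, src, dst_start, dst_end).
Coord = Tuple[int, int]
PipeKey = Tuple[int, Coord, Coord, Coord]


def assign_slots(pipes: Iterable[PipeKey]) -> Dict[PipeKey, int]:
    """Number pipes by their full key so the result ignores input order."""
    return {key: slot for slot, key in enumerate(sorted(set(pipes)))}


a = (0, (0, 0), (0, 1), (0, 3))
b = (0, (0, 0), (1, 0), (1, 3))  # same PipeNet and source as `a`, different destination
c = (1, (2, 2), (0, 0), (0, 0))

forward = assign_slots([a, b, c])
shuffled = assign_slots([c, b, a])
```

Sorting on only part of the key (for example the PipeNet ID and source) would leave `a` and `b` tied, and their relative slots would then depend on traversal order.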

- Reject non-uniform multicast receive addresses until per-destination addresses are supported
- Diagnose semaphore id over-allocation before TTKernel emission
- Add tests

- Rename aggregate rendezvous lookup helper to match ReceiverDFBInfo.
- Use dfbIndex and dfbType when constructing aggregate channel info.

Lower eligible uniform multicast pipes with a counted sender-ready rendezvous instead of per-pipe posted-address mailbox storage. Source-in-destination multicast derives the destination address from local receiver DFB state; safe non-loopback multicast uses a sender-local epoch counter plus static receiver slot metadata. Overlapping non-loopback multicast keeps the posted-address mailbox protocol because multiple reserve slots can be live. Also validate multicast receiver DFB uniformity before lowering, preserve semantic multicast kind through create_pipe, align host semaphore counting with C++ lowering, and document the resource model. Adds device, sim, and lit coverage for loopback, degenerate, non-loopback, all-to-all, overlap fallback, and semaphore-limit cases.

Refactor PipeNet channel lowering so ready counting and address storage are represented separately. Source-in-destination uniform multicast now uses aggregate rendezvous by waiting on one sender-ready count and reading the local receiver DFB address, while non-loopback multicast stays on the receiver-posted mailbox protocol until explicit receiver-authored address tables exist. Remove the non-loopback sender-local epoch reconstruction logic, since it inferred receiver DFB state from sender execution. Keep host-side semaphore accounting aligned with the implemented C++ resource model and restore deterministic PipeGraph slot assignment without reserve-slot metadata that is no longer used. Update PipeNets documentation and focused tests to cover source-in-destination aggregate lowering, non-loopback posted-mailbox fallback, and semaphore-count expectations. Validated with: cmake --build build; python -m pytest test/sim/test_operation_pipenets.py -q; llvm-lit -v build/test/ttlang/Dialect/TTL/Transforms/convert_pipe_ops.mlir; llvm-lit -v build/test/ttlang/Dialect/TTL/Transforms/convert_pipe_ops_invalid.mlir; Docker pytest test/python/pipe/test_pipenet_rendezvous.py -xvs -rxX.

Lowering models three resources separately:

- address storage carries receiver-authored DFB addresses;
- ready counting records how many receivers have posted a transfer;
- completion wait records when receiver-owned DFB storage contains the payload.

Update the tt-mlir submodule to the TTKernel change that allows remote_sram_write_u32 to use computed SRAM source addresses. This supports PipeNet address tables stored in ordinary SRAM instead of semaphore-backed mailbox words.

Uniform multicast now separates receiver-authored address storage from ready and completion synchronization. Receivers publish DFB write pointers to source-core SRAM address-table entries with TTKernel inline NOC writes, and senders consume those entries after the aggregate ready count instead of using semaphore-backed address mailbox words. Add hidden L1 scratch allocation and common-runtime-arg plumbing for the address tables, update host semaphore counting to match the compiler layout, and refresh MLIR, simulator, and hardware pytest coverage for non-loopback multicast and semaphore scaling.
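As a rough mental model of the receiver-authored address table plus aggregate ready count, the following single-threaded Python toy may help. It is not the TTKernel lowering: the class and function names are hypothetical, and plain Python assignments stand in for the inline NOC writes, write barrier, and semaphore increments described above.

```python
from typing import List, Optional


class SourceCore:
    """Toy model of the source-core state for one collective pipe."""

    def __init__(self, num_receivers: int) -> None:
        self.num_receivers = num_receivers
        # One SRAM address-table entry per receiver (hypothetical layout).
        self.address_table: List[Optional[int]] = [None] * num_receivers
        self.ready_count = 0  # stands in for the sender-ready counter


def receiver_post(src: SourceCore, slot: int, dfb_write_ptr: int) -> None:
    # Publish the address first, then bump the counter; on hardware a
    # write barrier must separate these two steps.
    src.address_table[slot] = dfb_write_ptr
    src.ready_count += 1


def sender_try_consume(src: SourceCore) -> Optional[List[int]]:
    """Return all receiver addresses once every receiver has posted."""
    if src.ready_count < src.num_receivers:
        return None
    addresses = [addr for addr in src.address_table if addr is not None]
    src.ready_count = 0  # reset for the next transfer
    return addresses


core = SourceCore(num_receivers=3)
receiver_post(core, 0, 0x1000)
early = sender_try_consume(core)       # not every receiver has posted yet
receiver_post(core, 1, 0x2000)
receiver_post(core, 2, 0x3000)
addresses = sender_try_consume(core)
```

Note that the table grows with the number of receivers while the counter does not, which is why address publication can move to ordinary SRAM without consuming semaphore ids.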

Parameterize the backend-neutral fanout semaphore test over several recipient counts, including 50 recipients, to verify that a single multicast pipe keeps constant semaphore usage as destination count grows. Replace the fixed-row hardware fanout test with a grid=full variant that checks one receiver, a small fanout, and all device nodes except the source. The full-device case decomposes the all-but-source region into rectangular multicast pipes while still exercising receiver-authored SRAM address publication.
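The commit does not say how the all-but-source region is split, so the sketch below shows just one possible decomposition into at most four rectangles (rows above, rows below, and the left and right parts of the source row). The function names and the inclusive `(row_start, col_start, row_end, col_end)` convention are assumptions for illustration.

```python
from typing import List, Set, Tuple

Rect = Tuple[int, int, int, int]  # (row_start, col_start, row_end, col_end), inclusive


def all_but_source(rows: int, cols: int, src: Tuple[int, int]) -> List[Rect]:
    """Cover every node of a rows x cols grid except `src` with rectangles."""
    r, c = src
    rects: List[Rect] = []
    if r > 0:
        rects.append((0, 0, r - 1, cols - 1))          # full rows above the source
    if r < rows - 1:
        rects.append((r + 1, 0, rows - 1, cols - 1))   # full rows below the source
    if c > 0:
        rects.append((r, 0, r, c - 1))                 # source row, left of the source
    if c < cols - 1:
        rects.append((r, c + 1, r, cols - 1))          # source row, right of the source
    return rects


def cells(rect: Rect) -> Set[Tuple[int, int]]:
    r0, c0, r1, c1 = rect
    return {(i, j) for i in range(r0, r1 + 1) for j in range(c0, c1 + 1)}


interior = all_but_source(8, 8, (3, 2))
corner = all_but_source(8, 8, (0, 0))
```

Each rectangle would map to one multicast pipe, so the pipe count stays at four or fewer regardless of grid size.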

Rename the PipeNet rendezvous pytest to test_pipenet_sync.py for a shorter and clearer filename. Document that aggregate multicast rendezvous removes semaphore growth with destination count but does not remove receiver DFB capacity requirements for overlapping all-to-all arrivals.

Record compiler-owned PipeNet resource requirements with module attrs for local semaphores, GlobalSemaphore ready counters, and SRAM address-table storage. Lower receiver posts through receiver-authored SRAM address tables so address publication no longer consumes semaphore ids, and use GlobalSemaphore-backed ready counters when source-local pipes exceed the local semaphore budget. Thread the resource plan through Python runtime allocation, update host-side PipeNet accounting, and add focused kernel-runner, simulator, MLIR, and hardware pytest coverage for global ready counters and aggregate ready-counting behavior. Document the current lowering model and Device 2.0 transition points for resource binding.

Replace pre-TTKernel multicast classification with an explicit point-to-point vs collective transfer contract. The frontend now emits isCollective on ttl.create_pipe for slice-origin receiver sets, including degenerate one-receiver collectives, and PipeGraph/PipeLowering carry PipeTransferContract through resource planning instead of using hardware-oriented multicast terminology. Keep hardware multicast naming in TTKernel emission, where the physical NOC operation is selected independently from the semantic transfer contract.

Replace the cached-kernel GlobalSemaphore lifetime list instead of appending to it on every execution. This keeps the current call's semaphore objects alive without retaining stale semaphore objects across repeated kernel invocations. Add a Python-only runner test that executes a GlobalSemaphore-backed kernel twice and verifies the owner list remains bounded to the current allocation.
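The difference between replacing and appending can be shown with a minimal stand-in. `CachedKernel`, `_semaphore_owners`, and `run` are hypothetical names, and `object()` stands in for a GlobalSemaphore; the real runner is not reproduced here.

```python
from typing import List


class CachedKernel:
    """Toy cached kernel that keeps the current call's semaphores alive."""

    def __init__(self) -> None:
        self._semaphore_owners: List[object] = []

    def run(self, num_semaphores: int) -> List[object]:
        fresh = [object() for _ in range(num_semaphores)]
        # Replace the owner list instead of extending it, so objects from
        # earlier invocations are released rather than accumulated.
        self._semaphore_owners = fresh
        return fresh


kernel = CachedKernel()
first = kernel.run(2)
second = kernel.run(2)
```

With `self._semaphore_owners.extend(fresh)` instead, the list would hold four objects after the second call and keep growing on every invocation.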

Advance tt-mlir to the fix that preserves noc_async_write_barrier after ttkernel.noc_inline_dw_write. Pipe receive posts rely on that barrier to publish receiver-authored address-table entries before incrementing sender-ready counters.

Use point-to-point and collective terminology for PipeNet transfer contracts before TTKernel lowering so semantic pipe contracts are not confused with hardware multicast lowering.

Add non-deprecated Python and C++ accessors for point-to-point/single-receiver and collective/multiple-receiver queries. Keep the old unicast/multicast accessors as deprecated compatibility aliases, while leaving TTKernel, profiler, and NOC hardware multicast terminology intact.

Update PipeNet docs, validation diagnostics, and frontend lowering comments to use the new semantic wording.
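A deprecated compatibility alias on the Python side might look roughly like the sketch below. The `Pipe` class and all four method names are hypothetical and chosen only to illustrate the pattern; the actual accessor names in tt-lang are not shown in this thread.

```python
import warnings


class Pipe:
    """Toy stand-in for a pipe object; every method name here is hypothetical."""

    def __init__(self, num_receivers: int, collective: bool) -> None:
        self.num_receivers = num_receivers
        self._collective = collective

    def is_collective(self) -> bool:
        return self._collective

    def is_point_to_point(self) -> bool:
        return not self._collective

    def is_multicast(self) -> bool:
        # Old name kept as an alias that warns and forwards to the new one.
        warnings.warn(
            "is_multicast() is deprecated; use is_collective()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.is_collective()

    def is_unicast(self) -> bool:
        warnings.warn(
            "is_unicast() is deprecated; use is_point_to_point()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.is_point_to_point()


# A degenerate collective: one receiver, but still a collective contract.
pipe = Pipe(num_receivers=1, collective=True)
```

Storing the contract as an explicit flag, rather than deriving it from the receiver count, is what lets a one-receiver collective stay distinct from a point-to-point pipe.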

Split pipe resource, address-table, and receiver-address checks into non-mutating preflight records before TTKernel emission. This keeps pipe send, post, and wait conversion patterns from returning failure after creating partial IR. Move tensor accessor and DFB rank validation before tensor/DFB copy emission, and switch PipeGraph construction to a typed walk that interrupts on the first receiver validation failure.
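The validate-then-emit structure described here can be sketched in a few lines. This is an illustrative Python analogue, not the C++ conversion patterns: `PipeOp`, `PreflightRecord`, `preflight`, and `lower` are hypothetical names, and a list of strings stands in for the IR.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PipeOp:
    name: str
    semaphores_needed: int


@dataclass(frozen=True)
class PreflightRecord:
    op: PipeOp
    first_semaphore_id: int


def preflight(
    ops: List[PipeOp], semaphore_limit: int
) -> Tuple[Optional[List[PreflightRecord]], Optional[str]]:
    """Check every op and plan its resources without touching the IR."""
    records: List[PreflightRecord] = []
    next_id = 0
    for op in ops:
        if next_id + op.semaphores_needed > semaphore_limit:
            return None, f"{op.name}: semaphore ids exceed limit {semaphore_limit}"
        records.append(PreflightRecord(op, next_id))
        next_id += op.semaphores_needed
    return records, None


def lower(ops: List[PipeOp], semaphore_limit: int, ir: List[str]) -> Optional[str]:
    """Emit into `ir` only after the whole preflight has succeeded."""
    records, error = preflight(ops, semaphore_limit)
    if records is None:
        return error  # `ir` is untouched, so no partial output exists
    for record in records:
        ir.append(f"emit {record.op.name} sem={record.first_semaphore_id}")
    return None


ops = [PipeOp("send", 2), PipeOp("wait", 3)]
failed_ir: List[str] = []
failure = lower(ops, semaphore_limit=4, ir=failed_ir)
ok_ir: List[str] = []
success = lower(ops, semaphore_limit=8, ir=ok_ir)
```

Had validation been interleaved with emission, the failing case would have left `emit send sem=0` behind before reporting the error.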

Refine pipe resource plan helpers and validation so address storage, ready counters, and completion wait resources are represented explicitly. Add runtime argument count validation for compiler-emitted pipe resource plans. Expand MLIR and Python coverage for semaphore spill boundaries, collective metadata, and pipe runtime resource diagnostics.

Stage ttl.constants in the tt-lang-sim wheel because ttl._pipenets imports the shared hardware semaphore limit. This fixes the wheel smoke import failure for ttl.sim after adding the shared PipeNet constants module.

Document the single-receiver collective pipe contract in TableGen. Use a distinct PipeSourceKey type for source-local ready-counter allocation. Share the Python ready-counter spill predicate between local and GlobalSemaphore counts.

Factor pipe SRAM scratch, GlobalSemaphore, runtime argument, semaphore descriptor, and io_tensors setup into reusable kernel_runner helpers. Make emitted runners import those helpers instead of duplicating the pipe runtime body.

Use Pipe Transfer IR intervals to reuse sender-ready counters and source-core address-table slots when transfer lifetimes do not overlap. Keep receiver completion per PipeNet. Add focused MLIR checks and a Python lit runtime reproducer for the dual-route PipeNet case.
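Reusing a counter or address-table slot across non-overlapping lifetimes is essentially interval coloring. The sketch below is a generic greedy allocator under the assumption that each transfer is live over a half-open interval `[start, end)` of program points on one source core; the transfer names and the interval representation are hypothetical and do not mirror the Pipe Transfer IR.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in program points


def assign_slots(intervals: Dict[str, Interval]) -> Dict[str, int]:
    """Greedy interval coloring: reuse a slot once its previous owner is dead."""
    slot_free_at: List[int] = []  # slot index -> program point where it frees up
    assignment: Dict[str, int] = {}
    # Sort by start point, then by name, so the result is deterministic.
    ordered = sorted(intervals.items(), key=lambda kv: (kv[1][0], kv[0]))
    for name, (start, end) in ordered:
        for slot, free_at in enumerate(slot_free_at):
            if free_at <= start:        # previous transfer has ended
                slot_free_at[slot] = end
                assignment[name] = slot
                break
        else:
            slot_free_at.append(end)    # every existing slot is still live
            assignment[name] = len(slot_free_at) - 1
    return assignment


transfers = {"pipe_a": (0, 4), "pipe_b": (2, 6), "pipe_c": (4, 8)}
slots = assign_slots(transfers)
```

Here `pipe_c` starts exactly when `pipe_a` ends, so it takes over slot 0 and the three transfers need only two slots; `pipe_b` overlaps both and keeps its own.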

phizalev-TT and others added 30 commits May 15, 2026 19:16

Fix ttlang-sim and ttlang-sim-stats in dist container

48de982

uplift

2c5285c

Merge tag 'v1.1.1' into bnorris/uplift-may-20-2026

9d6af2a

update experimental::CircularBuffer -> CircularBuffer

271c9c6

Merge remote-tracking branch 'origin/main' into bnorris/uplift-may-20-2026

9708d02

precommit

2bde602

rewrite pipe lowering to remove compiler-generated dfb reserves; fixes 608

113bdba

precommit

d5612a0

Use receiver-posted DFB addresses for pipe lowering

e48dc53

[sim] Model pipe copies as transfer on copy (not wait).

ac0b08b

[doc] update doc with pipe semantics and example

1e29568

update tests

9f0eddf

Merge remote-tracking branch 'origin/main' into bnorris/fix-pipe-unicast-608

8764843

lit test updates

d595856

update doc

40f7aeb

Merge remote-tracking branch 'origin/main' into bnorris/fix-pipe-unicast-608

8ce0ac2

tighten verifier: reject unanalyzable pipe receive waits; add tests

02ae78a

clean up tests

12bf836

Merge remote-tracking branch 'origin/main' into bnorris/fix-pipe-unicast-608

7374479

update doc

2b6df80

precommit

96befa3

Merge branch 'main' into bnorris/fix-pipe-unicast-608

60c7980

Merge remote-tracking branch 'origin/main' into bnorris/fix-pipe-unicast-608

a1ff751

more cleanup

6b2341b

add issue 619 xfail pytest

972dc14

address comments

311f584

Reject invalid PipeNet rendezvous layouts

f4d74b9

brnorris03 added 29 commits May 23, 2026 08:20

initial aggregation works

04000f3

some renaming for clarity

b0c3441

Preserve multicast kind for aggregate pipe rendezvous

0d30d48

Fix PipeGraph DFB lookup names

4a4c009

Lowering models three resources separately:

e8a3511

- address storage carries receiver-authored DFB addresses;
- ready counting records how many receivers have posted a transfer;
- completion wait records when receiver-owned DFB storage contains the payload.

Advance tt-mlir for computed SRAM writes

3be72df

precommit

d610c35

update python lit test

e667ae5

run ci for any PR, not just those targeting main

93a3680

Update tt-mlir inline write barrier handling

d9fa51c

update tt-mlir sha

7d20361

cleanup part 1

ddf9cf8

Package pipe constants in sim wheel

5aaed4d

Refine pipe resource review cleanup

e6264eb

Share pipe runtime runner helpers

3b8da76

Introduce pipe transfer IR

5c9e387

Improve pipe transfer IR provenance checks

40f7c7d

brnorris03 closed this May 25, 2026

brnorris03 wants to merge 61 commits into main from bnorris/pipe-transfer-liveness
