Parent: #619
Problem
Uniform multicast PipeNets currently contribute to the hardware semaphore pressure tracked in #619. The current lowering can allocate rendezvous resources per receiver or per copy site, even when all receivers publish an equivalent destination DFB address/offset.
Proposed lowering
For multicast cases where all receivers publish an equivalent destination DFB address/offset, lower the protocol like tt-metal multicast patterns:
- receivers reserve the destination DFB slot;
- receivers increment one sender-ready counter;
- sender waits for `num_dests`;
- sender multicasts data to the uniform destination address;
- sender multicasts one receiver-valid/completion signal.
This should make semaphore usage proportional to multicast streams/epochs rather than receiver count or copy-site count.
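The steps above can be sketched as a sequential host-side model. This is only an illustration under stated assumptions: plain integers stand in for hardware semaphores, a `std::vector` stands in for each receiver's DFB slot, and every name (`Receiver`, `MulticastResult`, `run_uniform_multicast`) is hypothetical rather than a tt-metal or PipeNet API. The point it tries to show is that the stream consumes two semaphores no matter how many receivers participate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model only. All types and functions here are hypothetical.
struct Receiver {
    std::vector<uint32_t> dfb;    // stands in for the destination DFB memory
    bool slot_reserved = false;   // set when the receiver reserves its slot
    uint32_t receiver_valid = 0;  // local copy of the receiver-valid semaphore
};

struct MulticastResult {
    std::vector<Receiver> receivers;
    uint32_t sender_ready = 0;     // counter value the sender observed
    uint32_t semaphores_used = 0;  // semaphores consumed by this stream
};

MulticastResult run_uniform_multicast(uint32_t num_dests,
                                      uint32_t dest_offset,
                                      const std::vector<uint32_t>& payload) {
    MulticastResult r;
    for (uint32_t i = 0; i < num_dests; ++i) {
        Receiver rx;
        rx.dfb.assign(dest_offset + payload.size(), 0);
        r.receivers.push_back(rx);
    }

    // One shared sender-ready counter for the whole stream.
    uint32_t sender_ready_sem = 0;

    // Receivers reserve the destination slot, then increment the counter.
    for (Receiver& rx : r.receivers) {
        rx.slot_reserved = true;
        ++sender_ready_sem;
    }

    // Sender waits for num_dests. In this sequential model the wait is
    // already satisfied, so it is expressed as an assertion.
    assert(sender_ready_sem == num_dests);
    r.sender_ready = sender_ready_sem;

    // Sender multicasts data to the single uniform destination offset.
    for (Receiver& rx : r.receivers) {
        for (std::size_t i = 0; i < payload.size(); ++i) {
            rx.dfb[dest_offset + i] = payload[i];
        }
    }

    // Sender multicasts one receiver-valid signal.
    for (Receiver& rx : r.receivers) {
        rx.receiver_valid = 1;
    }

    // sender-ready + receiver-valid, independent of num_dests.
    r.semaphores_used = 2;
    return r;
}
```

Calling `run_uniform_multicast(4, 2, {7, 8, 9})` and `run_uniform_multicast(64, 2, {7, 8, 9})` both report `semaphores_used == 2` in this model, which is the scaling property the proposed lowering is aiming for.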
References
- Sender waits for `num_dests`, multicasts data, then multicasts the receiver-valid signal: https://github.com/tenstorrent/tt-metal/blob/9938a888cc4efd766d7652c08ab7eeb8fedd9aaf/tt_metal/programming_examples/matmul/matmul_common/kernels/dataflow/reader_bmm_tile_layout_in0_sender_in1_sender.cpp#L127-L161
- https://github.com/tenstorrent/tt-metal/blob/9938a888cc4efd766d7652c08ab7eeb8fedd9aaf/tt_metal/programming_examples/matmul/matmul_common/kernels/dataflow/reader_bmm_tile_layout_in0_receiver_in1_receiver.cpp#L76-L89
- https://github.com/tenstorrent/tt-metal/blob/9938a888cc4efd766d7652c08ab7eeb8fedd9aaf/ttnn/cpp/ttnn/operations/conv/conv2d/device/conv2d_op_sharded_program_factory.cpp#L714-L724
- https://github.com/tenstorrent/tt-metal/blob/9938a888cc4efd766d7652c08ab7eeb8fedd9aaf/ttnn/cpp/ttnn/operations/conv/conv2d/device/kernels/activation_reader_width_sharded.cpp#L207-L271
Non-goals
This does not solve non-uniform destination DFB addresses. Those still need mailbox/table-based address publication.
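The boundary between the uniform case and this non-goal could be expressed as a small eligibility check. The sketch below assumes the published destination offsets are available as a list of integers; `uniform_dest_offset` is a hypothetical helper, not an existing PipeNet or tt-metal function.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns the shared offset when every receiver
// published the same one, and std::nullopt otherwise (including when
// no receivers are given).
std::optional<uint32_t> uniform_dest_offset(const std::vector<uint32_t>& published) {
    if (published.empty()) {
        return std::nullopt;
    }
    for (uint32_t offset : published) {
        if (offset != published.front()) {
            return std::nullopt;  // non-uniform: needs mailbox/table publication
        }
    }
    return published.front();
}
```

A lowering pass might take the shared-semaphore path only when this returns a value, and keep the mailbox/table-based path otherwise.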