[fix][ttl] ttl-insert-cb-sync: Allow tensor SSA uses past the next-acquire boundary (#536) by brnorris03 · Pull Request #554 · tenstorrent/tt-lang

brnorris03 · 2026-05-08T06:13:25Z

Problem:
ttl-insert-cb-sync inserts cb_pop / cb_push after the last use of each acquired slot, but only searched up to the next same-class acquire on the same DFB. Tensor SSA uses can legitimately appear after a later acquire (a tile from cb_wait t1 may be consumed after cb_wait t2). Those uses were missed, the pop was inserted right after the acquire, and the read pointer advanced before the data was read. The case_a / case_b reproducers from the issue 536 follow-up comment miscompiled silently.

Independently, deferred-consume patterns (t1 = cb.wait(); t2 = cb.wait(); use(t1); use(t2)) lowered to N independent cb_wait_front(k) / cb_pop_front(k) calls. Because metal's wait/pop primitives are non-cumulative, the first pop advances the front before the producer has pushed enough tiles to satisfy the next read.

Fixes (~530 LOC):

Split findLastOwnedUse: direct CB uses keep the boundary, tensor SSA uses ignore it. Extend findOwnedReleases' upper bound to the last owned use.
New pass ttl-coalesce-dfb-acquires rewrites a maximal run of same-DFB acquires + their matched releases into the canonical tt-metal cumulative-wait shape (cb_wait_front(N*k) + per-block tensor.extract_slice views + cb_pop_front(N*k)). addSliceOffset already folds the slice offsets into the per-tile src_idx / dst_idx, so no lowering changes. Same template handles producer-side cb_reserve/cb_push. The detection rule is locally-checkable: an op between members terminates the group iff it touches our DFB or consumes a group member's result, or carries a region (the conservative envelope of "could trigger a release on the DFB"). This means matmul-style patterns where acquires on two DFBs are interleaved (a1, b1, a2, b2) coalesce per-DFB independently, since an acquire on a different DFB is benign. See docs/development/DFBManagement.md for the full correctness argument. Fixes [ttl] Coalesce consecutive cb_wait acquires into multi-tile cb_wait_front(N) with per-acquire src_idx #556.

Tests:

test/python/test_auto_pop_push.py: device coverage including the case_a / case_b reproducers, an interleaved-baseline regression guard, and axis-coverage cases (fanout, mixed immediate/deferred, long chains, multi-DFB, multi-tile geometry, tight block_count, single-slot, long DM loops, producer/consumer in scf.for, multiple direct DFB uses on one acquire). The reordered-consume case (formerly xfail [ttl] Coalesce consecutive cb_wait acquires into multi-tile cb_wait_front(N) with per-acquire src_idx #556) now runs as a passing test.
test/ttlang/Dialect/TTL/Transforms/insert_cb_sync.mlir: IR-level coverage of the deferred-use pattern for compute-thread consumers (flat and inside scf.for) and producers, plus a regression guard for the scf.for iter_args pattern.
test/ttlang/Dialect/TTL/Transforms/coalesce_dfb_acquires.mlir: positive (consumer / producer / loop-body / matmul-style two-DFB interleaved), negative (interleaved consume, single-acquire-per-DFB alternation, pre-existing num_tiles), and idempotency coverage for the new pass.
docs/development/DFBManagement.md: extended "DFB Sync Insertion" with the ownership criteria, release-placement invariants, and idempotency.

Future work:

[ttl] Encode DFB ownership in SSA on ttl.copy operands #555: encode DFB ownership in SSA on ttl.copy operands. Would collapse findLastOwnedUse to a single SSA walk and lift the DM-thread xfails. Requires (mechanical) updates to a large number of lit tests.

Fixes #536 follow-up. Fixes #556.

…-fixes

… acquires + N matching releases into the canonical tt-metal cumulative-wait shape (`cb_wait_front(N*k)` + per-block `tensor.extract_slice` views + `cb_pop_front(N*k)`). `addSliceOffset` already folds the slice offsets into the per-tile `src_idx` / `dst_idx`, so no lowering changes. Symmetric on the producer side. Fixes #556.

extend correctness argument and idempotency notes in DFBManagement.md; add adversarial pytests (matmul-style two-DFB interleave, t1-fanout, interposed third acquire) and matching lit coverage; rename CB->DFB in new content.

ttl-auto-sync pipeline; update C++, Python, and me2e callers to use it. Remove helper used just once.

…-fixes

zoecarver

LGTM!

…-fixes

arichinsTT

Looks good!

brnorris03 added 3 commits May 7, 2026 22:27

fix lastOwnedUse computation

3416b30

update doc

21ca0f3

add tests

b283893

brnorris03 changed the title ~~fix][ttl] Allow tensor SSA uses past the next-acquire boundary~~ [fix][ttl] Allow tensor SSA uses past the next-acquire boundary May 8, 2026

brnorris03 changed the title ~~[fix][ttl] Allow tensor SSA uses past the next-acquire boundary~~ [fix][ttl] ttl-insert-cb-sync: Allow tensor SSA uses past the next-acquire boundary (#536) May 8, 2026

brnorris03 added 2 commits May 8, 2026 08:00

more tests and cleanup

974085b

replace weird "shape" usage in comments/doc

7e02bd5

brnorris03 marked this pull request as ready for review May 8, 2026 15:40

brnorris03 requested a review from a team as a code owner May 8, 2026 15:40

brnorris03 added 9 commits May 8, 2026 08:52

Merge remote-tracking branch 'origin/main' into bnorris/auto-push-pop…

ea44b09

…-fixes

Merge remote-tracking branch 'origin/main' into bnorris/auto-push-pop…

6be962d

…-fixes

xfail the issue 541 (fixed in 547) tests

49b66ce

Bundle ttl-insert-cb-sync + ttl-coalesce-dfb-acquires into a registered

b34b80a

ttl-auto-sync pipeline; update C++, Python, and me2e callers to use it. Remove helper used just once.

edit comments

bf22919

Merge remote-tracking branch 'origin/main' into bnorris/auto-push-pop…

6ba78d1

…-fixes

add missing tests

5d02c98

zoecarver approved these changes May 12, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into bnorris/auto-push-pop…

e1cb5ee

…-fixes

arichinsTT approved these changes May 12, 2026

View reviewed changes

brnorris03 enabled auto-merge (squash) May 12, 2026 16:34

brnorris03 merged commit 5368cce into main May 12, 2026
13 checks passed

brnorris03 deleted the bnorris/auto-push-pop-fixes branch May 12, 2026 17:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix][ttl] ttl-insert-cb-sync: Allow tensor SSA uses past the next-acquire boundary (#536)#554

[fix][ttl] ttl-insert-cb-sync: Allow tensor SSA uses past the next-acquire boundary (#536)#554
brnorris03 merged 15 commits into
mainfrom
bnorris/auto-push-pop-fixes

brnorris03 commented May 8, 2026 •

edited

Loading

Uh oh!

zoecarver left a comment

Uh oh!

arichinsTT left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

brnorris03 commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zoecarver left a comment

Choose a reason for hiding this comment

Uh oh!

arichinsTT left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

brnorris03 commented May 8, 2026 •

edited

Loading