[ttl] Make TRID DMA wait lowering selectable (default: global barriers) by shutovilyaep · Pull Request #267 · tenstorrent/tt-lang

shutovilyaep · 2026-01-23T14:17:01Z

PR #267: [ttl] Make TRID DMA wait lowering selectable (default: global barriers)

Why

Current lowering uses global DMA barriers. TRID-scoped waits needed for #87 are not supported. Default behavior must stay unchanged for existing users.

What

New pass option use-trid-barriers (default off): choose global barriers (current) or TRID barriers.
When on: copy lowering emits set_trid; wait lowering emits barrier_with_trid. TRID slots (16) are tracked; reuse of an in-flight slot triggers an evict barrier first.
Pipeline and TTKernel cleanup forward the option; TRID dedup patterns run only in TRID mode.
Tests: conversion, translation, ME2E, and Python lit updated so both modes are covered (default path kept; TRID path added or enabled where needed).
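To make the two modes concrete, here is a toy C++ model of which TTKernel op names a single copy followed by its wait would lower to. This is a sketch, not the PR's MLIR patterns. The set_trid / barrier_with_trid / barrier names come from this PR. Using noc_async_read / noc_async_write as the name of the DMA op itself is an assumption, and the helper lowerCopyThenWait is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Direction { Read, Write };

// Toy model (hypothetical helper, not the real CopyLowering/WaitLowering):
// op names emitted for one ttl.copy immediately followed by its ttl.wait.
std::vector<std::string> lowerCopyThenWait(Direction dir, bool useTridBarriers) {
  const std::string rw = dir == Direction::Read ? "read" : "write";
  const std::string prefix = "noc_async_" + rw;
  std::vector<std::string> ops;
  if (useTridBarriers)
    ops.push_back(prefix + "_set_trid");  // tag the upcoming transfer with a TRID
  ops.push_back(prefix);                  // the DMA itself (name assumed)
  // TRID mode waits only on this transfer's TRID; global mode waits on all.
  ops.push_back(useTridBarriers ? prefix + "_barrier_with_trid"
                                : prefix + "_barrier");
  return ops;
}
```

In global mode (the default) the output matches what main produces today. The only differences in TRID mode are the extra set_trid and the narrower barrier.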

How

Pass and pipeline accept the option and pass it through. In TRID mode, transfer handles become i32; SCF type conversions keep regions legal.
TridAllocator tracks 16 slots and direction; before reusing a busy slot the lowering emits the matching barrier. In global mode the allocator is unused (handle is 0).
Cleanup: TRID dedup is registered only when the option is true; dedup compares TRID and NOC.
Tests use the option explicitly where TRID output is checked; translation tests keep default RUN and add a second RUN with TRID checks where output differs.
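The cleanup rule "dedup compares TRID and NOC" can be sketched as follows. Two adjacent barrier_with_trid ops of the same kind collapse into one only when both operands match. The TridBarrier struct and the flat-list view of adjacent ops are hypothetical simplifications of the real DeduplicateConsecutiveTridBarriers pattern, which works on MLIR ops.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flattened view of one barrier_with_trid op.
struct TridBarrier {
  bool isRead;    // read vs write barrier op kind
  uint32_t trid;  // transaction ID operand
  uint32_t noc;   // NOC index operand
};

// Input models a run of adjacent barrier ops (no other ops in between).
// A barrier is dropped only if the previous kept one has the same kind,
// TRID and NOC; a different TRID or NOC keeps both barriers.
std::vector<TridBarrier> dedupConsecutive(const std::vector<TridBarrier>& in) {
  std::vector<TridBarrier> out;
  for (const TridBarrier& b : in) {
    if (!out.empty()) {
      const TridBarrier& prev = out.back();
      if (prev.isRead == b.isRead && prev.trid == b.trid && prev.noc == b.noc)
        continue;  // redundant: already waited on this TRID/NOC pair
    }
    out.push_back(b);
  }
  return out;
}
```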

shutovilyaep · 2026-01-26T16:19:00Z

Comment received from @brnorris03:

It would be great if you can implement this as a pass option so we can choose between different lowerings (there will probably be more optimizations later), keeping the default the same as what's in main now.

…getTileGridShapeFromValue. The test used tensor<32x32xf32> (element type f32). Copy lowering calls getTileGridShapeFromValue(), which asserts that the tensor has a TileType element type. Use tensor<1x1x!ttcore.tile<32x32,f32>> like other DMA tests to fix the CI crash (SIGABRT) in TTLToTTKernel conversion. Attempt to fix CI failure in PR tenstorrent#267 / #1222.

shutovilyaep · 2026-01-30T13:39:54Z

/codeowners ping

brnorris03

Looks great, thank you! The only more significant issue I see is the lack of runtime tests. I think the best approach for now is to parameterize (some of) the test/me2e tests with the new option; what do you think? I can help with more concrete suggestions on how to do that if you agree.

Some general questions, mainly stemming from my lack of deep knowledge of the low-level semantics of the metal ops.

Is the TRID value semantically meaningful, or does it just need to be unique per copy? I am guessing order doesn't matter? As defined, the generated TRIDs could be nondeterministic (but correctly unique) due to parallel pattern application.
With the new ops requiring explicit NOC, I see that NOC 0 is always used -- is this appropriate or something that needs to be generalized (perhaps later PR)?

Again, thank you for contributing this!!

brnorris03 · 2026-01-30T15:02:55Z

  patterns.add<DeduplicateConsecutiveBarriers<NocAsyncReadBarrierOp>>(
      patterns.getContext());
  patterns.add<DeduplicateConsecutiveBarriers<NocAsyncWriteBarrierOp>>(
      patterns.getContext());
+  patterns
+      .add<DeduplicateConsecutiveTridBarriers<NocAsyncReadBarrierWithTridOp>>(
+          patterns.getContext());
+  patterns
+      .add<DeduplicateConsecutiveTridBarriers<NocAsyncWriteBarrierWithTridOp>>(
+          patterns.getContext());


Probably doesn't matter that much, but could make the relevant patterns conditional on the option that enables TRID?

fixed:

populateTTKernelCleanupPatterns now takes useTridBarriers (default false in header); TRID dedup patterns are only added when true.

Convert pass calls it with useTridBarriers so the option is forwarded to cleanup.
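A minimal sketch of the conditional registration described above. A plain string list stands in for MLIR's RewritePatternSet, and populateCleanupPatterns is a hypothetical mirror of populateTTKernelCleanupPatterns. The pattern names are taken from the diff quoted above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a pattern set; the real code uses MLIR's RewritePatternSet.
using PatternList = std::vector<std::string>;

// Hypothetical mirror of populateTTKernelCleanupPatterns: the TRID dedup
// patterns are only added when useTridBarriers is true (default false).
void populateCleanupPatterns(PatternList& patterns, bool useTridBarriers = false) {
  patterns.push_back("DeduplicateConsecutiveBarriers<NocAsyncReadBarrierOp>");
  patterns.push_back("DeduplicateConsecutiveBarriers<NocAsyncWriteBarrierOp>");
  if (!useTridBarriers)
    return;  // global mode never emits barrier_with_trid ops
  patterns.push_back(
      "DeduplicateConsecutiveTridBarriers<NocAsyncReadBarrierWithTridOp>");
  patterns.push_back(
      "DeduplicateConsecutiveTridBarriers<NocAsyncWriteBarrierWithTridOp>");
}
```

Keeping the default argument false means existing callers that don't know about the option keep their current pattern set.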

brnorris03 · 2026-01-30T15:04:53Z


+  let options = [
+    Option<"useTridBarriers", "use-trid-barriers", "bool", "false",
+           "Use TRID-aware DMA waits (barrier_with_trid) instead of global barriers.">,
+  ];


Thank you for adding the option! Not asking you to do this in the PR but it would be interesting to profile the different approaches with a small set of representative benchmarks and set the default based on that (perhaps add a short TODO to that effect here if you agree?).

Can't run the profiling myself: no device available.

Added TODO in pass description: “Profile both modes on representative benchmarks and consider changing the default.”

brnorris03 · 2026-01-30T15:23:25Z

+class TridAllocator {
+public:
+  uint32_t allocateTrid() { return nextTrid++ & 0xF; }
+
+private:
+  uint32_t nextTrid = 0;
+};


There is wrapping at 16 TRIDs, but what happens if the 0th, etc. are still not completed at that point? Is there any way to check/detect TRID overflow? Maybe add a TODO for future improvement to make this more robust.
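The collision the reviewer is asking about can be reproduced with the allocator exactly as quoted in the diff. With no wait in between, the 17th copy is handed TRID 0 again, even though the first transfer may still be in flight. The tridOfSeventeenthCopy helper is hypothetical and exists only to show this.

```cpp
#include <cassert>
#include <cstdint>

// The allocator as quoted in the review diff (unchanged).
class TridAllocator {
public:
  uint32_t allocateTrid() { return nextTrid++ & 0xF; }

private:
  uint32_t nextTrid = 0;
};

// Hypothetical helper: TRID given to the 17th copy (index 16) when no
// ttl.wait runs between the copies.
uint32_t tridOfSeventeenthCopy() {
  TridAllocator alloc;
  uint32_t trid = 0;
  for (int i = 0; i < 17; ++i)
    trid = alloc.allocateTrid();
  return trid;  // 16 & 0xF == 0: same TRID as the first copy
}
```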

fixed:

TridAllocator now tracks outstanding TRIDs and their direction. When a TRID would be reused while still in-flight, CopyLowering emits a barrier_with_trid for the old transfer before reassigning.

WaitLowering releases TRIDs via releaseTrid() so they can be reused without an auto-barrier.

Lit test (17 copies, no intervening waits) verifies auto-barrier on overflow.
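The fix described above can be sketched as a slot-tracking allocator. Each of the 16 slots remembers the direction of its in-flight transfer. When a busy slot comes up for reuse, the lowering first emits the matching barrier_with_trid. All names here are hypothetical (the PR's class stores its state in a SmallVector). The sketch covers only the overflow path exercised by the 17-copy lit test and does not model releasing slots on wait.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

enum class Dir { Read, Write };

// Hypothetical slot-tracking allocator: remembers which TRIDs still have a
// transfer in flight and in which direction.
class TridAllocator {
public:
  static constexpr uint32_t kNumTrids = 16;

  struct Allocation {
    uint32_t trid;
    std::optional<Dir> evictDir;  // set if the slot was still busy
  };

  Allocation allocate(Dir dir) {
    uint32_t trid = next_++ % kNumTrids;
    Allocation result{trid, inFlight_[trid]};
    inFlight_[trid] = dir;  // the new transfer now owns this slot
    return result;
  }

private:
  uint32_t next_ = 0;
  std::array<std::optional<Dir>, kNumTrids> inFlight_{};
};

// Op names for n reads issued back-to-back with no intervening wait.
std::vector<std::string> lowerReadsWithoutWaits(int n) {
  TridAllocator alloc;
  std::vector<std::string> ops;
  for (int i = 0; i < n; ++i) {
    TridAllocator::Allocation a = alloc.allocate(Dir::Read);
    const std::string id = std::to_string(a.trid);
    if (a.evictDir) {
      // Slot reuse: wait for the previous transfer on this TRID first.
      const char* rw = *a.evictDir == Dir::Read ? "read" : "write";
      ops.push_back(std::string("noc_async_") + rw + "_barrier_with_trid " + id);
    }
    ops.push_back("noc_async_read_set_trid " + id);
  }
  return ops;
}
```

With 16 copies no barrier is emitted. The 17th copy produces exactly one auto-barrier on TRID 0 before its set_trid, which is what the lit test checks for.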

shutovilyaep · 2026-02-23T08:22:30Z

@brnorris03 Hello, the fixes were ready about 3 weeks ago, but I haven't pushed them because I was a bit thrown off by a sudden layoff from Tenstorrent. Will try to complete this today, if still relevant.

shutovilyaep · 2026-02-26T12:22:29Z

Implemented waiting for the in-flight DMA to complete when TRIDs wrap around, instead of leaving a TODO.

Checked by locally running MLIR lit tests.

This file contains the PR description to copy-paste to GitHub. Will be excluded from final PR. Made-with: Cursor

shutovilyaep · 2026-02-27T09:57:17Z

@brnorris03 Hello, please take a look. I squashed the commits into atomic ones; it should be mergeable.

Local verification complete (macOS, no TT hardware)

cmake --build build --target check-ttlang

Test suite	Result
MLIR lit tests	63/63 passed
Python binding tests	4/4 passed

Additional targeted runs:

llvm-lit test/ttlang/Conversion/TTLToTTKernel/ - 12/12 passed
llvm-lit test/ttlang/Translate/TTLToCpp/ - 11/11 passed

No TT device available for ME2E or hardware execution tests. Ready for CI.

brnorris03

Apologies for taking so long with the re-review. Overall I think it looks great; my main concerns at the moment are about the test coverage, which should be easily addressable. So sorry to learn about the layoff (if you don't mind, could you email me so I have your contact info?).

brnorris03 · 2026-03-02T16:14:44Z

+class TridAllocator {
+public:
+  uint32_t allocateTrid() { return nextTrid++ & 0xF; }
+
+private:
+  uint32_t nextTrid = 0;
+};



brnorris03

Ready to land once the deprecated builder usage is updated: for example, rewriter.create<arith::ConstantIntOp>(loc, 0, 8) should become arith::ConstantIntOp::create(rewriter, loc, 0, 8), and the same applies to all other rewriter.create calls.
Thank you!

Replace deprecated PatternRewriter::create factory usage with static Op::create(rewriter, loc, ...) for arith constants and TTKernel NOC barrier/set_trid ops (review feedback on PR tenstorrent#267).

Add use-trid-barriers option to convert-ttl-to-ttkernel pass and ttl-to-ttkernel-pipeline. When enabled, ttl.copy emits noc_async_{read,write}_set_trid before DMA operations, and ttl.wait emits noc_async_{read,write}_barrier_with_trid instead of global barriers. Default behavior (use-trid-barriers=false) preserves existing global barrier semantics from main branch.

Key changes:
- TridAllocator class manages 16 TRID slots with overflow handling
- lowerTensorCBCopy unified function supports both modes
- CopyLowering/WaitLowering patterns respect useTridBarriers flag
- TTKernel cleanup patterns conditionally registered for TRID mode
- SCF structural type conversions enabled for transfer handle types

TODO: Profile both modes on representative benchmarks and consider changing the default.

- trid_barriers.mlir: Tests TRID-aware lowering with use-trid-barriers=true
  - Verifies noc_async_{read,write}_set_trid emission
  - Verifies noc_async_{read,write}_barrier_with_trid emission
  - Tests TRID overflow handling (17 copies without waits)
- dma_global_barriers.mlir: Tests default global barrier mode
  - Verifies noc_async_{read,write}_barrier emission (no TRID)
  - Ensures backward compatibility with main branch behavior
- Update existing tests to use explicit use-trid-barriers=true where they expect TRID-specific output

Enable use-trid-barriers in TTLToCpp translation tests that verify TRID-specific C++ codegen output. Tests now explicitly request TRID mode to match their expected noc_async_*_set_trid and barrier_with_trid output.

Add use_trid_barriers to E2EConfig and TestConfig to enable runtime testing of both barrier modes:
- E2EConfig.use_trid_barriers controls pipeline pass option
- TestConfig includes use_trid_barriers for test ID disambiguation
- Pipeline builder forwards option to convert-ttl-to-ttkernel
- Runner includes use_trid_barriers in kernel cache key
- CONFIGS includes one TRID-enabled config for coverage

Test IDs now include _trid suffix when use_trid_barriers=True to ensure unique pytest node IDs.
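The real configs are Python, but the two invariants this commit relies on can be sketched in C++ to match the rest of the thread. First, the test ID gets a _trid suffix so both modes produce distinct node IDs. Second, the barrier mode is part of the kernel cache key, so a kernel compiled in one mode is never reused for the other. The struct shape, the key format, and the sourceHash parameter are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical C++ stand-in for the Python TestConfig fields.
struct TestConfig {
  std::string name;
  bool useTridBarriers = false;
};

// Test ID: append "_trid" in TRID mode so both modes get unique IDs.
std::string testId(const TestConfig& c) {
  return c.useTridBarriers ? c.name + "_trid" : c.name;
}

// Cache key: the barrier mode changes the generated kernel, so it must be
// part of the key (key format is an assumption for illustration).
std::string kernelCacheKey(const std::string& sourceHash, const TestConfig& c) {
  return sourceHash + "|use_trid_barriers=" + (c.useTridBarriers ? "1" : "0");
}
```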

Update Python hardware execution tests to use use_trid_barriers=True for consistent TRID-mode testing. These tests exercise the full compilation and execution path with TRID-aware DMA barriers.

- Remove unused releaseTrid; use SmallVector + trailing underscore in TridAllocator
- Replace tridAllocator check with assert; remove allocateTrid in non-TRID branch
- Add emitNocBarrier helper; assert i32 for handle in WaitLowering
- cb_to_tensor_single_tile_write: default RUN + TRID RUN with TRID: prefix
- dma_loop_single_tile: relax CHECK for in-loop runtime arg variable
- config_specs: add multi-tile config with use_trid_barriers=True

Addresses: tenstorrent#87

Migrate TRID/barrier and constant op construction in ConvertTTLToTTKernel to the modern Op::create API requested in PR267 review.

shutovilyaep · 2026-05-13T05:50:41Z

@brnorris03 Pushed a change removing the unneeded makeZeroI32 helper, with a history rewrite so it is gone from the git history entirely. I noticed CI broke because of the unused function. Please take a look.

shutovilyaep mentioned this pull request Jan 23, 2026

[ttl] Lower ttl.copy ttl.wait to TRID-specific ttkernel noc ops #87

Open

shutovilyaep marked this pull request as ready for review January 26, 2026 13:47

shutovilyaep requested a review from a team as a code owner January 26, 2026 13:47

shutovilyaep force-pushed the feat/lower_copy_wait branch 2 times, most recently from d754a28 to 42dbe50 on January 30, 2026 11:59

shutovilyaep changed the title ~~TTL: Lower async DMA waits to TRID barriers~~ [ttl] Make TRID DMA wait lowering selectable (default: global barriers) Jan 30, 2026

shutovilyaep force-pushed the feat/lower_copy_wait branch from fbe3c1d to cc01d5b on January 30, 2026 13:36

brnorris03 reviewed Jan 30, 2026


shutovilyaep force-pushed the feat/lower_copy_wait branch from cc01d5b to 1e5232e on February 24, 2026 14:08

shutovilyaep requested a review from brnorris03 February 24, 2026 14:46

shutovilyaep force-pushed the feat/lower_copy_wait branch 2 times, most recently from 6222a97 to e4f9ca4 on February 26, 2026 12:19

gloriouskilka pushed a commit to RedOrangeSweater/ML.TT.Lang that referenced this pull request Feb 27, 2026

[nfc] Add PR description for tenstorrent#267

b2aa437


shutovilyaep force-pushed the feat/lower_copy_wait branch from e4f9ca4 to e01cb94 on February 27, 2026 09:50

shutovilyaep force-pushed the feat/lower_copy_wait branch 4 times, most recently from 3987a00 to 8bb8613 on February 27, 2026 17:19

shutovilyaep mentioned this pull request Feb 27, 2026

[CI Test] TRID DMA barrier lowering shutovilyaep/tt-lang#1

Closed

shutovilyaep force-pushed the feat/lower_copy_wait branch 3 times, most recently from 37c11ba to 2dba10e on February 27, 2026 17:44

brnorris03 reviewed Mar 2, 2026


gloriouskilka pushed a commit to RedOrangeSweater/ML.TT.Lang that referenced this pull request Mar 14, 2026

[nfc] Add PR description for tenstorrent#267

97402f8


shutovilyaep force-pushed the feat/lower_copy_wait branch 2 times, most recently from 01c7404 to 0a1c087 on March 17, 2026 15:40

shutovilyaep requested a review from brnorris03 March 17, 2026 15:46

shutovilyaep force-pushed the feat/lower_copy_wait branch from 0a1c087 to d91c351 on March 18, 2026 12:02

gloriouskilka pushed a commit to RedOrangeSweater/ML.TT.Lang that referenced this pull request Mar 26, 2026

[nfc] Add PR description for tenstorrent#267

7fae063


gloriouskilka pushed a commit to RedOrangeSweater/ML.TT.Lang that referenced this pull request Mar 26, 2026

[nfc] Add PR description for tenstorrent#267

4804be0


shutovilyaep force-pushed the feat/lower_copy_wait branch from d91c351 to e44ee28 on April 14, 2026 11:01

shutovilyaep force-pushed the feat/lower_copy_wait branch from e44ee28 to 490fc2f on May 5, 2026 05:01

brnorris03 approved these changes May 10, 2026


shutovilyaep force-pushed the feat/lower_copy_wait branch from 3da5e18 to 51b2e96 on May 13, 2026 03:59

shutovilyaep added 7 commits May 13, 2026 07:09

[test] Update translate tests for TRID barrier mode

cf626ff


[test] Enable TRID barriers in Python lit tests

f8da7a7


[ttl] Replace deprecated rewriter.create in copy lowering

def5a1f


shutovilyaep force-pushed the feat/lower_copy_wait branch from 51b2e96 to def5a1f on May 13, 2026 05:47

Merge branch 'main' into feat/lower_copy_wait

78ce070
