Conversation
Investigates carrying MLIR Location info through mlir-aie. Today every
IRON-built op gets Location.unknown() and ~136 builder.getUnknownLoc()
sites in the C++ passes drop locations as ops are synthesized. This
patch demonstrates an end-to-end path from a user's Python program
down through one transform pass.
- python/iron/_loc.py: capture_user_loc(name=...) walks Python frames
past IRON / mlir-aie / venv internals and returns
NameLoc(name, FileLineColLoc(...)). Returns None when no MLIR Context
is active so callers fall back to Location.unknown().
- python/iron/{dataflow/objectfifo,worker,kernel,runtime/runtime,
runtime/dmatask}.py: capture _user_loc at IRON construction (or at
the Runtime.fill / drain call site) and apply it via
`with loc_or_unknown(self._user_loc):` around MLIR op creation in
resolve().
- lib/Dialect/AIE/Transforms/AIEObjectFifoStatefulTransform.cpp:
forward op.getLoc() instead of builder.getUnknownLoc() in
createObjectFifo, createObjectFifoLocks, and the BufferOp creation
inside createObjectFifoElements -- so locks and buffers materialized
from an objectfifo inherit the user's source position.
- docs/SourceLocations.md: investigation summary, current state,
three-phase plan, and effort estimate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to commit efca7bd. Two changes that make the prototype actually work end-to-end and stay working: 1. Eager frame capture in `_loc.py`. The previous version called `Location.file()` from inside `__init__`, but IRON construction runs *before* `Program.resolve_program()` enters `mlir_mod_ctx()`, so there was no MLIR `Context` and `capture_user_loc()` returned None for every op. Now it captures `(filename, line, col)` into a small `CapturedLoc` dataclass at construction time and materializes the real `ir.Location` lazily inside the `with` block in `resolve()`. 2. A small fix to the C++ pass change: the previous `Location loc = {}` default with `loc ? loc : builder.getUnknownLoc()` doesn't compile because `mlir::Location` has no default constructor or `operator bool()`. Made the loc parameter required and updated the one caller (the FIFO-split site). Tests added (all passing): - `test/python/loc_capture.py` — unit test for `capture_user_loc` / `CapturedLoc.materialize` / `loc_or_unknown`. - `test/python/iron_loc_emit.py` — builds a small IRON program and asserts that ObjectFifo / Worker / Kernel / Runtime.fill / drain ops carry `loc("of_in"(this_file:line))` etc. through the emitted module. - `test/objectFifo-stateful-transform/base/loc_preservation.mlir` — hand-written MLIR with an explicit `loc(#user_loc)` on the objectfifo.create; pipes it through `aie-opt --aie-objectFifo-stateful-transform --mlir-print-debuginfo` and FileChecks that the synthesized lock / buffer ops reference the same source location alias. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… sites Removes the rest of the `builder.getUnknownLoc()` calls from this pass (38 → 0). Every op the stateful-transform synthesizes from an `aie.objectfifo` now inherits a useful location: - locks, buffers, the new objectfifo from a split — already done in the prior commit, kept. - `createBd` widened to take a `Location` (so the DMA body — `use_lock`, `dma_bd`, `dma_bd_packet`, `next_bd` — all share the source op's loc). - mem / shim_dma / mem_tile_dma terminator + `dma_start` ops use the ObjectFifoCreateOp's loc. - packet flow / packet source / packet dest / flow / shim_dma_allocation use producer's or consumer's loc, whichever is contextually correct. - dynamic-objectfifo helper uses the surrounding core / acquire / release / access op's loc when synthesizing `arith.constant`, `memref.load`, `memref.store`, `arith.index_cast`, `scf.yield`, the global next-index buffer, etc. - `findOrCreateTile` uses the host tile's loc for the new TileOp. All 131 existing stateful-transform lit tests still pass (1 expectedly failed, pre-existing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continues threading source locations through the back half of the pass pipeline. AIECoreToStandard.cpp (13 → 1 getUnknownLoc): the lowering patterns that emit `func.call` for `aie.put_stream` / `aie.get_stream` / `aie.put_cascade` / `aie.get_cascade` / `aie.use_lock` / `aie.debug`, the `memref.global` / `memref.get_global` / `memref.assume_alignment` trio for `aie.buffer`, and the `func.func` / `func.return` / event trace calls all now use the source op's location. The one remaining `builder.getUnknownLoc()` is the `declareAIEIntrinsics` site, which emits LLVM-intrinsic `func.func` declarations that don't correspond to a user-written op. AIEToConfiguration.cpp (5 → 0): `emitTransactionOps`, `emitControlPacketOps`, and `convertTransactionOpsToMLIR` now thread a `Location` parameter through (sourced from the parent `aie.device`), so the synthesized `npu.write32` / `npu.maskwrite32` / `npu.blockwrite` / `npu.control_packet` ops at the runtime-config layer carry the device's location instead of `unknown`. The `aiex.runtime_sequence` created by `convertAIEToConfiguration` when none exists also gets the device's location. Together with the previous commit, this leaves only ~75 of the original 136 `getUnknownLoc()` sites in the codebase — the remaining ones are in passes outside the IRON → ObjFifo → core/runtime path (e.g. AIEVecToLLVM, AIECreatePathFindFlows, AIEHerdRouting). All 227 existing tests under test/Conversion and test/objectFifo-stateful-transform still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…extend IRON test
AIEHerdRouting.cpp (6 → 0): use the source HerdOp's location for the
synthesized IterOp / SelectOp / SwitchboxOp / ConnectOp when expanding
herd routes.
AIECreatePathFindFlows.cpp (19 → 0): in `addConnection` use
`flowOp.getLoc()` (the original FlowOp being lowered); in
`runOnPacketFlow` use the per-tile `tile.getLoc()` for the SwitchboxOp /
AMSelOp / MasterSetOp / PacketRulesOp / PacketRuleOp it generates per
tile, and `tileOp.getLoc()` for the shimmux ConnectOps; in the
wire-creation loop at the bottom of `runOnOperation`, use the loop's
`tile.getLoc()` for every WireOp.
Extends `test/python/iron_loc_emit.py` with a second RUN line that
pipes the IRON-emitted module through
`aie-opt --aie-place-tiles --aie-objectFifo-stateful-transform
--mlir-print-debuginfo` and FileChecks that the synthesized
buffers / locks / dma_bd ops still reference the same NameLoc alias
(`loc("of_in"(...))` / `loc("of_out"(...))`) tied back to the user's
Python source. This is the first regression-tested demonstration of
the full Python -> dialect -> pass chain preserving locations.
All 255 lit tests under Conversion/, objectFifo-stateful-transform/,
find-flows/, create-flows/, plus the new python/loc_capture.py and
python/iron_loc_emit.py still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… sites Records the current state on the `location` branch: - Phase 1 (Python capture): fully implemented, 3 passing lit tests. - Phase 2 (pass plumbing): IRON path complete (5 passes converted, ~80 of 136 sites done), other passes pending. - Phase 3 (backend / DWARF): unstarted but unblocked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 sweep continues, completing the audit of passes outside the IRON path. Each site replaced uses the source op's location (the op being lowered, the originating broadcast/multicast/cascade/control op, or a contextually-relevant tile / device op). Files updated: - AIE/Transforms/AIEObjectFifoRegisterProcess.cpp (10 → 0): for-loop bounds, acquires/releases/subview accesses, kernel CallOp, and the CoreOp/EndOp synthesized for tiles missing one — all now use the source ObjectFifoRegisterProcessOp's loc. - AIEX/Transforms/AIECreateCores.cpp (9 → 0): TileOp / BufferOp / MemOp / EndOp / CoreOp / arith.constant / memref.load synthesized for each callOp now use callOp.getLoc(). - AIEX/Transforms/AIELowerMemcpy.cpp (6 → 0): widened createDMABlocksAndOps to take a Location; all dma_start / use_token / dma_bd / next_bd / FlowOp now use the MemcpyOp's loc. - AIE/Transforms/AIEGenerateColumnControlOverlay.cpp (5 → 0): widened createPacketFlowOp to take a Location; packet_flow / packet_source / packet_dest / aie.end and shim_dma_allocation use the control tile's loc. - AIEX/Transforms/AIECreateBroadcastPacket.cpp (4 → 0): packet_flow / packet_source / aie.end use the source BroadcastPacketOp's loc; packet_dest uses the BPDestOp's loc. - AIE/IR/AIEDialect.cpp (1 → 0): TileOp synthesized in the dialect's helper uses the device's loc. - AIE/Transforms/AIECanonicalizeDevice.cpp (1 → 0): synthesized DeviceOp uses the module's loc. - AIE/Transforms/AIELocalizeLocks.cpp (1 → 0): localized lock-id constant uses the source LockOp's loc. - AIE/Transforms/AIELowerCascadeFlows.cpp (1 → 0): ConfigureCascadeOp uses the tile's loc. - AIEX/Transforms/AIECreateLocks.cpp (1 → 0): synthesized LockOp uses the tile's loc. - AIEX/Transforms/AIECtrlPacketToDma.cpp (1 → 0): NpuDmaMemcpyNdOp uses the device's loc (already in scope as `loc`). - AIEX/Transforms/AIEExpandLoadPdi.cpp (1 → 0): the synthesized empty DeviceOp/EndOp/NpuLoadPdiOp now use loadPdiOp's loc. - AIEX/Transforms/AIELowerMulticast.cpp (1 → 0): synthesized FlowOp uses the MultiDestOp's loc. Intentionally left: - AIEVecToLLVM.cpp (8): all sites are LLVM-intrinsic FuncOp *declarations* and helper-wrapper bodies — same category as declareAIEIntrinsics in AIECoreToStandard. They don't correspond to user code; the actual op lowering preserves location through the standard rewriter pattern. - AIEPathFinder.cpp (5): DynamicTileAnalysis helpers that synthesize Tile/Switchbox/ShimMux infrastructure on demand; the analyzer has no direct access to a source op. - AIECoreToStandard.cpp (1): declareAIEIntrinsics, intentionally left in the prior commit. - AIEToConfiguration.cpp (1): convertTransactionBinaryToMLIR creates a brand-new ModuleOp from a parsed binary blob — there is no source MLIR op to point to. Tally: ~136 → 15 sites total (89% reduction). The 15 remaining are infrastructure synthesis sites where UnknownLoc is appropriate. All 404 lit tests across Conversion/, find-flows/, create-flows/, create-locks/, canonicalize-device/, dialect/, CppTests/, objectFifo-stateful-transform/, and the new python tests still pass (2 expectedly failed pre-existing, 3 unsupported). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records the final tally: 136 → 15 unknown-loc sites (89% reduction). The 15 remaining are all "no source op" cases (intrinsic declarations, analyzer-synthesized infrastructure, binary-disassembly module creation) where UnknownLoc is appropriate. Phase 3 (backend / DWARF / aiecc tooling) is unstarted but unblocked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three medium-severity items raised in review:
1. CapturedLoc.__enter__ raised FrozenInstanceError (python/iron/_loc.py).
Switched to object.__setattr__ for the transient _active slot so the
helper actually works as a `with captured:` context manager. Added a
CHECK-6 case to test/python/loc_capture.py exercising the direct form.
2. Frontend capture coverage was incomplete. Added eager capture +
`with loc_or_unknown(...)` in resolve() for the remaining IRON entry
points the previous commits missed:
- Buffer.__init__ / Buffer.resolve (python/iron/buffer.py)
- ObjectFifoLink.__init__ / ObjectFifoLink.resolve, with a synthesized
name `link_<srcs>_to_<dsts>` so split / join / forward call sites
show up clearly in MLIR (python/iron/dataflow/objectfifo.py)
- InlineOpRuntimeTask now stores user_loc; Runtime.inline_ops captures
it (python/iron/runtime/{task.py,runtime.py})
- _BarrierSetOp now stores user_loc; Runtime.set_barrier captures it
(python/iron/{worker.py,runtime/runtime.py})
New test test/python/iron_loc_emit_extras.py builds a program using
Buffer, ObjectFifoLink (via .forward()), Runtime.set_barrier and
Runtime.inline_ops, and FileChecks that NameLocs for `my_scratch` and
`link_of_in_to_of_mid` reach the emitted MLIR.
3. Per-pass test coverage was thin. Added one focused location-
preservation lit test per converted C++ pass:
- test/lower-to-standard/loc_preservation.mlir
(--aie-standard-lowering preserves loc on llvm.aie2.acquire calls
and on memref.global)
- test/create-flows/loc_preservation.mlir
(--aie-create-pathfinder-flows preserves loc on aie.connect inside
synthesized switchboxes and on aie.wire between tiles)
- test/herd-routing/loc_preservation.mlir
(--aie-herd-routing preserves the herd's loc on synthesized
aie.connect ops; uses --verify-each=false because the legacy pass
produces IR the current placement model rejects, but location
forwarding is verifiable independently)
- test/create-cores/loc_preservation.mlir
(--aie-create-cores preserves the func.call loc on synthesized
aie.core / aie.buffer; --aie-lower-memcpy preserves the memcpy loc
on synthesized aie.flow / aie.dma_bd / aiex.useToken)
- test/Conversion/AIEToConfiguration/loc_preservation.mlir
(--convert-aie-to-control-packets preserves the device's loc on
synthesized aiex.control_packet ops)
All 431 lit tests across the converted-pass test directories plus the
three new python tests still pass (2 expectedly failed pre-existing,
8 unsupported).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 carries IRON Python source locations (already captured in Phase 1
and forwarded through MLIR passes in Phase 2) the rest of the way: into
the transaction binary artifact on disk and across the AIERT/aie-rt
round-trip that re-materializes aiex.npu.* ops from a recorded txn.
Path B — AIE → aie-rt → re-emitted aiex.npu.* ops:
AIERTControl gains a TxnLocBracket RAII helper that snapshots
XAie_TxnInst::NumCmds before and after each per-source-op block in
initLocks/initBuffers/configureLocksAndBd/pushToBdQueueAndEnable/
configureSwitches/addCoreEnable, attributing the produced cmd range
to the source op's getLoc(). UnknownLoc is skipped so unannotated ops
still inherit the device-loc fallback. AIEToConfiguration zips the
resulting per-cmd locs into TransactionBinaryOperation.sourceLoc and
emitTransactionOps/emitControlPacketOps use those locs (with the
device loc as fallback) instead of the previous unconditional
device.getLoc().
Path A — direct AIETranslateNpuToBinary → .bin sidecar:
AIETranslateNpuToBinary and AIETranslateControlPacketsToUI32Vec gain
an optional std::vector<TxnLocEntry> *locmap out-param. Each appendX
records [byte_offset, byte_size, opcode, source_op, address, loc] —
with regdb-resolved register name and module via a new
AIERegisterDatabase::lookupRegisterByOffset reverse index keyed by
(module, tile-relative offset) and decomposed from the absolute
address through AIETargetModel column/row shifts. emitNpuLocmapJSON
serializes the vector as JSON, recursing through NameLoc/FusedLoc/
CallSiteLoc so IRON's NameLoc(name, FileLineColLoc) round-trips
intact. aiecc grows a --keep-loc flag that emits <bin>.locmap.json
next to the .bin (.bin bytes unchanged so XRT consumers are
unaffected). New aie-translate mode aie-npu-to-binary-locmap exposes
the JSON for lit testing.
Tests:
- test/Targets/NPU/loc_sidecar.mlir verifies the JSON sidecar carries
IRON NameLoc names and file:line on aiex.npu.write32 / blockwrite.
- test/Conversion/AIEToConfiguration/loc_roundtrip.mlir verifies an
explicit loc on aie.lock propagates to the synthesized
aiex.npu.write32 across the AIERT round-trip.
- loc_preservation.mlir updated to assert Phase 3's stronger
contract: source-op loc, not device-loc fallback, on synthesized
control packets.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request significantly improves the preservation and propagation of source location information throughout the
mlir-aiecodebase. The main focus is to ensure that user source positions are captured at the Python frontend and maintained through all relevant C++ transformation passes, enabling better diagnostics, debugging, and profiling. The changes replace the use ofUnknownLocwith meaningful source locations, primarily by forwarding the location from source operations. This work is well documented in the newdocs/SourceLocations.mdand is supported by a proof-of-concept implementation.Key improvements include:
Documentation and Rationale
docs/SourceLocations.md, detailing the motivation, current state, infrastructure, proposed approach (in three phases), and proof-of-concept results for source location tracking inmlir-aie.Propagation of Source Locations in C++ Passes
builder.getUnknownLoc()with forwarding of source operation locations (op.getLoc()or equivalent) in multiple transformation passes, especially in:lib/Conversion/AIEToConfiguration/AIEToConfiguration.cpp[1] [2] [3] [4] [5]lib/Dialect/AIE/IR/AIEDialect.cpplib/Dialect/AIE/Transforms/AIECanonicalizeDevice.cpplib/Dialect/AIE/Transforms/AIECoreToStandard.cppfor all relevant lowering patterns, ensuring that new ops inherit the original op's location [1] [2] [3] [4] [5] [6] [7] [8] [9]Improved Location Handling Patterns
These changes collectively reduce the use of unknown locations by about 89% in the codebase, as documented, and lay the groundwork for further improvements in backend and tooling support for source locations.