Skip to content

Better location information#20

Open
fifield wants to merge 10 commits intomainfrom
location
Open

Better location information#20
fifield wants to merge 10 commits intomainfrom
location

Conversation

@fifield
Copy link
Copy Markdown
Owner

@fifield fifield commented May 6, 2026

This pull request significantly improves the preservation and propagation of source location information throughout the mlir-aie codebase. The main focus is to ensure that user source positions are captured at the Python frontend and maintained through all relevant C++ transformation passes, enabling better diagnostics, debugging, and profiling. The changes replace the use of UnknownLoc with meaningful source locations, primarily by forwarding the location from source operations. This work is well documented in the new docs/SourceLocations.md and is supported by a proof-of-concept implementation.

Key improvements include:

Documentation and Rationale

  • Added a comprehensive document, docs/SourceLocations.md, detailing the motivation, current state, infrastructure, proposed approach (in three phases), and proof-of-concept results for source location tracking in mlir-aie.

Propagation of Source Locations in C++ Passes

  • Replaced builder.getUnknownLoc() with forwarding of source operation locations (op.getLoc() or equivalent) in multiple transformation passes, especially in:
    • lib/Conversion/AIEToConfiguration/AIEToConfiguration.cpp [1] [2] [3] [4] [5]
    • lib/Dialect/AIE/IR/AIEDialect.cpp
    • lib/Dialect/AIE/Transforms/AIECanonicalizeDevice.cpp
    • lib/Dialect/AIE/Transforms/AIECoreToStandard.cpp for all relevant lowering patterns, ensuring that new ops inherit the original op's location [1] [2] [3] [4] [5] [6] [7] [8] [9]

Improved Location Handling Patterns

  • Ensured that synthesized operations (e.g., buffers, locks, function calls, global memory references) in lowering and transformation passes now inherit and correctly propagate the source location, rather than defaulting to an unknown location. This is critical for meaningful error messages and debugging.

These changes collectively reduce the use of unknown locations by about 89% in the codebase, as documented, and lay the groundwork for further improvements in backend and tooling support for source locations.

fifield and others added 10 commits May 4, 2026 16:23
Investigates carrying MLIR Location info through mlir-aie. Today every
IRON-built op gets Location.unknown() and ~136 builder.getUnknownLoc()
sites in the C++ passes drop locations as ops are synthesized. This
patch demonstrates an end-to-end path from a user's Python program
down through one transform pass.

- python/iron/_loc.py: capture_user_loc(name=...) walks Python frames
  past IRON / mlir-aie / venv internals and returns
  NameLoc(name, FileLineColLoc(...)). Returns None when no MLIR Context
  is active so callers fall back to Location.unknown().
- python/iron/{dataflow/objectfifo,worker,kernel,runtime/runtime,
  runtime/dmatask}.py: capture _user_loc at IRON construction (or at
  the Runtime.fill / drain call site) and apply it via
  `with loc_or_unknown(self._user_loc):` around MLIR op creation in
  resolve().
- lib/Dialect/AIE/Transforms/AIEObjectFifoStatefulTransform.cpp:
  forward op.getLoc() instead of builder.getUnknownLoc() in
  createObjectFifo, createObjectFifoLocks, and the BufferOp creation
  inside createObjectFifoElements -- so locks and buffers materialized
  from an objectfifo inherit the user's source position.
- docs/SourceLocations.md: investigation summary, current state,
  three-phase plan, and effort estimate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to commit efca7bd.

Two changes that make the prototype actually work end-to-end and stay
working:

1. Eager frame capture in `_loc.py`. The previous version called
   `Location.file()` from inside `__init__`, but IRON construction runs
   *before* `Program.resolve_program()` enters `mlir_mod_ctx()`, so
   there was no MLIR `Context` and `capture_user_loc()` returned None
   for every op. Now it captures `(filename, line, col)` into a small
   `CapturedLoc` dataclass at construction time and materializes the
   real `ir.Location` lazily inside the `with` block in `resolve()`.

2. A small fix to the C++ pass change: the previous `Location loc = {}`
   default with `loc ? loc : builder.getUnknownLoc()` doesn't compile
   because `mlir::Location` has no default constructor or `operator
   bool()`. Made the loc parameter required and updated the one caller
   (the FIFO-split site).

Tests added (all passing):

- `test/python/loc_capture.py` — unit test for `capture_user_loc` /
  `CapturedLoc.materialize` / `loc_or_unknown`.
- `test/python/iron_loc_emit.py` — builds a small IRON program and
  asserts that ObjectFifo / Worker / Kernel / Runtime.fill / drain ops
  carry `loc("of_in"(this_file:line))` etc. through the emitted module.
- `test/objectFifo-stateful-transform/base/loc_preservation.mlir` —
  hand-written MLIR with an explicit `loc(#user_loc)` on the
  objectfifo.create; pipes it through
  `aie-opt --aie-objectFifo-stateful-transform --mlir-print-debuginfo`
  and FileChecks that the synthesized lock / buffer ops reference the
  same source location alias.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… sites

Removes the rest of the `builder.getUnknownLoc()` calls from this pass
(38 → 0). Every op the stateful-transform synthesizes from an
`aie.objectfifo` now inherits a useful location:

- locks, buffers, the new objectfifo from a split — already done in
  the prior commit, kept.
- `createBd` widened to take a `Location` (so the DMA body — `use_lock`,
  `dma_bd`, `dma_bd_packet`, `next_bd` — all share the source op's loc).
- mem / shim_dma / mem_tile_dma terminator + `dma_start` ops use the
  ObjectFifoCreateOp's loc.
- packet flow / packet source / packet dest / flow / shim_dma_allocation
  use producer's or consumer's loc, whichever is contextually correct.
- dynamic-objectfifo helper uses the surrounding core / acquire /
  release / access op's loc when synthesizing `arith.constant`,
  `memref.load`, `memref.store`, `arith.index_cast`, `scf.yield`,
  the global next-index buffer, etc.
- `findOrCreateTile` uses the host tile's loc for the new TileOp.

All 131 existing stateful-transform lit tests still pass (1 expectedly
failed, pre-existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continues threading source locations through the back half of the pass
pipeline.

AIECoreToStandard.cpp (13 → 1 getUnknownLoc): the lowering patterns
that emit `func.call` for `aie.put_stream` / `aie.get_stream` /
`aie.put_cascade` / `aie.get_cascade` / `aie.use_lock` / `aie.debug`,
the `memref.global` / `memref.get_global` / `memref.assume_alignment`
trio for `aie.buffer`, and the `func.func` / `func.return` / event
trace calls all now use the source op's location. The one remaining
`builder.getUnknownLoc()` is the `declareAIEIntrinsics` site, which
emits LLVM-intrinsic `func.func` declarations that don't correspond to
a user-written op.

AIEToConfiguration.cpp (5 → 0): `emitTransactionOps`,
`emitControlPacketOps`, and `convertTransactionOpsToMLIR` now thread a
`Location` parameter through (sourced from the parent `aie.device`),
so the synthesized `npu.write32` / `npu.maskwrite32` / `npu.blockwrite`
/ `npu.control_packet` ops at the runtime-config layer carry the
device's location instead of `unknown`. The `aiex.runtime_sequence`
created by `convertAIEToConfiguration` when none exists also gets the
device's location.

Together with the previous commit, this leaves only ~75 of the
original 136 `getUnknownLoc()` sites in the codebase — the remaining
ones are in passes outside the IRON → ObjFifo → core/runtime path
(e.g. AIEVecToLLVM, AIECreatePathFindFlows, AIEHerdRouting).

All 227 existing tests under test/Conversion and
test/objectFifo-stateful-transform still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…extend IRON test

AIEHerdRouting.cpp (6 → 0): use the source HerdOp's location for the
synthesized IterOp / SelectOp / SwitchboxOp / ConnectOp when expanding
herd routes.

AIECreatePathFindFlows.cpp (19 → 0): in `addConnection` use
`flowOp.getLoc()` (the original FlowOp being lowered); in
`runOnPacketFlow` use the per-tile `tile.getLoc()` for the SwitchboxOp /
AMSelOp / MasterSetOp / PacketRulesOp / PacketRuleOp it generates per
tile, and `tileOp.getLoc()` for the shimmux ConnectOps; in the
wire-creation loop at the bottom of `runOnOperation`, use the loop's
`tile.getLoc()` for every WireOp.

Extends `test/python/iron_loc_emit.py` with a second RUN line that
pipes the IRON-emitted module through
`aie-opt --aie-place-tiles --aie-objectFifo-stateful-transform
       --mlir-print-debuginfo` and FileChecks that the synthesized
buffers / locks / dma_bd ops still reference the same NameLoc alias
(`loc("of_in"(...))` / `loc("of_out"(...))`) tied back to the user's
Python source. This is the first regression-tested demonstration of
the full Python -> dialect -> pass chain preserving locations.

All 255 lit tests under Conversion/, objectFifo-stateful-transform/,
find-flows/, create-flows/, plus the new python/loc_capture.py and
python/iron_loc_emit.py still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… sites

Records the current state on the `location` branch:
- Phase 1 (Python capture): fully implemented, 3 passing lit tests.
- Phase 2 (pass plumbing): IRON path complete (5 passes converted,
  ~80 of 136 sites done), other passes pending.
- Phase 3 (backend / DWARF): unstarted but unblocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 sweep continues, completing the audit of passes outside the
IRON path. Each site replaced uses the source op's location (the op
being lowered, the originating broadcast/multicast/cascade/control
op, or a contextually-relevant tile / device op).

Files updated:
- AIE/Transforms/AIEObjectFifoRegisterProcess.cpp (10 → 0): for-loop
  bounds, acquires/releases/subview accesses, kernel CallOp, and the
  CoreOp/EndOp synthesized for tiles missing one — all now use the
  source ObjectFifoRegisterProcessOp's loc.
- AIEX/Transforms/AIECreateCores.cpp (9 → 0): TileOp / BufferOp /
  MemOp / EndOp / CoreOp / arith.constant / memref.load synthesized
  for each callOp now use callOp.getLoc().
- AIEX/Transforms/AIELowerMemcpy.cpp (6 → 0): widened
  createDMABlocksAndOps to take a Location; all dma_start / use_token
  / dma_bd / next_bd / FlowOp now use the MemcpyOp's loc.
- AIE/Transforms/AIEGenerateColumnControlOverlay.cpp (5 → 0): widened
  createPacketFlowOp to take a Location; packet_flow / packet_source /
  packet_dest / aie.end and shim_dma_allocation use the control tile's
  loc.
- AIEX/Transforms/AIECreateBroadcastPacket.cpp (4 → 0): packet_flow /
  packet_source / aie.end use the source BroadcastPacketOp's loc;
  packet_dest uses the BPDestOp's loc.
- AIE/IR/AIEDialect.cpp (1 → 0): TileOp synthesized in the dialect's
  helper uses the device's loc.
- AIE/Transforms/AIECanonicalizeDevice.cpp (1 → 0): synthesized
  DeviceOp uses the module's loc.
- AIE/Transforms/AIELocalizeLocks.cpp (1 → 0): localized lock-id
  constant uses the source LockOp's loc.
- AIE/Transforms/AIELowerCascadeFlows.cpp (1 → 0): ConfigureCascadeOp
  uses the tile's loc.
- AIEX/Transforms/AIECreateLocks.cpp (1 → 0): synthesized LockOp uses
  the tile's loc.
- AIEX/Transforms/AIECtrlPacketToDma.cpp (1 → 0): NpuDmaMemcpyNdOp
  uses the device's loc (already in scope as `loc`).
- AIEX/Transforms/AIEExpandLoadPdi.cpp (1 → 0): the synthesized
  empty DeviceOp/EndOp/NpuLoadPdiOp now use loadPdiOp's loc.
- AIEX/Transforms/AIELowerMulticast.cpp (1 → 0): synthesized FlowOp
  uses the MultiDestOp's loc.

Intentionally left:
- AIEVecToLLVM.cpp (8): all sites are LLVM-intrinsic FuncOp
  *declarations* and helper-wrapper bodies — same category as
  declareAIEIntrinsics in AIECoreToStandard. They don't correspond to
  user code; the actual op lowering preserves location through the
  standard rewriter pattern.
- AIEPathFinder.cpp (5): DynamicTileAnalysis helpers that synthesize
  Tile/Switchbox/ShimMux infrastructure on demand; the analyzer has
  no direct access to a source op.
- AIECoreToStandard.cpp (1): declareAIEIntrinsics, intentionally left
  in the prior commit.
- AIEToConfiguration.cpp (1): convertTransactionBinaryToMLIR creates
  a brand-new ModuleOp from a parsed binary blob — there is no
  source MLIR op to point to.

Tally: ~136 → 15 sites total (89% reduction). The 15 remaining are
infrastructure synthesis sites where UnknownLoc is appropriate.

All 404 lit tests across Conversion/, find-flows/, create-flows/,
create-locks/, canonicalize-device/, dialect/, CppTests/,
objectFifo-stateful-transform/, and the new python tests still pass
(2 expectedly failed pre-existing, 3 unsupported).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records the final tally: 136 → 15 unknown-loc sites (89% reduction).
The 15 remaining are all "no source op" cases (intrinsic declarations,
analyzer-synthesized infrastructure, binary-disassembly module
creation) where UnknownLoc is appropriate.

Phase 3 (backend / DWARF / aiecc tooling) is unstarted but unblocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three medium-severity items raised in review:

1. CapturedLoc.__enter__ raised FrozenInstanceError (python/iron/_loc.py).
   Switched to object.__setattr__ for the transient _active slot so the
   helper actually works as a `with captured:` context manager. Added a
   CHECK-6 case to test/python/loc_capture.py exercising the direct form.

2. Frontend capture coverage was incomplete. Added eager capture +
   `with loc_or_unknown(...)` in resolve() for the remaining IRON entry
   points the previous commits missed:
   - Buffer.__init__ / Buffer.resolve (python/iron/buffer.py)
   - ObjectFifoLink.__init__ / ObjectFifoLink.resolve, with a synthesized
     name `link_<srcs>_to_<dsts>` so split / join / forward call sites
     show up clearly in MLIR (python/iron/dataflow/objectfifo.py)
   - InlineOpRuntimeTask now stores user_loc; Runtime.inline_ops captures
     it (python/iron/runtime/{task.py,runtime.py})
   - _BarrierSetOp now stores user_loc; Runtime.set_barrier captures it
     (python/iron/{worker.py,runtime/runtime.py})
   New test test/python/iron_loc_emit_extras.py builds a program using
   Buffer, ObjectFifoLink (via .forward()), Runtime.set_barrier and
   Runtime.inline_ops, and FileChecks that NameLocs for `my_scratch` and
   `link_of_in_to_of_mid` reach the emitted MLIR.

3. Per-pass test coverage was thin. Added one focused location-
   preservation lit test per converted C++ pass:
   - test/lower-to-standard/loc_preservation.mlir
     (--aie-standard-lowering preserves loc on llvm.aie2.acquire calls
     and on memref.global)
   - test/create-flows/loc_preservation.mlir
     (--aie-create-pathfinder-flows preserves loc on aie.connect inside
     synthesized switchboxes and on aie.wire between tiles)
   - test/herd-routing/loc_preservation.mlir
     (--aie-herd-routing preserves the herd's loc on synthesized
     aie.connect ops; uses --verify-each=false because the legacy pass
     produces IR the current placement model rejects, but location
     forwarding is verifiable independently)
   - test/create-cores/loc_preservation.mlir
     (--aie-create-cores preserves the func.call loc on synthesized
     aie.core / aie.buffer; --aie-lower-memcpy preserves the memcpy loc
     on synthesized aie.flow / aie.dma_bd / aiex.useToken)
   - test/Conversion/AIEToConfiguration/loc_preservation.mlir
     (--convert-aie-to-control-packets preserves the device's loc on
     synthesized aiex.control_packet ops)

All 431 lit tests across the converted-pass test directories plus the
three new python tests still pass (2 expectedly failed pre-existing,
8 unsupported).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 carries IRON Python source locations (already captured in Phase 1
and forwarded through MLIR passes in Phase 2) the rest of the way: into
the transaction binary artifact on disk and across the AIERT/aie-rt
round-trip that re-materializes aiex.npu.* ops from a recorded txn.

Path B — AIE → aie-rt → re-emitted aiex.npu.* ops:
  AIERTControl gains a TxnLocBracket RAII helper that snapshots
  XAie_TxnInst::NumCmds before and after each per-source-op block in
  initLocks/initBuffers/configureLocksAndBd/pushToBdQueueAndEnable/
  configureSwitches/addCoreEnable, attributing the produced cmd range
  to the source op's getLoc(). UnknownLoc is skipped so unannotated ops
  still inherit the device-loc fallback. AIEToConfiguration zips the
  resulting per-cmd locs into TransactionBinaryOperation.sourceLoc and
  emitTransactionOps/emitControlPacketOps use those locs (with the
  device loc as fallback) instead of the previous unconditional
  device.getLoc().

Path A — direct AIETranslateNpuToBinary → .bin sidecar:
  AIETranslateNpuToBinary and AIETranslateControlPacketsToUI32Vec gain
  an optional std::vector<TxnLocEntry> *locmap out-param. Each appendX
  records [byte_offset, byte_size, opcode, source_op, address, loc] —
  with regdb-resolved register name and module via a new
  AIERegisterDatabase::lookupRegisterByOffset reverse index keyed by
  (module, tile-relative offset) and decomposed from the absolute
  address through AIETargetModel column/row shifts. emitNpuLocmapJSON
  serializes the vector as JSON, recursing through NameLoc/FusedLoc/
  CallSiteLoc so IRON's NameLoc(name, FileLineColLoc) round-trips
  intact. aiecc grows a --keep-loc flag that emits <bin>.locmap.json
  next to the .bin (.bin bytes unchanged so XRT consumers are
  unaffected). New aie-translate mode aie-npu-to-binary-locmap exposes
  the JSON for lit testing.

Tests:
  - test/Targets/NPU/loc_sidecar.mlir verifies the JSON sidecar carries
    IRON NameLoc names and file:line on aiex.npu.write32 / blockwrite.
  - test/Conversion/AIEToConfiguration/loc_roundtrip.mlir verifies an
    explicit loc on aie.lock propagates to the synthesized
    aiex.npu.write32 across the AIERT round-trip.
  - loc_preservation.mlir updated to assert Phase 3's stronger
    contract: source-op loc, not device-loc fallback, on synthesized
    control packets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant