Skip to content

fix error in tload template for cube kernel#879

Open
Likai-19 wants to merge 1 commit into
hw-native-sys:mainfrom
Likai-19:bugfix_tload_cube_kernel
Open

fix error in tload template for cube kernel#879
Likai-19 wants to merge 1 commit into
hw-native-sys:mainfrom
Likai-19:bugfix_tload_cube_kernel

Conversation

@Likai-19

Copy link
Copy Markdown

修改 lib/TileOps/tload_template.py,分两部分:

1. 消歧逻辑改为 layout 优先

原来通过 rank-5 shape 做消歧(src.shape[4] == dst.valid_shape[1]),改为通过源 view 的 layout 属性(src.config.layout)。constraint 函数检查 layout 是否匹配预期(ND2NZ 只接受 ND/NZ 源,DN2NZ 只接受 DN/NZ 源,DN2ZN 只接受 DN/NZ 源)。当 layout 未知时保守返回 True(不消歧,保持旧行为)。

2. 新增 tload_gm_to_mat_dn2zn 模板

约束条件:

  • src.layoutDNNZ
  • dst.blayout=ROW_MAJOR, dst.slayout=COL_MAJOR

模板实现参照 TLoad.hpp:472TLoadCubeDN2ZN

n_value = g4                       # gShape4(最内层维度)
d_value = m                        # validRow(tile 行数)
src_inner_stride = s4 * elem_bytes # GetByteSize(gStride4)
pto.mte_gm_l1_frac(..., pto.FractalMode.ND2NZ,  # 复用 ND2NZ 硬件指令
    shape=(n_value, d_value),
    src_layout=(src_inner_stride,),
    dst_group=(1, 1, k, 0),
    ctrl=(0, False))

@Likai-19

Copy link
Copy Markdown
Author

Fixes #877

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the tload_template.py file by refactoring the nd2nz and dn2nz constraints to use the source view layout attribute for disambiguation, and introduces a new dn2zn constraint and template (template_tload_gm_to_mat_dn2zn) to support transposed DN to ZN fractal loads. The code review feedback highlights two critical issues: first, parsing MLIR custom attributes (e.g., #pto.layout<nd>) directly as strings will fail the layout checks, so the layout name needs to be extracted from the angle brackets; second, the stride passed to pto.mte_gm_l1_frac in the new template should be in elements rather than bytes, as multiplying by elem_bytes leads to an incorrect, duplicated multiplication.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +360 to 367
if hasattr(src, 'config') and src.config is not None:
view_config = src.config
if hasattr(view_config, 'layout'):
layout = view_config.layout
if layout is not None:
layout_val = layout.value if hasattr(layout, 'value') else str(layout)
if str(layout_val).upper() not in {'ND', 'NZ'}:
return False

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

If the layout attribute is an MLIR custom attribute (e.g., #pto.layout<nd>), its string representation will contain the full assembly format. Checking str(layout_val).upper() not in {'ND', 'NZ'} directly will fail because "#PTO.LAYOUT<ND>" is not in the set, causing the constraint to always return False and reject valid layouts. We should extract the layout name from the angle brackets if present to ensure robustness.

Suggested change
if hasattr(src, 'config') and src.config is not None:
view_config = src.config
if hasattr(view_config, 'layout'):
layout = view_config.layout
if layout is not None:
layout_val = layout.value if hasattr(layout, 'value') else str(layout)
if str(layout_val).upper() not in {'ND', 'NZ'}:
return False
if hasattr(src, 'config') and src.config is not None:
view_config = src.config
if hasattr(view_config, 'layout'):
layout = view_config.layout
if layout is not None:
layout_val = layout.value if hasattr(layout, 'value') else str(layout)
layout_str = str(layout_val).upper()
if '<' in layout_str:
layout_str = layout_str.split('<')[-1].strip('>')
if layout_str not in {'ND', 'NZ'}:
return False

Comment on lines +390 to 397
if hasattr(src, 'config') and src.config is not None:
view_config = src.config
if hasattr(view_config, 'layout'):
layout = view_config.layout
if layout is not None:
layout_val = layout.value if hasattr(layout, 'value') else str(layout)
if str(layout_val).upper() not in {'DN', 'NZ'}:
return False

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

If the layout attribute is an MLIR custom attribute (e.g., #pto.layout<dn>), its string representation will contain the full assembly format. Checking str(layout_val).upper() not in {'DN', 'NZ'} directly will fail because "#PTO.LAYOUT<DN>" is not in the set, causing the constraint to always return False and reject valid layouts. We should extract the layout name from the angle brackets if present to ensure robustness.

Suggested change
if hasattr(src, 'config') and src.config is not None:
view_config = src.config
if hasattr(view_config, 'layout'):
layout = view_config.layout
if layout is not None:
layout_val = layout.value if hasattr(layout, 'value') else str(layout)
if str(layout_val).upper() not in {'DN', 'NZ'}:
return False
if hasattr(src, 'config') and src.config is not None:
view_config = src.config
if hasattr(view_config, 'layout'):
layout = view_config.layout
if layout is not None:
layout_val = layout.value if hasattr(layout, 'value') else str(layout)
layout_str = str(layout_val).upper()
if '<' in layout_str:
layout_str = layout_str.split('<')[-1].strip('>')
if layout_str not in {'DN', 'NZ'}:
return False

Comment on lines +533 to +540
if hasattr(src, 'config') and src.config is not None:
view_config = src.config
if hasattr(view_config, 'layout'):
layout = view_config.layout
if layout is not None:
layout_val = layout.value if hasattr(layout, 'value') else str(layout)
if str(layout_val).upper() not in {'DN', 'NZ'}:
return False

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

If the layout attribute is an MLIR custom attribute (e.g., #pto.layout<dn>), its string representation will contain the full assembly format. Checking str(layout_val).upper() not in {'DN', 'NZ'} directly will fail because "#PTO.LAYOUT<DN>" is not in the set, causing the constraint to always return False and reject valid layouts. We should extract the layout name from the angle brackets if present to ensure robustness.

Suggested change
if hasattr(src, 'config') and src.config is not None:
view_config = src.config
if hasattr(view_config, 'layout'):
layout = view_config.layout
if layout is not None:
layout_val = layout.value if hasattr(layout, 'value') else str(layout)
if str(layout_val).upper() not in {'DN', 'NZ'}:
return False
if hasattr(src, 'config') and src.config is not None:
view_config = src.config
if hasattr(view_config, 'layout'):
layout = view_config.layout
if layout is not None:
layout_val = layout.value if hasattr(layout, 'value') else str(layout)
layout_str = str(layout_val).upper()
if '<' in layout_str:
layout_str = layout_str.split('<')[-1].strip('>')
if layout_str not in {'DN', 'NZ'}:
return False

Comment on lines +572 to +579
elem_bytes = pto.bytewidth(dst.element_type)

# TLoadCubeDN2ZN: nValue = gShape4, dValue = validRow
n_value = g4
d_value = m

# TLoadCubeDN2ZN: loop1SrcStride = GetByteSize(gStride4)
src_inner_stride = s4 * elem_bytes

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

In the TileLang DSL, pto.mte_gm_l1_frac expects the src_layout stride to be specified in terms of elements, not bytes. The DSL compiler internally handles the multiplication by elem_bytes based on the data type. Multiplying s4 by elem_bytes here results in a duplicate multiplication, causing the stride to be incorrect (too large) and leading to memory corruption or incorrect data loading. We should pass s4 directly.

Suggested change
elem_bytes = pto.bytewidth(dst.element_type)
# TLoadCubeDN2ZN: nValue = gShape4, dValue = validRow
n_value = g4
d_value = m
# TLoadCubeDN2ZN: loop1SrcStride = GetByteSize(gStride4)
src_inner_stride = s4 * elem_bytes
# TLoadCubeDN2ZN: nValue = gShape4, dValue = validRow
n_value = g4
d_value = m
# TLoadCubeDN2ZN: loop1SrcStride = gStride4
src_inner_stride = s4

@Likai-19 Likai-19 force-pushed the bugfix_tload_cube_kernel branch from 1fedd23 to c2fb532 Compare June 29, 2026 08:08
@reedhecre

reedhecre commented Jun 29, 2026

Copy link
Copy Markdown

Codex Review

该评论由 review 机器人自动更新。

  • PR: fix error in tload template for cube kernel #879 fix error in tload template for cube kernel
  • Author: Likai-19
  • Base/Head: main / bugfix_tload_cube_kernel
  • Head SHA: c2fb532d78ed
  • Trigger: 检测到新的 open PR
  • Generated At: 2026-06-29T14:05:01Z
  • Status: completed

Summary

Layout-based TLOAD.MAT matching in PR #879 regresses existing DN2NZ selection and also breaks MAT-NZ template resolution when view layout inference is unavailable.

Findings

  1. P1 Existing DN2NZ loads are rematched to ND2NZ under inferred layouts lib/TileOps/tload_template.py:386

_constraint_tload_mat_dn2nz now disambiguates only from src.config.layout. That breaks the repo's existing DN2NZ call pattern, which uses transposed 2-D views like shape=[32,16], strides=[16,1] without an explicit layout=#pto.layout<dn> (see test/lit/pto/cube_tile_ops_positive.pto and test/tilelang_st/npu/a5/src/st/testcase/tload_mat/tload_mat.pto). InferPTOLayout classifies that shape/stride pair as ND, so after this change _constraint_tload_mat_dn2nz rejects the case while _constraint_tload_mat_nd2nz accepts it. The expanded kernel will therefore emit ND2NZ for an existing DN2NZ input, changing source addressing from logical src[d,n] to src[n,d] and loading the wrong matrix contents.

  1. P2 MAT-NZ tload becomes ambiguous when layout inference is disabled lib/TileOps/tload_template.py:357

Both MAT-NZ constraints now intentionally return True when the source view has no config.layout. Because the ND2NZ and DN2NZ kernels still have identical priorities and dtype signatures, any build that uses the supported --disable-infer-layout path can no longer disambiguate pto.tload into MAT/NZ: select_kernel() will see both templates as highest-priority matches and fail. The previous shape-based checks did not depend on layout inference, so this is a real compatibility regression for users who compile without inferred layouts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants