
test(lit/vmi): add SimdVF per-token FP8 cast target IR #487

Open
WenboCodes wants to merge 27 commits into mouliangyu:feature-vmi from WenboCodes:simdvf-per-token-fp8-vmi-lit-test


@WenboCodes

Collaborator

Summary

Add a design-reference .pto file to test/lit/vmi/ that captures the target pto.vmi lowering for the SimdVF per-token cast-to-FP8 kernel.

Contents (verbatim from PTO-Gym/docs/simtvf-per-token-cast-to-fp8-pto-vmi-mapping.md):

  • The header comment holds the original TileLang source, i.e. the "with T.SimdVF():" block (section 1 of the doc).
  • The body holds the forward-looking pseudo MLIR for the VMI lowering (section 3 / AFTER of the doc).

Why this shape

The VMI code in the source doc is explicitly design-level pseudo MLIR. It uses surface ops that are not yet implemented in the PTOAS VMI dialect:

| Pseudo op in this file | Real VMI op today |
| --- | --- |
| pto.vmi.vlds / pto.vmi.vsts | pto.vmi.load / pto.vmi.store |
| pto.vmi.pset_b32 "PAT_ALL" | pto.vmi.create_mask |
| pto.vmi.vreduce_max {sub_group = 2} | none; group reductions exist only for group_reduce_addf/group_reduce_addi, and there is no group max-reduce |
| pto.vmi.vbr {to = 256, sub_group = 2} | pto.vmi.group_broadcast {num_groups = N} |
| pto.vmi.vcvt {to = ..., rnd = "R", sat = "SAT"} | pto.vmi.truncf |
| pto.vmi.vsts {mode = "ONE_PER_SUB_GROUP"} | not yet implemented |

As a result, the file cannot be parsed by ptoas/pto-test-opt. It is stored as a no-op (RUN: true) forward-looking spec rather than as an executable regression test. See section 8 of the upstream doc ("需要 PTO/VMI surface 补齐的点", roughly "points where the PTO/VMI surface still needs to be filled in") for the full list of missing surface points.

Test plan

  • lit test/lit/vmi/vmi_simdvf_per_token_cast_to_fp8.pto → passes as no-op (RUN: true).
  • No pto-test-opt execution, by design.
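
For readers unfamiliar with no-op lit tests, the overall shape of such a file can be sketched as below. This is an illustrative outline only, not the contents of the file in this PR: the comment wording is assumed, and both the TileLang source and the pseudo MLIR body are deliberately elided. The only load-bearing line is the RUN directive, which makes lit execute the shell command `true` (exit status 0) instead of invoking ptoas or pto-test-opt.

```mlir
// RUN: true
//
// Design reference only (hypothetical header wording). lit runs `true`,
// which exits with status 0, so this file is reported as passing without
// ever being parsed.
//
// Original TileLang source (section 1 of the mapping doc), kept as comments:
//   with T.SimdVF():
//       ...
//
// The forward-looking pseudo MLIR (section 3 / AFTER of the mapping doc)
// follows the header. It is elided in this sketch.
```

Because lit only scans the file for RUN lines, the unparseable pseudo ops in the body do not affect the result.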

🤖 Generated with Claude Code

mouliangyu and others added 27 commits June 24, 2026 09:13
Add design-reference .pto capturing the pto.vmi lowering for the
SimdVF per-token cast-to-FP8 kernel, verbatim from
PTO-Gym/docs/simtvf-per-token-cast-to-fp8-pto-vmi-mapping.md.

The file holds the original TileLang SimdVF source in its header
comment and the forward-looking pseudo MLIR (vreduce_max{sub_group},
vbr{to,sub_group}, vcvt{to,rnd,sat}, pset_b32 "PAT_ALL", etc.) which
uses surface ops not yet implemented in the VMI dialect. It is a
no-op (RUN: true) spec, not an executable regression test.

Co-Authored-By: Claude <noreply@anthropic.com>
