
Detailed VMI vs MI design comparison, using Quant&RoPE demo walk-through #492

Merged
Zhendong404 merged 6 commits into mouliangyu:vmi-examples from learning-chip:vmi-annotate on Jul 3, 2026


learning-chip commented Jul 2, 2026


Some algorithm background:

Note on VF performance: With the current SIMD VF plus optimized GM-level pipelining (tested internally, not included here), the whole kernel reaches more than 3.2 TB/s of bandwidth on 950DT, so it is memory-bound rather than vector-bound (i.e., the VF part is fast). In this demo PR, the GM<->UB data movement is kept naive (single core, no double-buffered pipeline). We will polish and release the end-to-end kernel as a next step.
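The gap between the naive GM<->UB scheme and a double-buffered one can be estimated with a simple timing model. This is a sketch under stated assumptions, not the kernel's actual schedule: per-tile costs are fixed; copy-in, compute (the VF part), and copy-out run on independent units; and the function names and cost numbers are hypothetical rather than measurements from this PR.

```python
# Hypothetical tile-pipeline timing model; costs are illustrative units,
# not measurements. Assumes copy-in, compute and copy-out run on separate
# units and only wait for their previous tile and for a free buffer slot.
from typing import List


def serial_time(n_tiles: int, t_in: float, t_comp: float, t_out: float) -> float:
    # Naive scheme: each tile is copied in, computed and copied out
    # before the next tile starts, so nothing overlaps.
    return n_tiles * (t_in + t_comp + t_out)


def pipelined_time(n_tiles: int, t_in: float, t_comp: float, t_out: float,
                   n_buf: int = 2) -> float:
    # n_buf input slots and n_buf output slots (n_buf=2 is double buffering).
    end_in: List[float] = []
    end_comp: List[float] = []
    end_out: List[float] = []
    for k in range(n_tiles):
        # An input slot is free once compute has consumed tile k - n_buf.
        start_in = max(end_in[k - 1] if k >= 1 else 0.0,
                       end_comp[k - n_buf] if k >= n_buf else 0.0)
        end_in.append(start_in + t_in)
        # An output slot is free once tile k - n_buf has been copied out.
        start_comp = max(end_in[k],
                         end_comp[k - 1] if k >= 1 else 0.0,
                         end_out[k - n_buf] if k >= n_buf else 0.0)
        end_comp.append(start_comp + t_comp)
        start_out = max(end_comp[k], end_out[k - 1] if k >= 1 else 0.0)
        end_out.append(start_out + t_out)
    return end_out[-1] if end_out else 0.0


# Memory-bound tile: transfers (4 units each) dominate, compute is cheap (1 unit).
print(serial_time(4, 4, 1, 4))       # 36
print(pipelined_time(4, 4, 1, 4))    # 21.0 = 4 + 1 + 4 + 3 * max(4, 1, 4)
print(pipelined_time(4, 4, 0.5, 4))  # 20.5: a 2x faster VF barely helps
```

Once compute is already cheap, the pipelined total in this model is governed by the slowest transfer stage, which is the sense in which a memory-bound kernel gains little from a faster VF and a lot from overlapping GM<->UB traffic.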

Note on syntax/style: The biggest benefit of VMI is in type casting: it abstracts away the manual odd/even widening, interleaving, and packing. ASC's load_store_utils.h tries to abstract similar procedures, but less elegantly.
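To make the note concrete, here is a toy model contrasting the manual odd/even sequence with a single in-order cast. It is a sketch under loud assumptions: registers are modeled as Python lists, a widening convert is assumed to read only alternate (even or odd) lanes because the wider results of all lanes do not fit in one register, and every helper name (`cvt_even_lanes`, `interleave`, `manual_pack`, `cast`, etc.) is hypothetical, not a VMI, MI, or ASC `load_store_utils.h` API.

```python
# Toy model only: registers are Python lists and every helper name is
# hypothetical (not a VMI, MI, or ASC load_store_utils.h API).
from typing import Callable, List

NarrowReg = List[int]
WideReg = List[float]


def cvt_even_lanes(reg: NarrowReg) -> WideReg:
    # Assumed ISA behavior: a widening convert reads only lanes 0, 2, 4, ...
    # because the wider results of all lanes would not fit in one register.
    return [float(x) for x in reg[0::2]]


def cvt_odd_lanes(reg: NarrowReg) -> WideReg:
    # Same assumption for lanes 1, 3, 5, ...
    return [float(x) for x in reg[1::2]]


def interleave(even: WideReg, odd: WideReg) -> List[WideReg]:
    # Zip the even/odd results back into source order, then split the
    # double-width result across two output registers (lo, hi).
    merged: WideReg = []
    for e, o in zip(even, odd):
        merged += [e, o]
    half = len(merged) // 2
    return [merged[:half], merged[half:]]


def manual_widen(reg: NarrowReg) -> List[WideReg]:
    # Widening the way a hand-written kernel has to spell it out.
    return interleave(cvt_even_lanes(reg), cvt_odd_lanes(reg))


def manual_pack(lo: WideReg, hi: WideReg) -> NarrowReg:
    # Packing (narrowing) in reverse: convert the even-indexed and the
    # odd-indexed wide values into alternate lanes of one narrow register.
    wide = lo + hi
    out: NarrowReg = [0] * len(wide)
    out[0::2] = [int(x) for x in wide[0::2]]  # "convert into even lanes"
    out[1::2] = [int(x) for x in wide[1::2]]  # "convert into odd lanes"
    return out


def cast(reg: list, to: Callable) -> list:
    # What a type-level cast abstraction exposes: one call, source order kept.
    return [to(x) for x in reg]


src: NarrowReg = [1, 2, 3, 4, 5, 6, 7, 8]  # one toy 8-lane narrow register
lo, hi = manual_widen(src)
print(lo, hi)  # [1.0, 2.0, 3.0, 4.0] [5.0, 6.0, 7.0, 8.0]
# Skipping the interleave step silently scrambles element order:
print(cvt_even_lanes(src) + cvt_odd_lanes(src))
```

The point is structural: with an even/odd-only convert, every widening or narrowing carries extra interleave/deinterleave bookkeeping, whereas a cast-level abstraction keeps source order in one call.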

jiawei_zhuang and others added 6 commits July 2, 2026 17:05
Add reader contracts, concrete tile shapes, lowering-shape examples, bug callouts, and review checklists so the RoPE and MX quant walkthroughs better connect VMI source to MI/CCE behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
learning-chip marked this pull request as ready for review July 2, 2026 17:03
learning-chip changed the title from "A more detailed VMI vs MI design comparison, and quant/repo demo walk-through" to "Detailed VMI vs MI design comparison, using quant/repo demo walk-through" Jul 2, 2026
learning-chip changed the title from "Detailed VMI vs MI design comparison, using quant/repo demo walk-through" to "Detailed VMI vs MI design comparison, using Quant&RoPE demo walk-through" Jul 2, 2026
Zhendong404 merged commit 17cb056 into mouliangyu:vmi-examples Jul 3, 2026