[Example] Add TileKernels Examples in Tilus#155

Open
WilliamZhang20 wants to merge 27 commits into NVIDIA:main from WilliamZhang20:tilekernels-examples

Conversation

Contributor

@WilliamZhang20 WilliamZhang20 commented May 8, 2026

Kernel List:

  • swiglu_forward_and_per_token_cast_kernel
  • per_token_cast

Benchmarks on H200 NVL:

Per-token FP8 cast matches reference for size (257, 4096); max code diff=32; dequantized sum diff=1.21161
   tokens  hidden  tilekernels (ms)  tilus (ms) speedup  sum diff
0     128    1024          0.006592    0.005952   1.11x  0.079254
1     256    2048          0.007168    0.006336   1.13x  0.960876
2     257    4096          0.008096    0.007328   1.10x  1.211609
SwiGLU FP8 cast matches reference for size (1024, 4096); max code diff=2; dequantized sum diff=0.0253906
   tokens  hidden  tilekernels (ms)  tilus (ms) speedup  sum diff
0     128    1024          0.008768    0.006400   1.37x  0.000092
1     256    2048          0.008320    0.007264   1.15x  0.000061
2     257    4096          0.010688    0.008448   1.27x  0.000122
3    1024    4096          0.019648    0.014080   1.40x  0.025391
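The speedup column is consistent with dividing the tilekernels time by the tilus time. A minimal check, using a few rows copied from the tables above (the helper name `speedup` is only illustrative):

```python
def speedup(baseline_ms: float, candidate_ms: float) -> str:
    """Format baseline/candidate as a ratio such as '1.40x'."""
    return f"{baseline_ms / candidate_ms:.2f}x"


# (tokens, hidden, tilekernels ms, tilus ms) taken from the benchmark output
per_token_rows = [
    (128, 1024, 0.006592, 0.005952),
    (257, 4096, 0.008096, 0.007328),
]
swiglu_rows = [
    (128, 1024, 0.008768, 0.006400),
    (1024, 4096, 0.019648, 0.014080),
]

for tokens, hidden, base, cand in per_token_rows + swiglu_rows:
    print(tokens, hidden, speedup(base, cand))
```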

WilliamZhang20 and others added 14 commits April 4, 2026 03:26
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>

copy-pr-bot Bot commented May 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Member

@yaoyaoding yaoyaoding left a comment

Thanks @WilliamZhang20, LGTM in general.

Comment thread python/tilus/backends/emitter.py Outdated
@yaoyaoding
Member

/ok to test e6c70ef

@WilliamZhang20 WilliamZhang20 changed the title Add TileKernels Examples in Tilus [Example] Add TileKernels Examples in Tilus May 10, 2026
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
Signed-off-by: William Zhang <wzhang20@yahoo.com>
@yaoyaoding
Member

/ok to test 7638df8

2 participants