[harness] Device-specific kernels for Triton and Helion #57
Merged
Adds device-specific subdirectories to the Triton and Helion backends. The deeper nesting allows for separate kernel implementations that may encode target-specific optimizations. For maintenance simplicity and readability, this split is chosen over keeping multiple kernel implementations in a single file. The separate files act as entry points for the runner. In the future, truly universal kernels can be stored in a separate location, and the backend file structure might then offer only simple redirection. While these backends support running the same kernel on different devices, encoding target-specific details can improve performance. The baseline PyTorch backend still relies on a single implementation thanks to its higher level of abstraction. Future backends should pick the structure most suitable for their needs.
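A minimal sketch of how a runner might resolve the kernel directory under this layout. The directory names, the `DEVICE_SPECIFIC_BACKENDS` set, and the `kernel_dir` helper are hypothetical and chosen for illustration; the actual repository layout may differ.

```python
from pathlib import Path

# Hypothetical: backends assumed to keep one subdirectory per device type.
DEVICE_SPECIFIC_BACKENDS = {"triton", "helion"}


def kernel_dir(root: Path, backend: str, device_type: str) -> Path:
    """Return the directory that holds kernel entry points for a backend."""
    base = root / backend
    if backend in DEVICE_SPECIFIC_BACKENDS:
        # Device-specific backends nest one level deeper, e.g. triton/cuda.
        return base / device_type
    # Backends with a single implementation (e.g. pytorch) ignore the device.
    return base
```

Under these assumptions, `kernel_dir(Path("kernels"), "triton", "cuda")` points at `kernels/triton/cuda`, while the PyTorch backend resolves to `kernels/pytorch` regardless of device.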
sandlbn (Collaborator) approved these changes on Mar 5, 2026 and left a comment:
LGTM overall. One thing worth a quick sanity check: benchmark_compare.py builds kernel paths directly as runner.kernels / level / f"{kernel_name}.py". Since runner.kernels is now set in KernelBenchRunner.__init__ to include the device-type subdirectory, this should work fine, but it is worth confirming end-to-end for Triton/Helion on CUDA before merging.
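The path composition quoted in the comment can be sketched with `pathlib`. `FakeRunner`, the directory names, and `kernel_path` are hypothetical stand-ins, not the project's actual classes; only the expression `runner.kernels / level / f"{kernel_name}.py"` comes from the review comment.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakeRunner:
    """Hypothetical stand-in for the real runner; only `kernels` matters here."""

    kernels: Path  # assumed to already include the device-type subdirectory


def kernel_path(runner: FakeRunner, level: str, kernel_name: str) -> Path:
    # Same composition the review comment quotes from benchmark_compare.py.
    return runner.kernels / level / f"{kernel_name}.py"


runner = FakeRunner(kernels=Path("kernels") / "triton" / "cuda")
path = kernel_path(runner, "level1", "matmul")
print(path.as_posix())  # kernels/triton/cuda/level1/matmul.py
```

Because the device subdirectory lives in `runner.kernels`, callers that compose paths this way need no changes, which is the property the comment asks to confirm end-to-end.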
Author (Collaborator):
Just double-checked, compare run as:
But I definitely need to add tests to cover the bench compare module.
Adds device-specific subdirectories to Triton and Helion backends.
Refactors CI benchmark options to use 'device x backend' grid.
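A 'device x backend' grid like the one mentioned above can be sketched as a Cartesian product. The device and backend lists below are illustrative assumptions, not the project's actual CI configuration.

```python
from itertools import product

# Hypothetical values; the real CI options are not shown in this PR text.
DEVICES = ["cuda", "xpu"]
BACKENDS = ["pytorch", "triton", "helion"]


def benchmark_grid(devices, backends):
    """Return every (device, backend) pair as a list of dicts."""
    return [{"device": d, "backend": b} for d, b in product(devices, backends)]


grid = benchmark_grid(DEVICES, BACKENDS)
print(len(grid))  # 6
```

Each entry could then drive one CI job, with unsupported combinations filtered out before scheduling.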