
[pull] main from NVIDIA:main #607

Merged
pull[bot] merged 2 commits into phu0ngng:main from NVIDIA:main on May 15, 2026

pull[bot] commented May 15, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


tdophung and others added 2 commits May 14, 2026 16:22
* [JAX] Size autotuned Triton grids per config (3x perm-kernel speedup)

The autotuned path in triton_call_lowering compiled every BLOCK_SIZE config
but dispatched each one with the same fixed grid, sized for the smallest
BLOCK_SIZE, so larger configs over-launched by the ratio of their BLOCK_SIZE
to the smallest one. Make grid accept a callable(meta) -> tuple that is
evaluated per config, matching the jax-triton API. Update the
_permute_kernel, _unpermute_kernel, and _sort_chunks_by_map_kernel
lowerings. Measured 22.6 ms -> 7.4 ms (3.06x) on GB200 for sort_chunks at
524k tokens, hidden=4096, fp32.

* [JAX] Triton wrapper defaults match jax-triton (3.25ms speedup)

Change the num_warps default from 32 to 4 and num_stages from 1 to 3 in
triton_call_lowering, matching Triton's own triton.Config defaults.
Non-autotuned kernels (e.g. _make_chunk_sort_map_kernel) were running with
1024 threads/block (32 warps × 32 threads), an 8x kernel slowdown. Also add
an assertion that grid is a tuple or callable, and trim comments.
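The per-config grid change in the first item can be illustrated with a small pure-Python sketch. It assumes a 1-D grid of ceil(n / BLOCK_SIZE) programs and uses hypothetical names (`resolve_grid`, `cdiv`, `per_config_grid`) and made-up BLOCK_SIZE configs; it is not the actual `triton_call_lowering` code, and the row count is only an assumed stand-in for the 524k-token benchmark.

```python
from typing import Callable, Dict, Tuple, Union

# A grid is either a fixed tuple or a function of one config's meta-parameters.
Meta = Dict[str, int]
Grid = Union[Tuple[int, ...], Callable[[Meta], Tuple[int, ...]]]


def cdiv(a: int, b: int) -> int:
    """Ceiling division: number of BLOCK_SIZE-wide programs needed to cover a."""
    return -(-a // b)


def resolve_grid(grid: Grid, meta: Meta) -> Tuple[int, ...]:
    """Hypothetical helper: evaluate the grid for a single autotune config."""
    if callable(grid):
        return tuple(grid(meta))
    return tuple(grid)


N_ROWS = 524_288  # assumed stand-in for the "524k tokens" benchmark
CONFIGS = [{"BLOCK_SIZE": 64}, {"BLOCK_SIZE": 256}, {"BLOCK_SIZE": 1024}]  # made up

# Old behaviour: one fixed grid, sized for the smallest BLOCK_SIZE.
fixed_grid = (cdiv(N_ROWS, min(c["BLOCK_SIZE"] for c in CONFIGS)),)


# New behaviour: a callable evaluated once per config.
def per_config_grid(meta: Meta) -> Tuple[int, ...]:
    return (cdiv(N_ROWS, meta["BLOCK_SIZE"]),)


for meta in CONFIGS:
    launched = resolve_grid(fixed_grid, meta)[0]
    needed = resolve_grid(per_config_grid, meta)[0]
    print(f"BLOCK_SIZE={meta['BLOCK_SIZE']}: fixed={launched}, "
          f"per-config={needed}, over-launch={launched // needed}x")
```

With these assumed numbers the fixed grid launches 8192 programs for every config, while the BLOCK_SIZE=1024 config needs only 512, a 16x over-launch. Accepting the grid as a callable lets each config size its own launch.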

Signed-off-by: tdophung <tdophung@nvidia.com>
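The thread counts behind the second item are simple arithmetic: on NVIDIA GPUs each Triton program instance runs `num_warps` warps of 32 threads. A minimal sketch of that arithmetic, together with one possible shape for the tuple/callable grid assertion; `threads_per_block`, `check_grid`, and the default dictionaries are hypothetical names, not code from this PR.

```python
from typing import Any

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

# Wrapper defaults before and after the change described above.
OLD_DEFAULTS = {"num_warps": 32, "num_stages": 1}
NEW_DEFAULTS = {"num_warps": 4, "num_stages": 3}


def threads_per_block(num_warps: int) -> int:
    """Each Triton program instance (one CUDA block) runs num_warps warps."""
    return num_warps * WARP_SIZE


def check_grid(grid: Any) -> None:
    """Hypothetical tuple/callable grid assertion."""
    assert isinstance(grid, tuple) or callable(grid), (
        f"grid must be a tuple or callable(meta) -> tuple, got {type(grid).__name__}"
    )


old_threads = threads_per_block(OLD_DEFAULTS["num_warps"])  # 1024
new_threads = threads_per_block(NEW_DEFAULTS["num_warps"])  # 128
print(old_threads, new_threads, old_threads // new_threads)  # 1024 128 8

check_grid((4096,))                                      # fixed tuple: accepted
check_grid(lambda meta: (4096 // meta["BLOCK_SIZE"],))   # callable: accepted
```

In this sketch, passing a list such as `[4096]` fails the assertion, so a grid that is neither a tuple nor a callable is rejected before it reaches the lowering.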
The Lint workflow runs cpplint and pylint against the checked-out
tree. It uses no cache and makes no GitHub API writes, so
`permissions: contents: read` covers it. This also follows the per-job
permissions blocks already used in deploy_nightly_docs.yml
(pages:write + id-token:write) and upload-ci-logs.yml (statuses:write).

build.yml is left out because it pulls mozilla-actions/sccache-action
(which writes to the Actions cache) and easimon/maximize-build-space.
A drive-by permissions block there would need actions:write for the
sccache save path, which deserves a separate look.

Signed-off-by: Arpit Jain <arpitjain099@gmail.com>
pull[bot] locked and limited conversation to collaborators May 15, 2026
pull[bot] added the ⤵️ pull label May 15, 2026
pull[bot] merged commit eca05d3 into phu0ngng:main May 15, 2026
pull[bot] had a problem deploying to github-pages May 15, 2026 04:33 (Failure)