Pull requests: NVIDIA/TransformerEngine
Revert adding pytorch-triton as a build requirement
2.12.0
#2592
opened Jan 13, 2026 by
tdophung
5 of 13 tasks
(Bug fix) Fix accuracy issue for blockwise scaling+E8 scale on Blackwell
#2589
opened Jan 13, 2026 by
lhb8125
13 tasks
[Common] MXFP8 kernel for grouped tensors
#2586
opened Jan 12, 2026 by
Oleg-Goncharov
•
Draft
13 tasks
[Common] Enable determinism for cuDNN >= 9.18 on Blackwell
2.12.0
#2584
opened Jan 12, 2026 by
cyanguwa
8 of 13 tasks
Make router_fusion to adapt for the large num_of_expert(>2048)
#2582
opened Jan 9, 2026 by
Autumn1998
13 tasks
fix(build): Handle namespace packages for PyPI CUDA detection
#2580
opened Jan 9, 2026 by
sbhavani
6 of 13 tasks
fix(examples): te_llama compatibility with transformers >= 4.57
#2572
opened Jan 7, 2026 by
sbhavani
6 of 13 tasks
[PyT] Update THD sink attention logic for cudnn >=9.18.0
2.12.0
#2568
opened Jan 6, 2026 by
cuichenx
13 tasks
[NVFP4][Dense/MoE] Integrate Cutlass NVFP4 Row-Cast-Col-RHT-Transpose-Cast Fusion Kernel
fp4
MoE
#2555
opened Jan 3, 2026 by
zhongbozhu
3 of 16 tasks
[Pytorch] Enhance bf16 precision optimizer performance with memory buffer
#2551
opened Dec 31, 2025 by
Baidu-AIAK
[PyTorch] Remove unnecessary save of weights
#2549
opened Dec 30, 2025 by
pggPL
8 of 13 tasks
[PyTorch]Add Casting-Free FP8-Flow-MoE Blockwise Optimizations
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
#2544
opened Dec 26, 2025 by
xiaoxi-wangfj
4 of 13 tasks
[PyT] Plumbing correct bias dims from TE to cudnn
attention
bug
Something isn't working
pytorch
#2537
opened Dec 20, 2025 by
KshitijLakhani
5 of 11 tasks
[DO NOT MERGE] Get seqlens and offsets in O(N) space instead of O(N*N) space
do not merge
#2530
opened Dec 17, 2025 by
KshitijLakhani
•
Draft
13 tasks
[JAX] Calculate seqlens and offsets in O(N) space instead of O(N*N) space for THD sequences
attention
#2522
opened Dec 16, 2025 by
KshitijLakhani
•
Draft
13 tasks
Documentation for cpu offloading
documentation
Improvements or additions to documentation
#2520
opened Dec 16, 2025 by
pggPL
8 of 13 tasks
[DO NOT MERGE] Testing v2.6 + pr2201
attention
do not merge
#2513
opened Dec 12, 2025 by
KshitijLakhani
•
Draft
13 tasks
[common] Add support for cuBLASLt GEMM for GroupedTensor
MoE
#2502
opened Dec 10, 2025 by
pggPL
8 tasks done