Commit dfe5b7d
[Common][Pytorch] Add support for the FP8 Block Scaling (i.e. Deepseek) recipe on Blackwell (#2157)
* Update to_string(NVTEScalingMode) to include block scaling
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Add `nvte_swizzle_block_scaling_to_mxfp8_scaling_factors`
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Convert FP8 block scaling tensors to MXFP8 tensors on Blackwell and newer in GEMM
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Allow Blackwell and newer in Deepseek recipe compatibility check
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Allow data_rows % 4 != 0 in 1d kernel
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Load scaling factors in unswizzled order in 1d kernel
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Enforce use of power of two scaling
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Skip the FP8 block scaling exact GEMM test on Blackwell
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Skip further tests with pow_2_scales=False
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Initial implementation of tensor conversion for grouped gemm
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Skip non power of two scaling cpp unit tests
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Fix handling of all gather
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Use compute capability 10.0 for logic with Blackwell
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
* Apply suggestions from code review
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
---------
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
1 parent b840898 commit dfe5b7d
14 files changed
Lines changed: 553 additions & 35 deletions
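The commit message pairs "Convert FP8 block scaling tensors to MXFP8 tensors" with "Enforce use of power of two scaling". One plausible reading is that MXFP8 scaling factors use the one-byte E8M0 format, which can only represent powers of two, so an FP32 block scale must be a power of two to convert without loss. The sketch below illustrates that idea only. The function names, the 128-element source block, and the 32-element MXFP8 block are assumptions for illustration, not code from this commit, and the swizzled memory layout produced by `nvte_swizzle_block_scaling_to_mxfp8_scaling_factors` is not modeled.

```python
import math

E8M0_BIAS = 127  # E8M0 stores a biased power-of-two exponent in one byte


def is_power_of_two(scale: float) -> bool:
    """Return True if `scale` is a positive, finite power of two."""
    if not (scale > 0.0 and math.isfinite(scale)):
        return False
    mantissa, _ = math.frexp(scale)  # scale == mantissa * 2**exp, 0.5 <= mantissa < 1
    return mantissa == 0.5


def fp32_scale_to_e8m0(scale: float) -> int:
    """Encode a power-of-two scale as a biased E8M0 exponent (hypothetical helper)."""
    if not is_power_of_two(scale):
        raise ValueError(f"scale {scale!r} is not a power of two")
    _, exp = math.frexp(scale)
    biased = (exp - 1) + E8M0_BIAS  # frexp's exponent is one above log2(scale)
    if not 0 <= biased <= 254:  # 255 is reserved for NaN in E8M0
        raise ValueError(f"scale {scale!r} is outside the E8M0 range")
    return biased


def expand_block_scales(row_scales, src_block=128, dst_block=32):
    """Repeat each per-block scale so it covers the smaller destination blocks.

    Assumes one scale per `src_block` elements on input and one scale per
    `dst_block` elements on output, with `src_block` a multiple of `dst_block`.
    """
    if src_block % dst_block != 0:
        raise ValueError("src_block must be a multiple of dst_block")
    repeat = src_block // dst_block
    return [s for s in row_scales for _ in range(repeat)]


if __name__ == "__main__":
    fp32_scales = [1.0, 0.25]  # one scale per 128 elements of a 256-element row
    e8m0 = [fp32_scale_to_e8m0(s) for s in fp32_scales]
    print(expand_block_scales(e8m0))
```

Under these assumptions, a non-power-of-two scale such as `3.0` has no exact E8M0 encoding, which would be consistent with the commit skipping the `pow_2_scales=False` tests on Blackwell.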
File tree
- tests
  - cpp/operator
  - pytorch
- transformer_engine
  - common
    - include/transformer_engine
    - swizzle
    - transpose
  - pytorch
    - csrc
      - extensions
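The commit message items "Use compute capability 10.0 for logic with Blackwell" and "Allow Blackwell and newer in Deepseek recipe compatibility check" suggest a dispatch on the device's compute capability. The following is a minimal sketch of such a check, not the code from this commit. The function name, the returned labels, and the assumption that devices below compute capability 9.0 are unsupported are all hypothetical.

```python
from typing import Tuple

BLACKWELL_CC = (10, 0)  # threshold named in the commit message
HOPPER_CC = (9, 0)      # assumed lower bound for the block scaling recipe


def block_scaling_gemm_path(compute_capability: Tuple[int, int]) -> str:
    """Pick a (hypothetical) GEMM path for FP8 block scaling tensors.

    `compute_capability` is a (major, minor) pair; tuples compare
    lexicographically, so (10, 3) >= (10, 0) and (9, 0) < (10, 0).
    """
    if compute_capability >= BLACKWELL_CC:
        # Blackwell and newer: convert block scaling tensors to MXFP8 first.
        return "convert_to_mxfp8"
    if compute_capability >= HOPPER_CC:
        # Assumed: the block scaling GEMM runs directly on this architecture.
        return "native"
    return "unsupported"


if __name__ == "__main__":
    for cc in [(8, 9), (9, 0), (10, 0), (12, 0)]:
        print(cc, block_scaling_gemm_path(cc))
```

In a PyTorch program the `(major, minor)` pair could come from `torch.cuda.get_device_capability()`, which is kept out of the sketch so that it runs without a GPU.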