427 commits
8a9cba6
[moe training] add fp8 rowwise kernels for expert weights (#2696)
danielvegamyhre Aug 7, 2025
143c3a6
[moe training] add bench script for fp8 rowwise kernels and update au…
danielvegamyhre Aug 7, 2025
246b142
[moe training] integrate rowwise expert quant kernel (#2698)
danielvegamyhre Aug 7, 2025
0fd0cae
When replacing literals with placeholders lists are always converted …
kimishpatel Aug 7, 2025
1526dfe
Update KleidiAI (#2692)
metascroy Aug 7, 2025
1114ca0
Check numerical equivalence / closeness between different kernel pref…
jerryzh168 Aug 7, 2025
bfe34b5
Add all fbgemm kernel Tensors into Int4WeightOnlyConfig and Float8Dyn…
jerryzh168 Aug 7, 2025
0315628
Updating 4xH100 to only run with tags or workflow dispatch (#2715)
jerryzh168 Aug 8, 2025
7cb920b
Don't call erase if node is already erased in batch norm fusion.
abeakkas Aug 8, 2025
c086ade
Remove dep on protype MoEQuantConfig (#2717)
jerryzh168 Aug 8, 2025
6cfa477
Generalize FakeQuantizer beyond intx (#2714)
andrewor14 Aug 8, 2025
4fc4068
Add api for group wise low bit quantization, using codebook utils pro…
szyszyzys Aug 9, 2025
ec9961c
Deprecate old TORCH_VERSION variables (#2719)
andrewor14 Aug 11, 2025
948ade1
Fix internal tests after recent chagnes
jerryzh168 Aug 11, 2025
853f87d
torchao.float8: update with AMD MI300X benchmark results (#2736)
vkuzo Aug 11, 2025
510e1b4
Add __init__.py for group wise lut quantization package
szyszyzys Aug 11, 2025
fe0ddf1
Allow pattern replacement to ignore literals (#2519)
kimishpatel Aug 12, 2025
5bf05b6
Add meta function for linear operation (groupwise lut kernel).
szyszyzys Aug 12, 2025
d7f7bf2
Remove meta linear operation in cpp.
szyszyzys Aug 12, 2025
c88ebe8
fix float8 training benchmarks on AMD (#2737)
vkuzo Aug 12, 2025
0b88286
Align Int4Tensor implementation details with the design of Float8Tens…
jerryzh168 Aug 12, 2025
7c13cde
Support `optional_tensor_names` in TorchAOBaseTensor (#2710)
jerryzh168 Aug 12, 2025
4fe5ec6
Update Int4PreshuffledTensor to align with implementation details of …
jerryzh168 Aug 12, 2025
d08bbb0
don't learn zero points for symmetric quantization (#2739)
liangel-02 Aug 12, 2025
cd7975e
Rename `Float8ActivationInt4WeightConfig` to `Float8DynamicActivation…
jerryzh168 Aug 12, 2025
1dca638
Remove calls to contiguous in the implementation of Float8Tensor (#2747)
jerryzh168 Aug 12, 2025
10a0bdd
Update quantization overview and contributor guide doc (#2723)
jerryzh168 Aug 12, 2025
aec9a79
Fix test after removing contiguous() (#2751)
jerryzh168 Aug 12, 2025
317179e
Remove double baseline calculations for CI microbenchmarks (#2613)
jainapurva Aug 12, 2025
3bad6a2
Drop support for PyTorch 2.5 and before (#2720)
andrewor14 Aug 13, 2025
e79208c
Remove old `change_linear_weights_to_*` APIs (#2721)
andrewor14 Aug 13, 2025
615877d
Replace `export_for_training` with `torch.export.export` (#2724)
andrewor14 Aug 13, 2025
f01c956
Allow no quantization during QATConfig convert (#2694)
andrewor14 Aug 13, 2025
46ba24c
Fix ruff after https://github.com/pytorch/ao/pull/2724 (#2759)
andrewor14 Aug 13, 2025
a1a9632
[ROCm] fix build for newer hipblaslt BC-breaking change (#2510)
jeffdaily Aug 13, 2025
715ea9f
Add float8 FakeQuantizeConfig and FakeQuantizer (#2735)
andrewor14 Aug 13, 2025
21ceb8e
Track API usage (#2706)
andrewor14 Aug 13, 2025
ea3691e
Update Int4WeightOnlyConfig VERSION argument (#2754)
jerryzh168 Aug 13, 2025
6794ef5
Reference representation of dqlinear int4 for xnnpack (#2520)
kimishpatel Aug 13, 2025
d86ae25
Allow per-group quantizers in QuantOptimizer, fix state_dict (#2743)
lisjin Aug 13, 2025
6a2d975
Update autoquant.py (#2766)
jerryzh168 Aug 14, 2025
c232c55
Fix missing QuantOptimizer methods (#2770)
lisjin Aug 14, 2025
2db4c76
Add CPP version of bitpacking.
szyszyzys Aug 14, 2025
927cbfb
Update float8 README.md (#2774)
vkuzo Aug 15, 2025
e43a220
[moe training] update tests for torchtitan moe refactor (#2733)
danielvegamyhre Aug 15, 2025
c1223e1
[moe training] use custom ops instead of wrap_triton for fp8 rowwise …
danielvegamyhre Aug 15, 2025
478c5f2
[moe training] fix scaling type bug; refactor distributed tests (#2749)
danielvegamyhre Aug 15, 2025
f600b83
[moe training] use llama4 shapes for kernel benchmarks (#2756)
danielvegamyhre Aug 15, 2025
3d00e8f
[moe training] remove duplicate benchmark script (#2762)
danielvegamyhre Aug 15, 2025
d38e9b6
[moe training] update bench script to compare fp8 dynamic quant scale…
danielvegamyhre Aug 15, 2025
aed4f84
[moe training] refactor to share benchmarking and profiling utils (#2…
danielvegamyhre Aug 15, 2025
2eae09b
[moe training] add memory bandwidth calculations to kernel benchmarki…
danielvegamyhre Aug 15, 2025
9192799
mx_formats: make emulated tests pass on H100, and add to CI (#2773)
vkuzo Aug 15, 2025
49cb18a
make e2e training benchmark support mx (#2776)
vkuzo Aug 15, 2025
d8bb51f
Update mx_formats README.md (#2777)
vkuzo Aug 15, 2025
0347f35
Convert model inference test from pytest to unittest (#2644)
namgyu-youn Aug 15, 2025
9e3758d
Update mx README.md (#2778)
vkuzo Aug 15, 2025
69e71d9
Update benchmarking tool to run on local iPhones
navsud Aug 15, 2025
b40fd97
Int4 sparse marlin tensor (#2771)
liangel-02 Aug 15, 2025
758f744
[CI] fix 4xH100 tests by not installing vllm (#2780)
danielvegamyhre Aug 16, 2025
e6b38bb
Fix setup develop (#2748)
metascroy Aug 18, 2025
42ad7e3
[mx] fix build warning for mxfp8 dim1 cast CUDA kernel (#2782)
danielvegamyhre Aug 18, 2025
24f11f8
Remove group_size arg in Float8DynamicActivationInt4WeightConfig (#2779)
jerryzh168 Aug 18, 2025
751d7f6
fixing torchao rocm ci test (#2789)
liangel-02 Aug 18, 2025
4463b79
nvfp4 tensor: switch to using `qdata` (#2787)
vkuzo Aug 18, 2025
c120bb7
nvfp4 tensor: switch to TorchAOBaseTensor (#2788)
vkuzo Aug 18, 2025
5c0d6a3
nvfp4 tensor: refactor weight-only vs dynamic quant (#2790)
vkuzo Aug 18, 2025
72b35bf
Add IntxUnpackedTensor (#2732)
metascroy Aug 19, 2025
9473060
Fix batch norm folding in `prepare_pt2e` for multiple conv->BN chains…
subhankarpal Aug 19, 2025
083361b
turn float8 inference kernel check test back on (#2808)
vkuzo Aug 19, 2025
af2cf1e
Initial torchao model release script (#2810)
jerryzh168 Aug 20, 2025
249d95b
mxtensor: make data argument first and rename to `qdata` (#2804)
vkuzo Aug 20, 2025
1a20585
mxtensor: inherit from TorchAOBaseTensor (#2805)
vkuzo Aug 20, 2025
fee314b
mxtensor: refactor activation quant to use direct logic (#2806)
vkuzo Aug 20, 2025
fbe08c3
improve fp8 blockwise gemm perf (#2784)
danielvegamyhre Aug 20, 2025
43b4106
Add load and run tests for checkpoints that we want to have BC (#2792)
jerryzh168 Aug 20, 2025
481a8ab
Add model release CI job (#2813)
jerryzh168 Aug 20, 2025
8812365
Fix autoquant tests failed due to changes to benchmark_gpu (#2818)
jerryzh168 Aug 20, 2025
44f6fc2
float8tensor: small fixes for kernel_preference (#2817)
vkuzo Aug 21, 2025
b6435f9
add simple roofline for float inference with rowwise scaling (#2819)
vkuzo Aug 21, 2025
706937b
Move codebook (LUT) generation methods into common utils. Update func…
szyszyzys Aug 21, 2025
db29394
Fix internal CI after the adding load_and_run_checkpoint test (#2836)
jerryzh168 Aug 21, 2025
abffabb
Add test for lut based embedding quantization.
szyszyzys Aug 21, 2025
e72b22e
Bitpack add functions for Uint8
szyszyzys Aug 21, 2025
f8887fa
Add LUT based embedding quantization,
szyszyzys Aug 21, 2025
9e83024
Add test function for lut based embedding
szyszyzys Aug 21, 2025
df7bf37
Add the ops for groupwise lut quantization for embeding
szyszyzys Aug 21, 2025
1aabda0
mx: delete `triton_f4_to_bf16` kernel (#2830)
vkuzo Aug 22, 2025
d37dcb7
mx: delete `use_fp4_custom_triton_dequant_kernel` option (#2831)
vkuzo Aug 22, 2025
6bbf091
[mxfp8 moe] add compile test; add mxfp8 to bench script (#2835)
danielvegamyhre Aug 22, 2025
b663faf
[mxfp8 moe] replace per group scaling with conventional scaling (#2841)
danielvegamyhre Aug 22, 2025
e7251df
float8 kernel test: make more robust (#2847)
vkuzo Aug 22, 2025
a5e31e2
Revert "Add the ops for groupwise lut quantization for embeding" (#2850)
vkuzo Aug 22, 2025
a9ffa50
Refactor TorchAOBaseTensor for better BC support (#2793)
jerryzh168 Aug 22, 2025
07fbc89
fix incorrect torch version test (#2786)
namgyu-youn Aug 22, 2025
253d65a
Revert "Refactor TorchAOBaseTensor for better BC support" (#2854)
jerryzh168 Aug 22, 2025
0596713
Fix float8 + int4 QAT (#2851)
andrewor14 Aug 22, 2025
9978bca
Remove TORCH_VERSION_AT_LEAST* warnings when importing torch (#2852)
andrewor14 Aug 22, 2025
2fd06de
Fix NVFP4 to_copy (#2812)
andrewor14 Aug 22, 2025
8079abc
Fix test_nvfp4_tensor.py merge conflict (#2857)
andrewor14 Aug 22, 2025
27f4d75
Fix autoquant after version util changes (#2858)
jerryzh168 Aug 22, 2025
7f7f626
Add lut quantized embedding.
szyszyzys Aug 23, 2025
98e406d
Add test for lut based embedding quantization.
szyszyzys Aug 23, 2025
bc2c83e
[reland] Refactor TorchAOBaseTensor for better BC (#2793) (#2855)
jerryzh168 Aug 23, 2025
f3e549c
Add NVFP4 QAT (#2666)
andrewor14 Aug 25, 2025
f03a737
bump version to 0.14.0 (#2872)
vkuzo Aug 25, 2025
8537883
[mxfp8 moe training] Add mxfp8 to FSDP tests (#2849)
liangel-02 Aug 25, 2025
72222d1
Fix test tolerance (#2871)
metascroy Aug 25, 2025
c93bc7d
add mxfp8 to test_tp (#2870)
liangel-02 Aug 25, 2025
34eaaf0
Add OPAQUE packing format (#2878)
jerryzh168 Aug 26, 2025
ba111b0
Fix UT assertion error for int8 sdpa fusion (#2816)
Valentine233 Aug 26, 2025
9f1e32b
release notes script: keep not user facing rows (#2875)
vkuzo Aug 26, 2025
891bd21
Update IntxUnpackedTensor to support dynamic activation (#2861)
metascroy Aug 26, 2025
23f8a22
TorchAOBaseTensor `__tensor_flatten__` and `__tensor_unflatten__` use…
jerryzh168 Aug 26, 2025
6a6a672
fix ci import error (#2876)
liangel-02 Aug 26, 2025
d321a2c
Conditional ROCm kernel build (#2839)
petrex Aug 26, 2025
9056c46
Enable quantizing local checkpoints in model release script (#2859)
jerryzh168 Aug 26, 2025
6f035e8
[CPU] Introduce Int4OpaqueTensor to replace Int4CPULayout in AQT (#2798)
Xia-Weiwen Aug 27, 2025
8722c0c
[moe fp8 training] test and bench new faster method for per group row…
danielvegamyhre Aug 27, 2025
e2514dd
[moe fp8 training] use transpose method when quantizing to avoid unco…
danielvegamyhre Aug 27, 2025
8669213
Introduce IntxOpaqueTensor to replace PackedInt8DynamicActivationIntx…
metascroy Aug 27, 2025
15a6de6
[mxfp8 moe] add support for fbgemm 2d-3d mx8mx8bf16 grouped gemm (#2848)
danielvegamyhre Aug 27, 2025
a2f42cb
Update AWQ implementation to not use extra wrapper tensor subclass (#…
jerryzh168 Aug 27, 2025
8b2bc46
[moe fp8 training] fused reduction kernel along dim1 for 3d expert we…
danielvegamyhre Aug 27, 2025
615a374
integrate torch._scaled_mm into Float8BlockwiseLinear and add bench s…
danielvegamyhre Aug 27, 2025
a8db3a6
use shared bench + profile utils in blockwise fwd bwd bench script (#…
danielvegamyhre Aug 27, 2025
3bf21d0
[fp8 blockwise] load 2d chunks for groupwise quant to enable coalesce…
danielvegamyhre Aug 27, 2025
6e9bf26
Support QAT int4 v1 path for BC (#2888)
andrewor14 Aug 28, 2025
2a53216
[CPU][float8] Add scaled_embedding_bag kernel (#2686)
LevelDownRefine Aug 28, 2025
3de18af
exclude libcudart.so.13 from auditwheel repair to fix CUDA 13.0 wheel…
danielvegamyhre Aug 28, 2025
364ad47
[fp8 blockwise] wrap triton quantization kernels in custom ops for to…
danielvegamyhre Aug 28, 2025
f940738
[mxfp8 moe training] refactor all var names with suffix _mx to _data …
danielvegamyhre Aug 28, 2025
f0cca99
[mxfp8 moe training] add grouped gemm benchmark script (#2882)
danielvegamyhre Aug 28, 2025
2f78cfe
Rename `to_float8` to `from_hp` (#2893)
jerryzh168 Aug 28, 2025
4ecc89e
[mxfp8 moe training] add per group blocked scale kernels (#2886)
danielvegamyhre Aug 28, 2025
4236656
safetensors support (#2881)
liangel-02 Aug 29, 2025
f02354d
Add tracking for new tensors, AQT and layouts (#2895)
jerryzh168 Aug 29, 2025
3a9b8d1
Port metadata from the linear node onto the reference custom op for i…
kimishpatel Aug 29, 2025
6176322
Add Int4TilePackedTo4dTensor (#2791)
jerryzh168 Aug 29, 2025
83a20c7
[mxfp8 moe training] add triton kernel for blocked swizzled 3d weight…
danielvegamyhre Aug 29, 2025
08b1591
Add AWQ-INT4 option to release script (#2906)
jerryzh168 Aug 29, 2025
7ea5410
torchao init: do not load .so files for known incompatible torch vers…
vkuzo Aug 29, 2025
fbe3df9
Fix Float8Tensor quantize op kernrel preference dispatch (#2883)
jerryzh168 Aug 29, 2025
083d0c3
[mxfp8 moe training] use dim1 cast cuda kernel in bwd (#2897)
danielvegamyhre Aug 29, 2025
1bb1a40
Remove unused cpp variable, breaking style checks (#2909)
andrewor14 Aug 29, 2025
568c193
[moe training] update tests + benchmarks with conditional runs based …
danielvegamyhre Aug 29, 2025
ffabe80
[Intel GPU][doc] Change x86 quantizer to xpu quantizer in doc (#2916)
ZhiweiYan-96 Sep 1, 2025
1bb14f8
fix torchao version check on torch version (#2918)
vkuzo Sep 2, 2025
f5a1d08
change missing ops printout back to debug (#2921)
vkuzo Sep 2, 2025
266f749
another fix for torch version (#2922)
vkuzo Sep 2, 2025
71bfccb
SpinQuant rotate bias (#2913)
rohansjoshi Sep 2, 2025
f9f197e
[fp8 moe training] improve 3d quant kernel perf via removing annotati…
danielvegamyhre Sep 2, 2025
183068e
Update README.md with link to version compatibility matrix (#2920)
vkuzo Sep 2, 2025
bc52aa7
Added SpinQuant rotation unit test (#2925)
rohansjoshi Sep 2, 2025
8555713
Exclude libcuda.so from auditwheel replair (#2927)
atalman Sep 2, 2025
870284f
[pt2e] Avoid getting model device once per node (#2695)
andrewor14 Sep 3, 2025
4700fe8
Update README.md for mx_formats build from source (#2934)
vkuzo Sep 3, 2025
f35ae41
better check for mxfp8 cuda kernel presence (#2933)
vkuzo Sep 3, 2025
8776967
Set seed in numeric tests to make them more reliable (#2924)
metascroy Sep 3, 2025
9d01b43
Add Int4PlainInt32Tensor (#2845)
liangan1 Sep 4, 2025
aff141e
Move CPU kernels out of experimental
metascroy Sep 4, 2025
b34c103
Remove unused attributes in Float8Tensor (#2935)
jerryzh168 Sep 4, 2025
7b81460
[safetensors enablement] refactoring for huggingface integration (#2936)
liangel-02 Sep 5, 2025
1502748
Move top-level CPU kernels to csrc/cpu/aten_kernels
metascroy Sep 5, 2025
2c18dcb
Fix xnnpack export (#2941)
metascroy Sep 5, 2025
2dacd7f
Add hqq support for Int4TilePackedTo4dTensor (#2912)
jerryzh168 Sep 5, 2025
e7b310b
Float8Tensor per row quantization pass bias to fbgemm kernel (#2884)
jerryzh168 Sep 5, 2025
8901ff2
Update model script for INT8-INT4 (#2945)
metascroy Sep 5, 2025
4872c4f
Move packing format used by int4 to int4_packing_format.py (#2946)
jerryzh168 Sep 5, 2025
f1acc1e
Move packing format to intx folder (#2910)
metascroy Sep 6, 2025
73672aa
Delete copy of quantized SDPA in torchao/experimental
metascroy Sep 6, 2025
439b738
Add eval scripts for memory, latency and quality (#2943)
jerryzh168 Sep 7, 2025
a2206e9
Improve QAT fp8-int4 numerics (#2937)
andrewor14 Sep 8, 2025
e368b61
Skip QAT int4 v2 test for fbcode (#2923)
andrewor14 Sep 8, 2025
a54417d
Skip expanding scales for rowwise fp8 quantize (#2950)
andrewor14 Sep 8, 2025
2ccab32
Fix ROCM QAT test failure (#2957)
andrewor14 Sep 8, 2025
c452495
Add version=1 for calls to int4 weight only config (#2958)
jerryzh168 Sep 8, 2025
ac5ab7e
[Intel GPU] Enable llama generate.py + add unit test for quantization…
agrabow Sep 8, 2025
3760978
IntxWeightOnlyConfig/Int8DynamicIntxWeightConfig v2 migration: use ve…
metascroy Sep 9, 2025
f32431e
Use defaults CUDAExtension linker option when building mxfp8_cuda (#2…
huydhn Sep 9, 2025
861f971
Refactor Wanda for better readability (#2538)
namgyu-youn Sep 9, 2025
8b72284
[safetensor enablement] add fn to check if metadata is torchao (#2944)
liangel-02 Sep 9, 2025
b10876b
Bump `Int4WeightOnlyConfig` version to 2 (#2949)
jerryzh168 Sep 9, 2025
ecb6c4b
Remove compute target from intx_opaque_tensor (#2960)
metascroy Sep 9, 2025
10ba659
Skip QAT tests using `quantize_fp8_row` in fbcode (#2963)
andrewor14 Sep 9, 2025
ef4d0e1
Move some experimental tests (#2965)
metascroy Sep 9, 2025
d3efa39
docs: fix link in quantization overview documentation (#2962)
orangeH25 Sep 9, 2025
d35c2ce
Add support for only update models and push to a different user ID (#…
jerryzh168 Sep 9, 2025
851e2e6
Updated `update_model_card` to `poplulate_model_card_template` (#2970)
jerryzh168 Sep 10, 2025
2cb799b
[CPU] Support int8 scaled embedding bag (#2938)
LevelDownRefine Sep 10, 2025
0df571a
Move intx configs to version 2 by default (#2968)
metascroy Sep 10, 2025
b99904b
Experimental folder deprecation part 2/x (#2951)
metascroy Sep 10, 2025
83e8e60
Revert "[CPU] Support int8 scaled embedding bag" (#2974)
metascroy Sep 10, 2025
186aeb0
Update latency test script due to deprecation in vllm (#2973)
jerryzh168 Sep 10, 2025
cc35151
Make SmoothQuant more General (#2728)
namgyu-youn Sep 11, 2025
14ca521
[mxfp8 moe training] per group scale conversion to blocked format wit…
danielvegamyhre Sep 11, 2025
481be64
Add torchao_convert to PARQ's QuantOptimizer (#2947)
lisjin Sep 11, 2025
66384a9
[mxfp8 moe training] integrate mxfp8 grouped gemm and triton kernels …
danielvegamyhre Sep 11, 2025
be71434
[pt2e] Make prepare and convert faster by caching (#2983)
andrewor14 Sep 11, 2025
f1e118b
Add nvcc flags to explicitly build mxfp8 dim1 cast kernel for sm100a …
danielvegamyhre Sep 11, 2025
cffba61
Add from_int4_tensor in Int4PreshuffledTensor (#2978)
jerryzh168 Sep 11, 2025
011027c
Update ExecuTorch instructions in the model release template (#2975)
metascroy Sep 12, 2025
93030e7
Enable using HF PARQ checkpoints in torchao (#2985)
metascroy Sep 12, 2025
c4d4799
[CPU][FP8] Support FP8 SDPA for CPU backend (#2689)
Valentine233 Sep 12, 2025
f9bc52d
Updates LUT tensor and new convert API (#2984)
metascroy Sep 12, 2025
cc65dc5
hf integration doc page (#2899)
liangel-02 Sep 12, 2025
045c959
Fix FX Graph Cache issue in register_da8w4_concat_linear_cpu_pass (#2…
Sep 12, 2025
e3d9720
Replace `torch.norm` with `torch.linalg.vector_norm` (#2660)
namgyu-youn Sep 14, 2025
56ae935
Deprecate experimental part 3/x (#2976)
metascroy Sep 14, 2025
ea8c00f
Improve QAT int4 weight-only numerics (#2986)
andrewor14 Sep 15, 2025
264fd38
QAT configs
metascroy Sep 15, 2025
4dffb40
Add intx_opaque_tensor and tied embeddings to _convert_model_for_aarc…
metascroy Sep 15, 2025
9a770a5
Fix parametrized tests
metascroy Sep 16, 2025
58c3064
Rename Int4WeightPreshuffledFakeQuantizeConfig (#3005)
andrewor14 Sep 16, 2025
067b273
Support Int4OpaqueTensor for AWQ (#2997)
Sep 17, 2025
62f62d0
Deprecate config functions like `int4_weight_only` (#2994)
andrewor14 Sep 17, 2025
9e5059e
Remove internal usage of all config functions like `int4_weight_only`…
andrewor14 Sep 17, 2025
afe5cab
[mxfp8 moe training] add compile support (#2990)
danielvegamyhre Sep 17, 2025
ff3ba31
[mxfp8 moe training] use dim1 cast cuda kernel for 3d weights by resh…
danielvegamyhre Sep 17, 2025
f75b251
[moe training] add benchmarks for dsv3 236b, 671b shapes; reorganize …
danielvegamyhre Sep 17, 2025
c801f10
[sparse] Add in missing op support for FP8 Sparse (#3014)
jcaip Sep 17, 2025
122b307
Fix torchao_convert, remove StretchedAffineQuantizedTensor (#3015)
lisjin Sep 18, 2025
18dbe87
update compile arg for llama3.sh bench script (#3006)
danielvegamyhre Sep 18, 2025
ae204cc
Remove FbgemmConfig and remaining Fbgemm tensors (#3032)
jerryzh168 Sep 19, 2025
cfa39c8
Support PLAIN_INT32 for AWQ on Intel GPU (#3019)
xiaowangintel Sep 19, 2025
a951643
Add main tensor conversion API for packed tensors (#3029)
jerryzh168 Sep 19, 2025
1591603
Support Int4OpaqueTensor for HQQ (#3028)
cyxlily Sep 19, 2025
f210443
[mxfp8 moe training] add CUDA kernel to quantize 3d tensor colwise (#…
danielvegamyhre Sep 19, 2025
4bf39b0
[mxfp8 moe training] wrap 3d quantize tensor in custom ops and integr…
danielvegamyhre Sep 19, 2025
f35dcd7
[mxfp8 moe training] remove mxfp8_gemms.py (#3033)
danielvegamyhre Sep 19, 2025
ae12e42
Pass QAT learned qparams in convert (#3022)
andrewor14 Sep 19, 2025
d2fae7a
[mxfp8 moe training] update 3d quant colwise scaling kernel to use si…
danielvegamyhre Sep 20, 2025
22819f4
[Bug fix][CPU] Fix fp8 sdpa compiling issue with latest PyTorch (#2991)
Valentine233 Sep 21, 2025
8525185
[Float8] add non-decomposed version of quantize/dequantize ops for fp…
LevelDownRefine Sep 21, 2025
db46a18
merge main
LevelDownRefine Sep 22, 2025
3d3f8cf
Merge branch 'main' into wengshiy/qlinear
LevelDownRefine Sep 22, 2025
9d88c16
[mxfp8 moe training] use new 3d colwise quantization kernel (#3037)
danielvegamyhre Sep 22, 2025
bc72e1c
Update deprecated parameters in Hugging Face library (#2982)
namgyu-youn Sep 22, 2025
fb7c837
Avoid normalization layers in HF's quantization_config (#3030)
lisjin Sep 22, 2025
be4203e
Misc fixes for release scripts to make it easier to use (#3036)
jerryzh168 Sep 22, 2025
eadead5
Minor fix on TAO op to support lowering
RandySheriff Sep 22, 2025
4e7afcb
change to use non-decomposed q/dq
LevelDownRefine Sep 23, 2025
d79c3cc
Merge remote-tracking branch 'refs/remotes/origin/wengshiy/qlinear' i…
LevelDownRefine Sep 23, 2025
e417a4e
fix lint
LevelDownRefine Sep 23, 2025
c23e286
add version check
LevelDownRefine Sep 23, 2025
77da321
change version
LevelDownRefine Sep 23, 2025
8e2ca35
Unify get_block_size (#3039)
Xia-Weiwen Sep 23, 2025
7ffc616
fix attention bug; update ut
LevelDownRefine Sep 24, 2025
f1bbf13
Merge remote-tracking branch 'origin/main' into wengshiy/qlinear
LevelDownRefine Sep 24, 2025
4fb5f7a
add liftup oplist
LevelDownRefine Sep 26, 2025
1 change: 1 addition & 0 deletions .github/pytorch-probot.yml
@@ -3,3 +3,4 @@ ciflow_push_tags:
- ciflow/benchmark
- ciflow/tutorials
- ciflow/rocm
- ciflow/4xh100
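This new `ciflow/4xh100` entry registers the tag with pytorch-probot so the 4xH100 workflow can run only on tag pushes or manual dispatch (see commit 0315628 above). A minimal sketch of how a run is triggered, assuming the usual ciflow convention that the bot pushes a `ciflow/<name>/<PR number>` tag when a maintainer applies the matching label (the tag format and PR number here are illustrative):

```
# Assumed ciflow flow: pushing a tag of the form ciflow/4xh100/<PR number>
# triggers any workflow that listens for that tag pattern. Normally the bot
# pushes this tag when the ciflow/4xh100 label is added to the PR.
git tag ciflow/4xh100/2715
git push origin ciflow/4xh100/2715
```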
142 changes: 142 additions & 0 deletions .github/scripts/torchao_model_releases/README.md
@@ -0,0 +1,142 @@
# Scripts for torchao Model Release and Eval

Note: all commands below should be run from the directory `.github/scripts/torchao_model_releases/`.

## Frequently Used Commands
### Release and Eval Scripts for New Model Releases
```
MODEL=Qwen/Qwen3-8B
# Release all default quant options: FP8, INT4, INT8-INT4
sh release.sh --model_id $MODEL --push_to_hub --populate_model_card_template

# INT8-INT4 requires additional steps to export and run, so it's skipped from
# the general eval here
# Need to set QMODEL_PREFIX properly before running eval
# QMODEL_PREFIX=pytorch/Qwen3-8B
sh eval.sh --model_ids $MODEL "$QMODEL_PREFIX-FP8" "$QMODEL_PREFIX-INT4"

# Some follow-up evals
sh eval.sh --eval_type latency --batch_sizes 256 --model_ids "$QMODEL_PREFIX-FP8"
sh eval.sh --eval_type quality --batch_sizes 256 --model_ids "$QMODEL_PREFIX-INT8-INT4"

# Summarize all results
sh summarize_results.sh --model_ids $MODEL "$QMODEL_PREFIX-FP8" "$QMODEL_PREFIX-INT4" "$QMODEL_PREFIX-INT8-INT4" "$QMODEL_PREFIX-AWQ-INT4"
```

### AWQ Release and Eval
```
MODEL=Qwen/Qwen3-8B
TASK=mmlu_abstract_algebra
python quantize_and_upload.py --model_id $MODEL --quant AWQ-INT4 --push_to_hub --task $TASK --calibration_limit 10 --populate_model_card_template
sh eval.sh --model_ids $MODEL "$QMODEL_PREFIX-AWQ-INT4"
```

### Update Released Checkpoints in PyTorch
Sometimes we may have to update the checkpoints under a different user name (organization) without changing the model card, e.g. for INT4:
```
MODEL=Qwen/Qwen3-8B
sh release.sh --model_id $MODEL --quants INT4 --push_to_hub --push_to_user_id pytorch
```

Or for an AWQ checkpoint:
```
MODEL=Qwen/Qwen3-8B
TASK=mmlu_abstract_algebra
python quantize_and_upload.py --model_id $MODEL --quant AWQ-INT4 --task $TASK --calibration_limit 10 --push_to_hub --push_to_user_id pytorch
```

## Release Scripts
### default options
By default, we release FP8, INT4, and INT8-INT4 checkpoints, with model cards pre-filled with template content that can be updated later once we have eval results.

Examples:
```
# Note: first login with `huggingface-cli login`, the quantized model will be uploaded to
# the logged in user

# release with default quant options (FP8, INT4, INT8-INT4)
./release.sh --model_id Qwen/Qwen3-8B --push_to_hub

# release a custom set of quant options
./release.sh --model_id Qwen/Qwen3-8B --quants INT4 FP8 --push_to_hub
```

Note: for an initial release, please include `--populate_model_card_template` to populate the model card template.

### AWQ-INT4
[AWQ](https://arxiv.org/abs/2306.00978) is a technique for improving the accuracy of weight-only quantization. It preserves "salient" weight channels that have a high impact on the accuracy of the output by multiplying each such channel by a scale and applying the inverse scale to the corresponding activation. Since activations are not quantized, the rescaling introduces no additional activation error, while the quantization error of the weights is reduced.
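The core identity, sketched here with a per-input-channel scale vector $s$ (our notation, following the AWQ paper), is that the rescaling is exact in high precision, so only the weight quantizer $Q$ introduces error:

$$ y = W x = \big(W \,\mathrm{diag}(s)\big)\big(\mathrm{diag}(s)^{-1} x\big) \approx Q\big(W \,\mathrm{diag}(s)\big)\big(\mathrm{diag}(s)^{-1} x\big) $$

Choosing $s > 1$ for salient channels makes them larger relative to the quantization grid, shrinking their relative quantization error, while the $\mathrm{diag}(s)^{-1} x$ factor costs nothing in accuracy because activations stay in high precision.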

After the eval for an INT4 checkpoint is done, we may find that some tasks show a large accuracy drop compared to the high-precision baseline. In that case we can calibrate on that task with a few samples; tasks are selected from [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/README.md). You can follow the [new task guide](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md) to add new tasks to lm-eval.

Examples:
```
# release AWQ-INT4 model, calibrated with a specific task
# with some calibration_limit (number of samples)
python quantize_and_upload.py --model_id Qwen/Qwen3-8B --quant AWQ-INT4 --push_to_hub --task bbh --calibration_limit 2
```

### Update checkpoints for a different user_id (e.g. pytorch)
Sometimes we may want to update the checkpoints for a different user id without changing the model card. For this we can use `--push_to_user_id`, e.g.

```
sh release.sh --model_id microsoft/Phi-4-mini-instruct --quants FP8 --push_to_hub --push_to_user_id pytorch
```

This will update `pytorch/Phi-4-mini-instruct-FP8` without changing the model card.

## Eval Scripts
After we run the release script for a model, the new models appear on the Hugging Face Hub page for the user, e.g. https://huggingface.co/torchao-testing. Each model will have a model card filled in with template content, such as information about the model and eval instructions. There are a few things we still need to fill in: 1. peak memory usage, 2. latency when running the model with vllm, and 3. quality measurements using lm-eval.

### Single Script
The simplest option is to run all three evals. Please check out the `Run Single Evals` section to make sure the environment is set up correctly. This includes:
1. install [vllm](https://github.com/vllm-project/vllm) from source and set `VLLM_DIR` to the source directory of vllm
2. install [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness)

```
sh eval.sh --eval_type all --model_ids Qwen/Qwen3-8B pytorch/Qwen3-8B-INT4
```

If `eval_type` is `all`, we'll also summarize results for the list of `model_ids`; the summarized results can be found in the files `summary_results_Qwen_Qwen3-8B.log` and `summary_results_pytorch_Qwen3-8B-INT4.log`.

Then we can fill in the blanks in the model cards of uploaded checkpoints.

### Separate Scripts
#### Memory Eval
```
sh eval.sh --eval_type memory --model_ids Qwen/Qwen3-8B
```

#### Latency Eval
For latency eval, make sure vllm is installed.
```
uv pip install vllm
```

Or install vllm nightly:
```
uv pip install vllm --pre --extra-index-url https://download.pytorch.org/whl/nightly/cu126
```

After the environment is set up, we can run the eval:
```
sh eval.sh --eval_type latency --model_ids Qwen/Qwen3-8B --batch_sizes 1,256
```

#### Model Quality Eval
For model quality eval, we need to install lm-eval:
```
uv pip install lm-eval
```
After the environment is set up, we can run the eval:
```
sh eval.sh --eval_type quality --model_ids Qwen/Qwen3-8B --tasks hellaswag,mmlu
```

#### Summarize results
After we have finished all evals for each model, we can summarize the results with:
```
sh summarize_results.sh --model_ids Qwen/Qwen3-8B pytorch/Qwen3-8B-INT4
```
Summarized results files for the above command: `summary_results_Qwen_Qwen3-8B.log` and `summary_results_pytorch_Qwen3-8B-INT4.log`

It will look through the current directory to find all the result files from the memory, latency, and quality evals and combine all the result information into a single file.
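A minimal sketch of that aggregation, assuming the `<safe model id>_*.log` naming used by the eval scripts below (the actual matching logic inside `summarize_results.sh` may differ):

```
# Hypothetical sketch: collect every memory/latency/quality log for one model
# into a single summary file. SAFE_MODEL_ID mirrors the '/'-to-'_' replacement
# done in eval_latency.sh.
MODEL_ID="pytorch/Qwen3-8B-INT4"
SAFE_MODEL_ID="${MODEL_ID//\//_}"
cat "${SAFE_MODEL_ID}"_*.log > "summary_results_${SAFE_MODEL_ID}.log"
```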
114 changes: 114 additions & 0 deletions .github/scripts/torchao_model_releases/eval.sh
@@ -0,0 +1,114 @@
#!/bin/bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD 3-Clause license found in the
# LICENSE file in the root directory of this source tree.

set -e
source eval_env_checks.sh

usage() {
echo "Usage: $0 --model_ids <model1> <model2> ... [--eval_type <all|memory|latency|quality>] [--batch_sizes <batch_sizes>] [--tasks <tasks>]"
echo "Defaults:"
echo " batch_sizes: 1 256"
echo " tasks: mmlu"
exit 1
}
MODEL_ID_ARRAY=()
EVAL_TYPE="all"
# these will be parsed in the other scripts
BATCH_SIZES="1 256" # Default for latency eval
TASKS="mmlu" # Default for quality eval
# Parse arguments
while [[ $# -gt 0 ]]; do
case "$1" in
--eval_type)
shift
if [[ $# -eq 0 ]]; then
echo "Error: --eval_type requires a value"
exit 1
fi
EVAL_TYPE="$1"
shift
;;
--model_ids)
shift
# Collect all subsequent arguments that are not another flag
while [[ $# -gt 0 && ! "$1" =~ ^-- ]]; do
MODEL_ID_ARRAY+=("$1")
shift
done
;;
--batch_sizes)
shift
if [[ $# -eq 0 ]]; then
echo "Error: --batch_sizes requires a value"
exit 1
fi
BATCH_SIZES="$1"
shift
;;
--tasks)
shift
if [[ $# -eq 0 ]]; then
echo "Error: --tasks requires a value"
exit 1
fi
TASKS="$1"
shift
;;
*)
echo "Unknown argument: $1"
usage
;;
esac
done
if [[ ${#MODEL_ID_ARRAY[@]} -eq 0 ]]; then
echo "Error: --model_ids is required"
usage
fi

run_memory() {
check_torch
local model_id="$1"
sh eval_memory.sh --model_ids "$model_id"
}
run_latency() {
check_vllm
local model_id="$1"
sh eval_latency.sh --model_ids "$model_id" --batch_sizes $BATCH_SIZES
}
run_quality() {
check_lm_eval
local model_id="$1"
sh eval_quality.sh --model_ids "$model_id" --tasks $TASKS
}
for MODEL_ID in "${MODEL_ID_ARRAY[@]}"; do
case "$EVAL_TYPE" in
memory)
run_memory "$MODEL_ID"
;;
latency)
run_latency "$MODEL_ID"
;;
quality)
run_quality "$MODEL_ID"
;;
all)
run_quality "$MODEL_ID"
run_memory "$MODEL_ID"
run_latency "$MODEL_ID"
;;
*)
echo "Unknown eval_type: $EVAL_TYPE"
echo "Valid types are: all, memory, latency, quality"
exit 2
;;
esac
done

# Run summarize_results.sh with MODEL_IDS if eval_type is "all"
if [[ "$EVAL_TYPE" == "all" ]]; then
sh summarize_results.sh --model_ids "${MODEL_ID_ARRAY[@]}"
fi
26 changes: 26 additions & 0 deletions .github/scripts/torchao_model_releases/eval_env_checks.sh
@@ -0,0 +1,26 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD 3-Clause license found in the
# LICENSE file in the root directory of this source tree.

check_torch() {
if ! pip show torch > /dev/null 2>&1; then
echo "Error: torch package is NOT installed. please install with `pip install torch`" >&2
exit 1
fi
}

check_vllm() {
if ! pip show vllm > /dev/null 2>&1; then
echo "Error: vllm package is NOT installed. please install with `pip install vllm`" >&2
exit 1
fi
}

check_lm_eval() {
if ! pip show lm_eval > /dev/null 2>&1; then
echo "Error: lm_eval package is NOT installed. please install with `pip install lm_eval`" >&2
exit 1
fi
}
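These helpers are meant to be sourced by the other eval scripts, as `eval.sh` above and `eval_latency.sh` below do:

```
# Source the checks, then gate an eval step on the required package.
source eval_env_checks.sh
check_vllm   # prints an error and exits 1 if vllm is not installed
```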
85 changes: 85 additions & 0 deletions .github/scripts/torchao_model_releases/eval_latency.sh
@@ -0,0 +1,85 @@
#!/bin/bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD 3-Clause license found in the
# LICENSE file in the root directory of this source tree.

set -e
source eval_env_checks.sh
check_vllm
check_vllm

MODEL_ID_ARRAY=()
BATCH_SIZE_ARRAY=(1) # default can be overwritten by user input
INPUT_LEN="256" # default input length
OUTPUT_LEN="256" # default output length
# Parse arguments
while [[ $# -gt 0 ]]; do
case "$1" in
--model_ids)
shift
# Collect all subsequent arguments that are not another flag
while [[ $# -gt 0 && ! "$1" =~ ^-- ]]; do
MODEL_ID_ARRAY+=("$1")
shift
done
;;
--batch_sizes)
shift
BATCH_SIZE_ARRAY=()
# Collect all subsequent arguments that are not another flag
while [[ $# -gt 0 && ! "$1" =~ ^-- ]]; do
BATCH_SIZE_ARRAY+=("$1")
shift
done
;;
--input_len)
shift
if [[ $# -eq 0 ]]; then
echo "Error: --input_len requires a value"
exit 1
fi
INPUT_LEN="$1"
shift
;;
--output_len)
shift
if [[ $# -eq 0 ]]; then
echo "Error: --output_len requires a value"
exit 1
fi
OUTPUT_LEN="$1"
shift
;;
*)
echo "Unknown argument: $1"
echo "Usage: $0 --model_id <model_id> [--batch_sizes <batch_sizes>] [--input_len <input_len>] [--output_len <output_len>]"
exit 1
;;
esac
done
if [[ ${#MODEL_ID_ARRAY[@]} -eq 0 ]]; then
echo "Error: --model_ids is required"
echo "Usage: $0 --model_ids <model_id1> <model_id2> ... [--batch_sizes <batch_size1> <batch_size2> ...] [--input_len <input_len>] [--output_len <output_len>]"
exit 1
fi
# Save the original directory
ORIG_DIR="$(pwd)"
# cd to VLLM_DIR (must point to a vllm source checkout; see the README)
cd "$VLLM_DIR"
for MODEL_ID in "${MODEL_ID_ARRAY[@]}"; do
echo "======================== Eval Latency $MODEL_ID ==========================="
# Replace all '/' with '_'
SAFE_MODEL_ID="${MODEL_ID//\//_}"
# Loop over batch sizes and run the latency eval for each
for BATCH_SIZE in "${BATCH_SIZE_ARRAY[@]}"; do
OUTPUT_FILE="$ORIG_DIR/${SAFE_MODEL_ID}_latency_batch${BATCH_SIZE}_in${INPUT_LEN}_out${OUTPUT_LEN}.log"
echo "Running latency eval for model $MODEL_ID with batch size $BATCH_SIZE with input length: $INPUT_LEN and output length: $OUTPUT_LEN"
VLLM_DISABLE_COMPILE_CACHE=1 vllm bench latency --input-len $INPUT_LEN --output-len $OUTPUT_LEN --model $MODEL_ID --batch-size $BATCH_SIZE > "$OUTPUT_FILE" 2>&1
echo "Latency eval result saved to $OUTPUT_FILE"
done
echo "======================== Eval Latency $MODEL_ID End ========================="
done

# cd back to original place
cd "$ORIG_DIR"
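For reference, a direct invocation of this script (bypassing `eval.sh`) would look like the following; the model id and vllm path are illustrative:

```
# VLLM_DIR must point at a vllm source checkout (see the README above).
VLLM_DIR=~/vllm sh eval_latency.sh \
  --model_ids pytorch/Qwen3-8B-INT4 \
  --batch_sizes 1 256 \
  --input_len 256 --output_len 256
```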