Port to Python and libtorch stable ABI #102

Open
woct0rdho wants to merge 3 commits into Dao-AILab:main from woct0rdho:abi3_stable

woct0rdho commented Mar 29, 2026

I've ported this repo to the Python stable ABI (ABI3) and the libtorch stable ABI. See https://docs.pytorch.org/tutorials/advanced/cpp_custom_ops.html for a modern guide to torch custom ops.

This means we no longer need to build a separate wheel for every Python and PyTorch version. We only need separate wheels for each platform (Windows/Linux) and each backend (CUDA 12/CUDA 13/ROCm 7). This will help adoption of the package, notably because Unsloth recommends it for Qwen3.5 training. The same port has already been done in packages like flash-attn-3.
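
To make the reduction concrete, here is a rough sketch of the build matrix. The platform and backend axes come from the paragraph above; the Python and torch version lists are illustrative assumptions only, not the project's actual support matrix.

```python
from itertools import product

# Axes named in the description above.
PLATFORMS = ["windows", "linux"]
BACKENDS = ["cuda12", "cuda13", "rocm7"]

# Illustrative assumption: example version lists, not the real support matrix.
PYTHON_VERSIONS = ["3.10", "3.11", "3.12", "3.13"]
TORCH_VERSIONS = ["2.10", "2.11"]


def wheel_matrix(*axes):
    """Return every combination of the given build axes."""
    return list(product(*axes))


# With the stable ABIs, only platform and backend vary.
stable_abi_wheels = wheel_matrix(PLATFORMS, BACKENDS)

# Without them, each Python and torch version multiplies the matrix.
per_version_wheels = wheel_matrix(
    PLATFORMS, BACKENDS, PYTHON_VERSIONS, TORCH_VERSIONS
)

print(len(stable_abi_wheels))   # 6
print(len(per_version_wheels))  # 48
```

Under these example lists the matrix shrinks from 48 wheels to 6, and it stops growing when new Python or torch releases come out.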

I've built the wheel and run the unit tests on two machines: one with Windows, RTX 3080 (sm86), CUDA 13, and torch 2.11, and one with Linux, Strix Halo (gfx1151), ROCm 7, and a torch 2.12 nightly.

One concern is that the libtorch stable ABI only supports Python >= 3.10 and torch >= 2.10. It's possible to support torch 2.9 with some extra effort. Since you've already dropped support for Pascal and Volta, maybe we can also drop support for torch < 2.10. Alternatively, we can keep the old code without the stable ABI in a legacy branch.
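
For illustration, the version floor above could be checked with something like the sketch below. The helper names `parse_major_minor` and `supports_stable_abi` are hypothetical and not part of this PR. The sketch compares integer tuples because a plain string comparison would wrongly rank "2.9" above "2.10".

```python
import re
import sys

# Minimum versions stated in the description above.
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 10)


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings like '2.11.0+cu130'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_stable_abi(torch_version: str, python_version=None) -> bool:
    """Return True if both versions meet the stated minimums."""
    if python_version is None:
        python_version = sys.version_info[:2]
    return (
        tuple(python_version) >= MIN_PYTHON
        and parse_major_minor(torch_version) >= MIN_TORCH
    )
```

In a real package the first argument would come from `torch.__version__`; it is passed as a string here so the sketch runs without torch installed.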

A notable change is that I moved the detection of deterministic mode from C++ to Python, because it's not exposed in the stable C++ API. Also, os.getenv cannot be traced in torch.compile, so deterministic mode is now detected once, when the package is imported, rather than each time the function is called. I've updated the corresponding tests.
