[cuda backend] replace floor_div with float_div #20000

Merged
Gasoonjia merged 4 commits into main from g4-after-pin-bump
Jun 4, 2026
@Gasoonjia Gasoonjia commented Jun 3, 2026

Contributor

After the pin bump to PyTorch 2.12, we noticed that floor_div with a tensor as the divisor cannot be correctly compiled by AOT Inductor, which makes the output of CUDA-backend-delegated models unrelated to their input (e.g. gemma4-31b).

To mitigate the issue, this PR replaces floor_div with float_div to support the models we need.
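
As a rough illustration of the idea (not the pass itself, which operates on the exported graph), an integer floor division can be expressed as a true division in float64 followed by a floor. The sketch below uses plain Python scalars and assumes Python's `float` is an IEEE-754 float64; the helper name `float_domain_floordiv` is hypothetical.

```python
import math


def float_domain_floordiv(a: int, b: int) -> int:
    """Floor-divide two integers by going through float64.

    Hypothetical helper for illustration: convert both operands to float,
    do a true division, floor the result, then convert back to int.
    """
    return int(math.floor(float(a) / float(b)))


# Compare against Python's native integer floor division on a few
# small-magnitude samples, including negative operands.
samples = [(7, 2), (-7, 2), (7, -2), (0, 5), (4096, 64), (2**31 - 1, 3)]
results = [(a, b, float_domain_floordiv(a, b)) for a, b in samples]
matches = all(q == a // b for a, b, q in results)
print(matches)
```

For index-sized values like these the two forms agree; very large operands are a separate concern.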

pytorch-bot Bot commented Jun 3, 2026


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20000

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Unclassified Failures

As of commit fc846cf with merge base eeb0646 (image):

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Jun 3, 2026
github-actions Bot commented Jun 3, 2026


This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@Gasoonjia Gasoonjia force-pushed the g4-after-pin-bump branch from 1ceb79c to 017c363 Compare June 4, 2026 00:46
linux-foundation-easycla Bot commented Jun 4, 2026


CLA Signed
The committers listed above are authorized under a signed CLA.

@Gasoonjia Gasoonjia changed the title [wip] gemm4 support in pt 2.12 [cuda backend] replace floor_div with float_div Jun 4, 2026
@Gasoonjia Gasoonjia marked this pull request as ready for review June 4, 2026 01:05
Comment thread backends/cuda/passes/replace_int64_floordiv.py
logger = logging.getLogger(__name__)

# Integer dtypes we rewrite. float64 (53-bit mantissa) is exact for
# |value| < 2**53, which covers these models' index ranges.

Contributor

This can't cover the full int64 range.

Contributor Author

Added a note about the risk with large integers.
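
To make the concern in this thread concrete: float64 represents every integer with magnitude below 2**53 exactly, but not every int64 value above that. The following is a minimal sketch in plain Python (assuming `float` is IEEE-754 float64); the helper name is hypothetical and is not taken from the pass.

```python
import math


def float_domain_floordiv(a: int, b: int) -> int:
    """Hypothetical helper: floor division routed through float64."""
    return int(math.floor(float(a) / float(b)))


exact_limit = 2**53
big = exact_limit + 1  # fits in int64, but is not representable in float64

# Below the limit the int -> float -> int round trip is lossless.
roundtrip_small_ok = int(float(exact_limit - 1)) == exact_limit - 1
# Just above it, the conversion rounds to a neighbouring value.
roundtrip_big_ok = int(float(big)) == big

via_float = float_domain_floordiv(big, 1)  # goes through float64
via_int = big // 1                         # exact integer arithmetic
print(roundtrip_small_ok, roundtrip_big_ok, via_int - via_float)
```

Even dividing by 1 already loses the low bit here, which is why the rewrite is only safe for operands known to stay well inside the exact range.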

# Work around a torch-2.12 AOTInductor/Inductor CUDA miscompile of integer
# (int64) floor-division: fused/broadcast int64 floor_divide is mis-lowered
# (truncation instead of floor; cross-division term bleed under dynamic shapes).
# Rewriting into a float64-domain floor lowers correctly. Upstream issue: TODO(link).

Contributor

Are you fixing the upstream issue (LINK missing) as well?

Contributor Author

No, I have reported the issue upstream and am waiting for the PyTorch folks to fix it.
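
For readers unfamiliar with the "truncation instead of floor" symptom named in the code comment of this thread: the two roundings only differ when the quotient is negative and not an integer. The sketch below shows this in plain Python; the function names are hypothetical, and it does not reproduce the Inductor miscompile itself.

```python
import math


def trunc_div(a: int, b: int) -> int:
    """Round the true quotient toward zero (C-style integer division)."""
    return math.trunc(a / b)


def floor_div(a: int, b: int) -> int:
    """Round the true quotient toward negative infinity (floor_divide semantics)."""
    return math.floor(a / b)


pairs = [(7, 2), (-7, 2), (7, -2), (-7, -2)]

# Collect the operand pairs where the two roundings disagree.
diff = [(a, b) for a, b in pairs if trunc_div(a, b) != floor_div(a, b)]
print(diff)
```

Only the mixed-sign pairs end up in `diff`, so a lowering that truncates would go unnoticed on purely non-negative indices.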

@digantdesai digantdesai left a comment

Contributor

Stamping to unblock. Thanks, curious how did you find this?

@Gasoonjia

Contributor Author

> Stamping to unblock. Thanks, curious how did you find this?

lol that's a long and tedious story 😂

@Gasoonjia Gasoonjia force-pushed the g4-after-pin-bump branch from 36d0635 to fc846cf Compare June 4, 2026 06:08
@Gasoonjia Gasoonjia merged commit ac3003e into main Jun 4, 2026
288 of 290 checks passed
@Gasoonjia Gasoonjia deleted the g4-after-pin-bump branch June 4, 2026 18:42
Gasoonjia added a commit that referenced this pull request Jun 8, 2026
After the pin bump to PyTorch 2.12, we noticed that `floor_div` with a tensor
as the divisor [cannot be correctly compiled by AOT
Inductor](pytorch/pytorch#186164), which makes the output of
CUDA-backend-delegated models unrelated to their input (e.g.
gemma4-31b).

To mitigate the issue, this PR replaces `floor_div` with `float_div` to
support the models we need.
Labels

ciflow/cuda, CLA Signed

2 participants