[cuda backend] replace floor_div with float_div#20000
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20000
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 Unclassified Failures
As of commit fc846cf with merge base eeb0646:
UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
1ceb79c to 017c363
logger = logging.getLogger(__name__)

# Integer dtypes we rewrite. float64 (53-bit mantissa) is exact for
# |value| < 2**53, which covers these models' index ranges.
Add a note about the risk with large integers.
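The large-integer risk mentioned in this review note can be illustrated with a minimal plain-Python sketch. It is not the code in this PR: `floor_div_via_float64` is a hypothetical helper that mimics the float64-domain floor on Python scalars, and the sample values are chosen only to sit on either side of 2**53.

```python
import math

# Hypothetical stand-in for the float64-domain rewrite, on Python scalars.
def floor_div_via_float64(a: int, b: int) -> int:
    # Python floats are IEEE-754 float64, so this shares the 53-bit mantissa limit.
    return math.floor(float(a) / float(b))

LIMIT = 2**53

# Below the limit, the float64 path agrees with exact integer floor division.
small = LIMIT - 1
assert floor_div_via_float64(small, 1) == small // 1

# Just above the limit, 2**53 + 1 has no float64 representation and is
# rounded to 2**53 on conversion, so the result is silently off by one.
big = LIMIT + 1
assert float(big) == float(LIMIT)
assert floor_div_via_float64(big, 1) != big // 1
```

The rewrite is therefore only safe while operands stay inside the range stated in the code comment, which is why a note next to the dtype list seems worthwhile.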
# Work around a torch-2.12 AOTInductor/Inductor CUDA miscompile of integer
# (int64) floor-division: fused/broadcast int64 floor_divide is mis-lowered
# (truncation instead of floor; cross-division term bleed under dynamic shapes).
# Rewriting into a float64-domain floor lowers correctly. Upstream issue: TODO(link).
Are you fixing the upstream issue (LINK missing) as well?
No, I have reported the issue upstream and am waiting for the PyTorch folks to solve it.
lol that's a long and tedious story 😂
36d0635 to fc846cf
After the pin bump to PyTorch 2.12, we noticed that `floor_div` with a tensor as the divisor [cannot be correctly compiled by AOT Inductor](pytorch/pytorch#186164), which makes the output of CUDA-backend-delegated models unrelated to the input (e.g. gemma4-31b). To mitigate the issue, this PR replaces `floor_div` with `float_div` to support the models we need.
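The code comment in the diff describes part of the mis-lowering as truncation instead of floor. The difference only shows up when the operands have opposite signs, which the following plain-Python sketch tries to make concrete. It does not reproduce the Inductor bug or the PR's graph rewrite; `truncating_div` and `float_floor_div` are hypothetical names used for illustration.

```python
import math

# Hypothetical model of the reported symptom: rounding toward zero.
def truncating_div(a: int, b: int) -> int:
    return math.trunc(a / b)

# Hypothetical model of the workaround: true division followed by floor.
def float_floor_div(a: int, b: int) -> int:
    return math.floor(a / b)

pairs = [(7, 2), (-7, 2), (7, -2), (-7, -2), (6, 3), (-6, 3)]

# The float-domain floor matches integer floor division on every pair.
for a, b in pairs:
    assert float_floor_div(a, b) == a // b

# Truncation disagrees only for inexact quotients with mixed signs.
mismatches = [(a, b) for a, b in pairs if truncating_div(a, b) != a // b]
assert mismatches == [(-7, 2), (7, -2)]
```

For non-negative index arithmetic, truncation and floor coincide, so this part of the miscompile would only surface on negative operands.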