Skip to content

splitTargetSizeInHalfGpu by data size if not target size#14684

Open
zpuller wants to merge 4 commits intoNVIDIA:mainfrom
zpuller:split_target
Open

splitTargetSizeInHalfGpu by data size if not target size#14684
zpuller wants to merge 4 commits intoNVIDIA:mainfrom
zpuller:split_target

Conversation

@zpuller
Copy link
Copy Markdown
Collaborator

@zpuller zpuller commented Apr 27, 2026

Fixes #14054.

Description

splitTargetSizeInHalfGpu is used as the split policy in both the GPU shuffle coalesce path and the join gather path (AbstractGpuJoinIterator). When OOM occurs, it halves targetSize and retries — but if the actual data is already smaller than targetSize/2, this is a no-op and the retry will OOM again. The shuffle coalesce path had an element-count fallback for this case; the join path did not.

This change moves the fallback logic into the split policy so it can be handled in a consistent way for both shuffle coalesce and join.

In the join case, we cannot discretely divide by elements in the same way as we don't have the full collection ahead of time, so we take a different approach that still accomplishes approximately the same goal; if the target size / 2 is > the total estimated data size, when the split would have previously failed/been a no-op, we instead choose the new target to be the estimated total data size / 2 instead of target size / 2.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

zpuller added 3 commits April 27, 2026 11:00
Signed-off-by: Zach Puller <zpuller@nvidia.com>
Signed-off-by: Zach Puller <zpuller@nvidia.com>
Signed-off-by: Zach Puller <zpuller@nvidia.com>
@zpuller zpuller marked this pull request as ready for review April 27, 2026 18:21
@zpuller
Copy link
Copy Markdown
Collaborator Author

zpuller commented Apr 27, 2026

build

@zpuller zpuller requested a review from a team April 27, 2026 18:22
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR fixes a no-op OOM-retry bug in both the GPU shuffle coalesce and join gather paths by adding a dataSize hint to AutoCloseableTargetSize and using dataSize/2 as the split target when the actual data is already smaller than targetSize/2. The logic is centralized in splitTargetSizeInHalfInternal and the dataSize field is preserved across splits so the optimization can fire consistently.

Confidence Score: 4/5

Safe to merge; the core fix is correct and the two issues are edge-case regressions unlikely to surface in production.

All findings are P2. The main concern is the removed element-count fallback in createSplitPolicyByTargetSize: for zero-byte tables, splitIndex stays 0 and concat([]) crashes rather than producing a handled OOM. The Long.MaxValue loop-exit trick is a minor style smell. Neither affects the primary happy-path fix.

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuShuffleCoalesceExec.scala — specifically the createSplitPolicyByTargetSize split loop and the removed element-count fallback.

Important Files Changed

Filename Overview
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RmmRapidsRetryIterator.scala Adds dataSize field to AutoCloseableTargetSize and uses it in splitTargetSizeInHalfInternal to avoid no-op retries; dataSize is now forwarded to the result so the optimization can apply on subsequent splits.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/AbstractGpuJoinIterator.scala Moves targetSizeWrapper creation inside the gathererStore.map block so estimatedDataSize (rows × per-row estimate) is passed to the retry wrapper, enabling the new dataSize/2 optimization in the join gather path.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuShuffleCoalesceExec.scala Removes element-count fallback in createSplitPolicyByTargetSize, relying on dataSize/2 to guarantee a valid byte-split; two P2 issues: zero-byte tables leave splitIndex == 0 (crash on concat([])), and the Long.MaxValue sentinel relies on integer overflow semantics.
tests/src/test/scala/com/nvidia/spark/rapids/WithRetrySuite.scala Adds a targeted test for splitTargetSizeInHalfGpu using dataSize/2 when dataSize < targetSize/2; verifies the new split target is dataSize/2 = 100 and not targetSize/2 = 500.
tests/src/test/scala/com/nvidia/spark/rapids/GpuShuffleCoalesceSuite.scala Lowers minSplitSize from 1 KB to 1 byte in the forced-split-retry test to match smaller test data; comment explains the production default (10 MB).

Sequence Diagram

sequenceDiagram
    participant Caller
    participant withRetry
    participant splitPolicy
    participant splitTargetSizeInHalfInternal

    Caller->>withRetry: AutoCloseableTargetSize(targetSize, minSize, dataSize)
    withRetry->>Caller: attempt (targetSize)
    Caller-->>withRetry: GpuSplitAndRetryOOM
    withRetry->>splitPolicy: split(target)
    splitPolicy->>splitTargetSizeInHalfInternal: target
    alt dataSize > 0 AND dataSize <= targetSize/2
        splitTargetSizeInHalfInternal-->>splitPolicy: newTarget = dataSize/2
    else
        splitTargetSizeInHalfInternal-->>splitPolicy: newTarget = targetSize/2
    end
    splitPolicy-->>withRetry: AutoCloseableTargetSize(newTarget, minSize, dataSize)
    withRetry->>Caller: attempt (newTarget, smaller)
    Caller-->>withRetry: success
    withRetry-->>Caller: result
Loading

Reviews (2): Last reviewed commit: "pr comments" | Re-trigger Greptile

Comment thread sql-plugin/src/main/scala/com/nvidia/spark/rapids/RmmRapidsRetryIterator.scala Outdated
Signed-off-by: Zach Puller <zpuller@nvidia.com>
@zpuller
Copy link
Copy Markdown
Collaborator Author

zpuller commented Apr 27, 2026

build

@sameerz sameerz added the bug Something isn't working label Apr 28, 2026
@zpuller zpuller requested a review from abellina April 28, 2026 16:31
@zpuller zpuller changed the title splitTargetSizeInHalfGpu by elements if not bytes splitTargetSizeInHalfGpu by data size if not target size May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] splitTargetSizeInHalfGpu should split the sequence by elements if splitting by byte size is not possible

3 participants