feat: add some of the kernels using cuda.compute#3981

Merged
ianna merged 32 commits into scikit-hep:awkward3 from maxymnaumchyk:maxymnaumchyk/3978-awkward_indexedarray_overlay_mask
May 19, 2026

@maxymnaumchyk
Collaborator

@maxymnaumchyk maxymnaumchyk commented Apr 17, 2026

Closes #3978, closes #3997, closes #3998, closes #3999, closes #4000, closes #4001, closes #4002, closes #4003, closes #4004, closes #4005, closes #4006, closes #4007

These kernels are only about 2 times faster:
IndexedArray_reduce_next_nonlocal_nextshifts_64 kernel before:
Screenshot 2026-04-16 135451

IndexedArray_reduce_next_nonlocal_nextshifts_64 kernel after:
image

IndexedArray_reduce_next_64 kernel before:
image

IndexedArray_reduce_next_64 kernel after:
image

IndexedArray_overlay_mask kernel before:
image

IndexedArray_overlay_mask kernel after:
image
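For readers unfamiliar with the overlay-mask kernel, here is a minimal plain-Python sketch of what it is understood to compute: each output index is `-1` where the mask is set, and the original index otherwise. The function name `overlay_mask` and the list-based signature are hypothetical and chosen only for illustration; this is not the cuda.compute implementation benchmarked above.

```python
def overlay_mask(mask, fromindex):
    """Hypothetical reference sketch, not the PR's GPU code.

    Assumption: a nonzero mask entry marks the element as missing,
    and missing elements are encoded as index -1.
    """
    if len(mask) != len(fromindex):
        raise ValueError("mask and fromindex must have the same length")
    return [-1 if m else idx for m, idx in zip(mask, fromindex)]


result = overlay_mask([0, 1, 0, 0], [3, 2, 1, 0])
print(result)  # [3, -1, 1, 0]
```

Each output element depends only on the matching input elements, so the operation is an elementwise transform with no cross-element dependency.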

@github-actions

The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3981

@maxymnaumchyk
Collaborator Author

IndexedArray_reduce_next_64 kernel before:
Screenshot 2026-04-15 163801

IndexedArray_reduce_next_64 kernel after:
Screenshot 2026-04-15 163511

@codecov

codecov Bot commented Apr 17, 2026

Codecov Report

❌ Patch coverage is 43.53448% with 131 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.04%. Comparing base (6f6d816) to head (a7ac8a7).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/awkward/_connect/cuda/_compute.py | 43.53% | 131 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (43.53%) is below the target coverage (98.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
| Files with missing lines | Coverage Δ |
|---|---|
| src/awkward/_backends/cupy.py | 100.00% <ø> (ø) |
| src/awkward/_connect/cuda/_compute.py | 57.47% <43.53%> (-11.15%) ⬇️ |

@maxymnaumchyk maxymnaumchyk marked this pull request as ready for review April 17, 2026 13:56
@maxymnaumchyk maxymnaumchyk marked this pull request as draft April 17, 2026 13:57
@maxymnaumchyk
Collaborator Author

ByteMaskedArray_getitem_nextcarry kernel before:
image

ByteMaskedArray_getitem_nextcarry kernel after:
Screenshot 2026-04-17 175330

@maxymnaumchyk
Collaborator Author

awkward_ByteMaskedArray_numnull kernel before:
Screenshot 2026-04-17 175310

awkward_ByteMaskedArray_numnull kernel after:
image
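As context for the numnull benchmark, the sketch below shows in plain Python what this kernel is understood to compute: the number of entries whose mask byte disagrees with `validwhen`. The name `numnull` and its signature are hypothetical stand-ins for illustration, and the code is not the cuda.compute version measured above.

```python
def numnull(mask, validwhen):
    """Hypothetical reference sketch, not the PR's GPU code.

    Assumption: an entry is valid when (mask[i] != 0) == validwhen,
    so every other entry counts as null.
    """
    return sum(1 for m in mask if (m != 0) != validwhen)


print(numnull([1, 0, 1, 1], True))   # 1
print(numnull([1, 0, 1, 1], False))  # 3
```

Because the result is a single count over all entries, the operation has the shape of a reduction.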

@maxymnaumchyk maxymnaumchyk changed the title feat: add some of the awkward_IndexedArray kernels using cuda.compute feat: add some of the awkward_IndexedArray and ByteMaskedArray kernels using cuda.compute Apr 17, 2026
@maxymnaumchyk
Collaborator Author

awkward_RegularArray_getitem_jagged_expand kernel before:
image

awkward_RegularArray_getitem_jagged_expand kernel after:
image

@maxymnaumchyk
Collaborator Author

awkward_UnionArray_simplify_one kernel before:
Screenshot 2026-04-21 171211

awkward_UnionArray_simplify_one kernel after:
image

@maxymnaumchyk
Collaborator Author

awkward_ListArray_broadcast_tooffsets kernel before:
Screenshot 2026-04-23 175153

awkward_ListArray_broadcast_tooffsets kernel after:
image

@maxymnaumchyk
Collaborator Author

awkward_ListArray_localindex kernel before:
Screenshot 2026-04-27 144334

awkward_ListArray_localindex kernel after:
image
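For context on the localindex benchmark, this is a minimal plain-Python sketch of the operation as I understand it: every element receives its position within its own list. The name `local_index` is hypothetical, and the sketch assumes the offsets start at 0 and are non-decreasing; it is not the cuda.compute code from this PR.

```python
def local_index(offsets):
    """Hypothetical reference sketch, not the PR's GPU code.

    Assumption: offsets[0] == 0 and offsets are non-decreasing, so the
    lists are contiguous and the output can be built by appending.
    """
    toindex = []
    for start, stop in zip(offsets, offsets[1:]):
        # Positions 0, 1, ..., (stop - start - 1) within this list.
        toindex.extend(range(stop - start))
    return toindex


print(local_index([0, 3, 3, 5]))  # [0, 1, 2, 0, 1]
```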

@maxymnaumchyk
Collaborator Author

awkward_ListArray_compact_offsets kernel before:
Screenshot 2026-04-28 122425

awkward_ListArray_compact_offsets kernel after:
image
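To make the compact-offsets benchmark easier to interpret, here is a plain-Python sketch of the computation as I understand it: the list lengths `stops[i] - starts[i]` are accumulated into offsets that begin at 0. The name `compact_offsets` and the list-based interface are hypothetical, and this is not the cuda.compute implementation from the PR.

```python
from itertools import accumulate


def compact_offsets(starts, stops):
    """Hypothetical reference sketch, not the PR's GPU code.

    Builds offsets [0, len0, len0 + len1, ...] from per-list lengths.
    """
    lengths = []
    for start, stop in zip(starts, stops):
        if stop < start:
            raise ValueError("stops[i] must not be less than starts[i]")
        lengths.append(stop - start)
    # Inclusive running sum of the lengths, prefixed with 0.
    return [0, *accumulate(lengths)]


print(compact_offsets([0, 5, 5], [3, 5, 9]))  # [0, 3, 3, 7]
```

The running sum is a prefix scan, which is a standard parallel primitive.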

@maxymnaumchyk
Collaborator Author

awkward_ListArray_combinations_length kernel before:
Screenshot 2026-04-28 130635

awkward_ListArray_combinations_length kernel after:
Screenshot 2026-04-28 172829

@maxymnaumchyk
Collaborator Author

maxymnaumchyk commented May 4, 2026

awkward_ListArray_combinations kernel before:
Screenshot 2026-04-28 180225

awkward_ListArray_combinations kernel after:
image

Just in case, I also manually tested the new kernel with n = 3 and replacement = True.

There is also a possible optimization: reuse the offsets already calculated by awkward_ListArray_combinations_length instead of recomputing them (currently they are discarded).
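To illustrate why the offsets from the length kernel could be reused, here is a plain-Python sketch of what that step is understood to produce: a per-list combination count, its running sum as offsets, and the total. The name `combinations_length` and its return shape are hypothetical; the binomial formula with `size + n - 1` for the replacement case is the standard combinatorial identity, not code taken from this PR.

```python
from itertools import accumulate
from math import comb


def combinations_length(starts, stops, n, replacement):
    """Hypothetical reference sketch, not the PR's GPU code.

    Returns (total, offsets), where offsets[i] is the position at which
    the combinations of list i begin in the flattened output.
    """
    counts = []
    for start, stop in zip(starts, stops):
        size = stop - start
        if replacement:
            # Choosing n of size items with replacement equals
            # choosing n of (size + n - 1) without replacement.
            size += n - 1
        counts.append(comb(size, n))
    offsets = [0, *accumulate(counts)]
    return offsets[-1], offsets


# Lists of length 3, 0 and 2, with n = 3 and replacement = True.
total, offsets = combinations_length([0, 3, 3], [3, 3, 5], 3, True)
print(total, offsets)  # 14 [0, 10, 10, 14]
```

The same `offsets` tell a later step where each list's combinations should be written, which is why discarding them means recomputing the same scan.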

@maxymnaumchyk
Collaborator Author

The awkward_UnionArray_nestedfill_tags_index kernel turned out to be a little slower: 0.006263 vs 0.006439 seconds. Added just for the record.

@maxymnaumchyk maxymnaumchyk changed the title feat: add some of the awkward_IndexedArray and ByteMaskedArray kernels using cuda.compute feat: add some of the kernels using cuda.compute May 7, 2026
@maxymnaumchyk maxymnaumchyk marked this pull request as ready for review May 7, 2026 09:46
@ianna ianna changed the base branch from main to awkward3 May 16, 2026 19:45
Member

@ianna ianna left a comment

@maxymnaumchyk - Thanks! 12 more kernels migrated to cuda.compute! I'll enable auto-merge. The benchmarks will be updated after it is merged later today. Thanks.

@ianna ianna merged commit 85e945d into scikit-hep:awkward3 May 19, 2026
35 of 38 checks passed

Development

Successfully merging this pull request may close these issues.

awkward_IndexedArray_overlay_mask.cu

2 participants