Conversation
* next step in kernel migration from parents to offsets * linter fix * add jax bincount * add cpp kernel to convert parents to offsets * fix typo * fix typo in an auto-generated test * almost there * nearly there * cleanup * fix remaining kernels * final cleanup * format json data * format * initialize maxnextparents = -1 (a sentinel meaning "no bin was touched") * update the kernels to work on offsets! * format * migrate cupy rawkernels from parents to offsets * migrate jax reducers * fix for platfroms where the int64 counts can't be safely cast * add bincount for cupy backend * compact loc * fix windows build and add optional OpenMP support * update cuda kernels * feat: add `missing_repeat` kernel implementation using cuda.compute (#3922) * feat: add missing_repeat implementation using cuda.compute * keep the same name as in cpu kernel * add tests for repetitions>1 and regularsize>1 * style: pre-commit fixes * add keywords * style fixes --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * feat: add `index_rpad_and_clip` kernels implementations using cuda.compute (#3923) * feat: add `index_rpad_and_clip` kernels implementation using cuda.compute * style fix * add a test for `index_rpad_and_clip_axis0` that would have target>length * add keyword names * style * Apply suggestions from Ianna Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch> * style: pre-commit fixes --------- Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix: array attrs not being validated at creation and being of inconsistent type (#3996) * convert dict attrs to Attrs type * add test * fix: ensure jax backend uses arrays on cpu only (#3990) * ensure jax uses cpu by default * remove now redundant jax_platform_name setting * change error messages * better errors * remove DeviceArray mentions as that does not exist while we're at it too * almost there * cleanup * cleanup * remove old data file * update cupy kernels * remove exlicit test for depricated kernel * add complex kernels and port them to use offsets * fix complex reducers * add remaining kernels * add kernels for complex and bool sum * move to segmented_reduce * fix typo * promote type * fix complex bool reducer * try another algo * missed one * use type inference * make numba happy * remove reducer overloads * use sum op func for complex types * remove test of depricated code * avoid using == to compare floating-point products * fix boundary tests * cleanup * cleanup * remove dead code * revert lexsort to argmin/max * handrolled lexsort * remove dead code * remove bincount * import numba cuda for jit * fix tests --------- Co-authored-by: maxymnaumchyk <maxymnaumchyk@gmail.com> Co-authored-by: maxymnaumchyk <70752300+maxymnaumchyk@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Iason Krommydas <iason.krom@gmail.com>
Codecov Report❌ Patch coverage is Additional details and impacted files
|
|
@ariostas should definitely take a look too (when he's back to wor). I do not fully trust myself with c++ yet but can definitely review the python part changes. |
|
One question I have is how are the kernel test data generated? I'm not familiar with this part. |
yes, we can trust the data because these are the data from the tests. |
|
I've ran this against HiggsDNA and getting the same results for my analysis. Only small floating point differences that |
Thanks @ikrommyd ! Floating point comparison is indeed a tricky one. |
|
Yeah totally reasonable. Floating point ops are not commutative. Anyways I'll test this out a bit more with analyses examples and I also wanna if there's any noticeable performance hit. |
Thanks! Don’t worry about the performance yet — that’s the next on my list. First we must insure that the results are correct.👍 |
|
The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR4056 |
add benchmark script
* feat: add `awkward_IndexedArray_overlay_mask` kernel using cuda.compute * feat: add `awkward_IndexedArray_reduce_next_64` kernel using cuda.compute * feat: add `IndexedArray_reduce_next_nonlocal_nextshifts_64` kernel using cuda.compute * feat: add `ByteMaskedArray_getitem_nextcarry` kernel using cuda.compute * feat: add `awkward_ByteMaskedArray_numnull` kernel using cuda.compute * feat: add `awkward_RegularArray_getitem_jagged_expand` kernel using cuda.compute * add an upper bound * style: pre-commit fixes * feat: add `awkward_RegularArray_getitem_jagged_expand` kernel using cuda.compute * feat: add `awkward_UnionArray_simplify_one` kernel using cuda.compute * feat: add `awkward_ListArray_broadcast_tooffsets` kernel using cuda.compute * feat: add `awkward_ListArray_localindex` kernel using cuda.compute * fix the impl * feat: add `awkward_ListArray_compact_offsets` kernel using cuda.compute * feat: add `awkward_ListArray_combinations_length` kernel using cuda.compute * feat: add `awkward_ListArray_combinations` kernel using cuda.compute * feat: add `awkward_UnionArray_nestedfill_tags_index` kernel using cuda.compute * fix the tests for kernels that are deliberately raising errors * compare `starts` and `stops` separately * ignore `memptr` argument for pylint * style: pre-commit fixes * Add functions for indexing and repeating arrays * style: pre-commit fixes * return unary_transform call for segment_ids * style: pre-commit fixes * update the `awkward_IndexedArray_reduce_next_64` to work with offsets * style: pre-commit fixes --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>
* feat: add reimplementation of the `awkward_RegularArray` kernels using cuda.compute * reimplement more kernels using cuda.compute * reimplement more kernels using cuda.compute * style: pre-commit fixes * style fix * use the named functions instead of lambdas * add new kernels * add new kernels * fix a type mismatch * style: pre-commit fixes * add new kernels * fix the overflow error * new batch of kernels (draft) * add comments and cleanup * fix the tests messages * add new `BitMaskedArray` kernels * modify the `test_1381_check_errors.py` test * resolve the conflicts --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
@ianna what's the plan with this PR? Is this something that you would want to merge soon or is it to keep like a live comparison of the "Awkward3" changes?
If this is going in |
The plan is to discuss it at AS meeting.
ASIS performance is good enough. And we already know the bottlenecks -- where we can optimize it -- I want to have confirmation from the users that validation passes. |
back port from
awkward3branch