
Homogenize default behavior of higher-dimensional `similar` to SparseArrays #3091

Open

alonsoC1s wants to merge 3 commits into JuliaGPU:master from alonsoC1s:issue-3061

Conversation

@alonsoC1s

As mentioned in #3061, when `similar` is called with CUSPARSE arguments and more than two dimensions, the generic CPU fallback method is invoked. As a result, calls to `similar` on GPU (sparse) matrices sometimes return host arrays, causing scalar-indexing issues down the line.

This PR introduces a fallback for `similar` that creates uninitialized `CuArray`s when three or more dimensions are given as arguments, mirroring what Base and SparseArrays do.
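The idea described above can be sketched roughly as follows. This is a hedged illustration, not the PR's actual diff: the exact method signature, the dispatch target `CUSPARSE.AbstractCuSparseMatrix`, and the guard on the number of dimensions are assumptions based on the description.

```julia
using CUDA, CUDA.CUSPARSE

# Sketch (assumed, not the actual PR diff): for 3+ dimensions there is no
# sparse CUDA container, so return an uninitialized dense CuArray --
# analogous to SparseArrays, which returns a dense Array in this case --
# instead of hitting the generic CPU fallback that produces a host array.
function Base.similar(A::CUSPARSE.AbstractCuSparseMatrix, ::Type{T},
                      dims::Dims{N}) where {T, N}
    # Only take over the higher-dimensional case; defer 1-D/2-D
    # requests to the existing sparse methods.
    N >= 3 || throw(MethodError(similar, (A, T, dims)))
    return CuArray{T}(undef, dims)
end
```

Keeping the result on the device means downstream broadcasts and kernels keep working without triggering scalar indexing.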

Important note: I wasn't able to fully test this locally due to some issues instantiating the repo environment with Pkg.

@maleadt
Member

maleadt commented Apr 10, 2026

Test failures related.

@alonsoC1s
Author

@maleadt my bad. I think I got it this time.

Contributor

@github-actions github-actions bot left a comment


CUDA.jl Benchmarks

Details
| Benchmark suite | Current: 14792c6 | Previous: 33ffdef | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 100706 ns | 101156 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 75725 ns | 76494 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1583091 ns | 1583795 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 142616 ns | 143348 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 656324 ns | 656949 ns | 1.00 |
| array/accumulate/Int64/1d | 118300 ns | 118389 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 79226 ns | 79916 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1693681 ns | 1694529 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 155139 ns | 155647 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 961219 ns | 961265 ns | 1.00 |
| array/broadcast | 20390 ns | 20513 ns | 0.99 |
| array/construct | 1314.1 ns | 1333.8 ns | 0.99 |
| array/copy | 18878 ns | 18686 ns | 1.01 |
| array/copyto!/cpu_to_gpu | 214371 ns | 212224.5 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 281379.5 ns | 282985 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11296 ns | 11526 ns | 0.98 |
| array/iteration/findall/bool | 130863 ns | 131867 ns | 0.99 |
| array/iteration/findall/int | 147506 ns | 148383 ns | 0.99 |
| array/iteration/findfirst/bool | 80977.5 ns | 81401 ns | 0.99 |
| array/iteration/findfirst/int | 83244 ns | 83519 ns | 1.00 |
| array/iteration/findmin/1d | 86011 ns | 83773.5 ns | 1.03 |
| array/iteration/findmin/2d | 116904 ns | 117018 ns | 1.00 |
| array/iteration/logical | 199916 ns | 197560.5 ns | 1.01 |
| array/iteration/scalar | 66771 ns | 68375 ns | 0.98 |
| array/permutedims/2d | 52090 ns | 52136 ns | 1.00 |
| array/permutedims/3d | 52862.5 ns | 52817.5 ns | 1.00 |
| array/permutedims/4d | 51627.5 ns | 52006 ns | 0.99 |
| array/random/rand/Float32 | 12917.5 ns | 13083 ns | 0.99 |
| array/random/rand/Int64 | 29713.5 ns | 37254 ns | 0.80 |
| array/random/rand!/Float32 | 8410.333333333334 ns | 8673.666666666666 ns | 0.97 |
| array/random/rand!/Int64 | 34173 ns | 33174 ns | 1.03 |
| array/random/randn/Float32 | 37260 ns | 43918 ns | 0.85 |
| array/random/randn!/Float32 | 31469 ns | 31437 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 34513 ns | 35490 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=1 | 39573 ns | 45840 ns | 0.86 |
| array/reductions/mapreduce/Float32/dims=1L | 51814 ns | 51786 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56518.5 ns | 56654 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69201.5 ns | 69672 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 42038 ns | 43204 ns | 0.97 |
| array/reductions/mapreduce/Int64/dims=1 | 52644 ns | 42387 ns | 1.24 |
| array/reductions/mapreduce/Int64/dims=1L | 87591 ns | 87636 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59287 ns | 59567 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 84827 ns | 85154 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34463 ns | 35536 ns | 0.97 |
| array/reductions/reduce/Float32/dims=1 | 41870.5 ns | 49541 ns | 0.85 |
| array/reductions/reduce/Float32/dims=1L | 51679 ns | 51867 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 56704 ns | 56724 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69748 ns | 70152 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 42140 ns | 43180 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 43394 ns | 46092.5 ns | 0.94 |
| array/reductions/reduce/Int64/dims=1L | 87691 ns | 87572 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59396 ns | 59560 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84284 ns | 84946 ns | 0.99 |
| array/reverse/1d | 18260 ns | 18382 ns | 0.99 |
| array/reverse/1dL | 68884 ns | 68959.5 ns | 1.00 |
| array/reverse/1dL_inplace | 65921 ns | 65940 ns | 1.00 |
| array/reverse/1d_inplace | 10293.166666666668 ns | 8534 ns | 1.21 |
| array/reverse/2d | 20591 ns | 20498 ns | 1.00 |
| array/reverse/2dL | 72539 ns | 72575.5 ns | 1.00 |
| array/reverse/2dL_inplace | 66040 ns | 66219 ns | 1.00 |
| array/reverse/2d_inplace | 10213 ns | 10379 ns | 0.98 |
| array/sorting/1d | 2735006 ns | 2734628.5 ns | 1.00 |
| array/sorting/2d | 1067915.5 ns | 1068430 ns | 1.00 |
| array/sorting/by | 3303800 ns | 3303603 ns | 1.00 |
| cuda/synchronization/context/auto | 1171.9 ns | 1179.7 ns | 0.99 |
| cuda/synchronization/context/blocking | 935.5588235294117 ns | 917.5135135135135 ns | 1.02 |
| cuda/synchronization/context/nonblocking | 7633.8 ns | 7659 ns | 1.00 |
| cuda/synchronization/stream/auto | 1021.5454545454545 ns | 1056.923076923077 ns | 0.97 |
| cuda/synchronization/stream/blocking | 835.6794871794872 ns | 797.961038961039 ns | 1.05 |
| cuda/synchronization/stream/nonblocking | 7529 ns | 8313.1 ns | 0.91 |
| integration/byval/reference | 144076 ns | 144149 ns | 1.00 |
| integration/byval/slices=1 | 145859 ns | 146090 ns | 1.00 |
| integration/byval/slices=2 | 284633 ns | 284910 ns | 1.00 |
| integration/byval/slices=3 | 422972 ns | 423319 ns | 1.00 |
| integration/cudadevrt | 102614 ns | 102707 ns | 1.00 |
| integration/volumerhs | 9431917 ns | 9433360.5 ns | 1.00 |
| kernel/indexing | 13283 ns | 13349 ns | 1.00 |
| kernel/indexing_checked | 13981 ns | 14195 ns | 0.98 |
| kernel/launch | 2152.1111111111113 ns | 2182.222222222222 ns | 0.99 |
| kernel/occupancy | 671.6194968553459 ns | 661.179012345679 ns | 1.02 |
| kernel/rand | 14483 ns | 14520 ns | 1.00 |
| latency/import | 3816302972 ns | 3811597161 ns | 1.00 |
| latency/precompile | 4606963391 ns | 4611897362.5 ns | 1.00 |
| latency/ttfp | 4392541501.5 ns | 4392419951 ns | 1.00 |

This comment was automatically generated by a workflow using github-action-benchmark.
