Homogenize default behavior of higher-dimensional `similar` to SparseArrays #3091
Open
alonsoC1s wants to merge 3 commits into JuliaGPU:master from
Member
Test failures related.
Author
@maleadt my bad. I think I got it this time
Contributor
CUDA.jl Benchmarks
| Benchmark suite | Current: 14792c6 | Previous: 33ffdef | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 100706 ns | 101156 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 75725 ns | 76494 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1583091 ns | 1583795 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 142616 ns | 143348 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 656324 ns | 656949 ns | 1.00 |
| array/accumulate/Int64/1d | 118300 ns | 118389 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 79226 ns | 79916 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1693681 ns | 1694529 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 155139 ns | 155647 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 961219 ns | 961265 ns | 1.00 |
| array/broadcast | 20390 ns | 20513 ns | 0.99 |
| array/construct | 1314.1 ns | 1333.8 ns | 0.99 |
| array/copy | 18878 ns | 18686 ns | 1.01 |
| array/copyto!/cpu_to_gpu | 214371 ns | 212224.5 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 281379.5 ns | 282985 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11296 ns | 11526 ns | 0.98 |
| array/iteration/findall/bool | 130863 ns | 131867 ns | 0.99 |
| array/iteration/findall/int | 147506 ns | 148383 ns | 0.99 |
| array/iteration/findfirst/bool | 80977.5 ns | 81401 ns | 0.99 |
| array/iteration/findfirst/int | 83244 ns | 83519 ns | 1.00 |
| array/iteration/findmin/1d | 86011 ns | 83773.5 ns | 1.03 |
| array/iteration/findmin/2d | 116904 ns | 117018 ns | 1.00 |
| array/iteration/logical | 199916 ns | 197560.5 ns | 1.01 |
| array/iteration/scalar | 66771 ns | 68375 ns | 0.98 |
| array/permutedims/2d | 52090 ns | 52136 ns | 1.00 |
| array/permutedims/3d | 52862.5 ns | 52817.5 ns | 1.00 |
| array/permutedims/4d | 51627.5 ns | 52006 ns | 0.99 |
| array/random/rand/Float32 | 12917.5 ns | 13083 ns | 0.99 |
| array/random/rand/Int64 | 29713.5 ns | 37254 ns | 0.80 |
| array/random/rand!/Float32 | 8410.333333333334 ns | 8673.666666666666 ns | 0.97 |
| array/random/rand!/Int64 | 34173 ns | 33174 ns | 1.03 |
| array/random/randn/Float32 | 37260 ns | 43918 ns | 0.85 |
| array/random/randn!/Float32 | 31469 ns | 31437 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 34513 ns | 35490 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=1 | 39573 ns | 45840 ns | 0.86 |
| array/reductions/mapreduce/Float32/dims=1L | 51814 ns | 51786 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56518.5 ns | 56654 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69201.5 ns | 69672 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 42038 ns | 43204 ns | 0.97 |
| array/reductions/mapreduce/Int64/dims=1 | 52644 ns | 42387 ns | 1.24 |
| array/reductions/mapreduce/Int64/dims=1L | 87591 ns | 87636 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59287 ns | 59567 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 84827 ns | 85154 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34463 ns | 35536 ns | 0.97 |
| array/reductions/reduce/Float32/dims=1 | 41870.5 ns | 49541 ns | 0.85 |
| array/reductions/reduce/Float32/dims=1L | 51679 ns | 51867 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 56704 ns | 56724 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69748 ns | 70152 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 42140 ns | 43180 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 43394 ns | 46092.5 ns | 0.94 |
| array/reductions/reduce/Int64/dims=1L | 87691 ns | 87572 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59396 ns | 59560 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84284 ns | 84946 ns | 0.99 |
| array/reverse/1d | 18260 ns | 18382 ns | 0.99 |
| array/reverse/1dL | 68884 ns | 68959.5 ns | 1.00 |
| array/reverse/1dL_inplace | 65921 ns | 65940 ns | 1.00 |
| array/reverse/1d_inplace | 10293.166666666668 ns | 8534 ns | 1.21 |
| array/reverse/2d | 20591 ns | 20498 ns | 1.00 |
| array/reverse/2dL | 72539 ns | 72575.5 ns | 1.00 |
| array/reverse/2dL_inplace | 66040 ns | 66219 ns | 1.00 |
| array/reverse/2d_inplace | 10213 ns | 10379 ns | 0.98 |
| array/sorting/1d | 2735006 ns | 2734628.5 ns | 1.00 |
| array/sorting/2d | 1067915.5 ns | 1068430 ns | 1.00 |
| array/sorting/by | 3303800 ns | 3303603 ns | 1.00 |
| cuda/synchronization/context/auto | 1171.9 ns | 1179.7 ns | 0.99 |
| cuda/synchronization/context/blocking | 935.5588235294117 ns | 917.5135135135135 ns | 1.02 |
| cuda/synchronization/context/nonblocking | 7633.8 ns | 7659 ns | 1.00 |
| cuda/synchronization/stream/auto | 1021.5454545454545 ns | 1056.923076923077 ns | 0.97 |
| cuda/synchronization/stream/blocking | 835.6794871794872 ns | 797.961038961039 ns | 1.05 |
| cuda/synchronization/stream/nonblocking | 7529 ns | 8313.1 ns | 0.91 |
| integration/byval/reference | 144076 ns | 144149 ns | 1.00 |
| integration/byval/slices=1 | 145859 ns | 146090 ns | 1.00 |
| integration/byval/slices=2 | 284633 ns | 284910 ns | 1.00 |
| integration/byval/slices=3 | 422972 ns | 423319 ns | 1.00 |
| integration/cudadevrt | 102614 ns | 102707 ns | 1.00 |
| integration/volumerhs | 9431917 ns | 9433360.5 ns | 1.00 |
| kernel/indexing | 13283 ns | 13349 ns | 1.00 |
| kernel/indexing_checked | 13981 ns | 14195 ns | 0.98 |
| kernel/launch | 2152.1111111111113 ns | 2182.222222222222 ns | 0.99 |
| kernel/occupancy | 671.6194968553459 ns | 661.179012345679 ns | 1.02 |
| kernel/rand | 14483 ns | 14520 ns | 1.00 |
| latency/import | 3816302972 ns | 3811597161 ns | 1.00 |
| latency/precompile | 4606963391 ns | 4611897362.5 ns | 1.00 |
| latency/ttfp | 4392541501.5 ns | 4392419951 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
As mentioned in #3061, when calling `similar` with CUSPARSE arguments and more than 2 dimensions, the fallback CPU method is called. As a result, calls to `similar` with GPU (sparse) matrices sometimes return host arrays, causing scalar indexing issues down the line.

This PR introduces a fallback to `similar` that creates uninitialized `CuArray`s when 3 or more dimensions are given as arguments, which is similar to what Base and SparseArrays do.

Important note: I wasn't able to fully test this locally due to some issues instantiating the repo environment with Pkg.
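
For reference, here is a minimal CPU-only sketch of the Base/SparseArrays behavior this change aims to match. It is not code from this PR. It assumes only the SparseArrays standard library, and the variable names are illustrative. As far as I understand, asking `similar` for a 2-D shape keeps the sparse type, while a 3-D shape falls through to Base's generic method and yields a dense, uninitialized `Array`.

```julia
using SparseArrays

# A small 2×2 sparse matrix on the host (illustrative data).
A = sparse([1, 2], [1, 2], [1.0, 2.0])

# Requesting a 2-D shape keeps the sparse container type.
B2 = similar(A, Float64, (3, 3))

# Requesting a 3-D shape has no sparse representation, so the generic
# Base fallback allocates a dense, uninitialized array instead.
B3 = similar(A, Float64, (2, 2, 2))

println(typeof(B2))
println(typeof(B3))
```

The analogous GPU expectation described above would be a dense `CuArray` instead of a host `Array` for the 3-D case. That part is not shown here because it needs a CUDA-capable device.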