Add _throw_dmrs device override for reshape of views#3095
Open
Abdelrahman912 wants to merge 2 commits into JuliaGPU:master from
Codecov Report

✅ All modified and coverable lines are covered by tests.

Coverage diff:

| | master | #3095 | +/- |
|---|---|---|---|
| Coverage | 90.43% | 90.42% | -0.01% |
| Files | 141 | 141 | |
| Lines | 12025 | 12025 | |
| Hits | 10875 | 10874 | -1 |
| Misses | 1150 | 1151 | +1 |
CUDA.jl Benchmarks
| Benchmark suite | Current: 774cffc (ns) | Previous: 6ccd4b4 (ns) | Ratio (current/previous) |
|---|---|---|---|
| array/accumulate/Float32/1d | 101843 | 101723 | 1.00 |
| array/accumulate/Float32/dims=1 | 77138 | 76608 | 1.01 |
| array/accumulate/Float32/dims=1L | 1585554 | 1585294 | 1.00 |
| array/accumulate/Float32/dims=2 | 144144 | 143948 | 1.00 |
| array/accumulate/Float32/dims=2L | 657967.5 | 657945 | 1.00 |
| array/accumulate/Int64/1d | 118678 | 118967 | 1.00 |
| array/accumulate/Int64/dims=1 | 80133 | 79956 | 1.00 |
| array/accumulate/Int64/dims=1L | 1706215 | 1694445 | 1.01 |
| array/accumulate/Int64/dims=2 | 156239.5 | 156040 | 1.00 |
| array/accumulate/Int64/dims=2L | 962068 | 961840 | 1.00 |
| array/broadcast | 20708 | 20347 | 1.02 |
| array/construct | 1330.4 | 1311.9 | 1.01 |
| array/copy | 19016 | 18931 | 1.00 |
| array/copyto!/cpu_to_gpu | 215947 | 215113 | 1.00 |
| array/copyto!/gpu_to_cpu | 284326 | 283517 | 1.00 |
| array/copyto!/gpu_to_gpu | 11431.5 | 11647 | 0.98 |
| array/iteration/findall/bool | 131568 | 132615 | 0.99 |
| array/iteration/findall/int | 149234 | 149623 | 1.00 |
| array/iteration/findfirst/bool | 81883.5 | 82175 | 1.00 |
| array/iteration/findfirst/int | 83535.5 | 84437 | 0.99 |
| array/iteration/findmin/1d | 89031.5 | 87647 | 1.02 |
| array/iteration/findmin/2d | 117635 | 117309 | 1.00 |
| array/iteration/logical | 200232.5 | 203627.5 | 0.98 |
| array/iteration/scalar | 67840 | 68729 | 0.99 |
| array/permutedims/2d | 52486 | 52820 | 0.99 |
| array/permutedims/3d | 53326 | 52914 | 1.01 |
| array/permutedims/4d | 52208 | 51983 | 1.00 |
| array/random/rand/Float32 | 13239 | 13104 | 1.01 |
| array/random/rand/Int64 | 37304 | 37312 | 1.00 |
| array/random/rand!/Float32 | 8615 | 8603.333333333334 | 1.00 |
| array/random/rand!/Int64 | 34462 | 34156 | 1.01 |
| array/random/randn/Float32 | 44189.5 | 38723.5 | 1.14 |
| array/random/randn!/Float32 | 31115 | 31520 | 0.99 |
| array/reductions/mapreduce/Float32/1d | 34760 | 35427 | 0.98 |
| array/reductions/mapreduce/Float32/dims=1 | 49713 | 49562 | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52089 | 51766 | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 57176 | 56838 | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 70359 | 69604 | 1.01 |
| array/reductions/mapreduce/Int64/1d | 43204 | 43423 | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 42820 | 44694.5 | 0.96 |
| array/reductions/mapreduce/Int64/dims=1L | 87995 | 87805 | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59706 | 60051.5 | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 85032 | 85186 | 1.00 |
| array/reductions/reduce/Float32/1d | 35128 | 35458 | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 40488 | 46307.5 | 0.87 |
| array/reductions/reduce/Float32/dims=1L | 52249 | 52046 | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 57013 | 57117 | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69997 | 70127.5 | 1.00 |
| array/reductions/reduce/Int64/1d | 43669 | 41220 | 1.06 |
| array/reductions/reduce/Int64/dims=1 | 44428.5 | 51895 | 0.86 |
| array/reductions/reduce/Int64/dims=1L | 87860 | 87739 | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59799 | 59630 | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84792 | 84743.5 | 1.00 |
| array/reverse/1d | 18598 | 18349 | 1.01 |
| array/reverse/1dL | 69158 | 68960 | 1.00 |
| array/reverse/1dL_inplace | 65919 | 65909 | 1.00 |
| array/reverse/1d_inplace | 10278 | 8540.833333333332 | 1.20 |
| array/reverse/2d | 20844 | 20881 | 1.00 |
| array/reverse/2dL | 72918.5 | 72996 | 1.00 |
| array/reverse/2dL_inplace | 65969 | 65926 | 1.00 |
| array/reverse/2d_inplace | 11211 | 10076 | 1.11 |
| array/sorting/1d | 2733057 | 2735188.5 | 1.00 |
| array/sorting/2d | 1074904 | 1069206 | 1.01 |
| array/sorting/by | 3301743 | 3304125.5 | 1.00 |
| cuda/synchronization/context/auto | 1167.8 | 1176.2 | 0.99 |
| cuda/synchronization/context/blocking | 921.0277777777778 | 924.5869565217391 | 1.00 |
| cuda/synchronization/context/nonblocking | 8133.8 | 6942.1 | 1.17 |
| cuda/synchronization/stream/auto | 997.7333333333333 | 999.9375 | 1.00 |
| cuda/synchronization/stream/blocking | 802.5567010309278 | 787.7961165048544 | 1.02 |
| cuda/synchronization/stream/nonblocking | 7348.2 | 7168.6 | 1.03 |
| integration/byval/reference | 144066 | 143982 | 1.00 |
| integration/byval/slices=1 | 146060 | 145868 | 1.00 |
| integration/byval/slices=2 | 284760 | 284528 | 1.00 |
| integration/byval/slices=3 | 423409 | 422970 | 1.00 |
| integration/cudadevrt | 102664 | 102612 | 1.00 |
| integration/volumerhs | 9442040.5 | 9440461 | 1.00 |
| kernel/indexing | 13355 | 13181 | 1.01 |
| kernel/indexing_checked | 14001 | 14081 | 0.99 |
| kernel/launch | 2113 | 2150.777777777778 | 0.98 |
| kernel/occupancy | 670.566037735849 | 672 | 1.00 |
| kernel/rand | 14516 | 14396 | 1.01 |
| latency/import | 3809768988.5 | 3814290062.5 | 1.00 |
| latency/precompile | 4591409363 | 4590207670.5 | 1.00 |
| latency/ttfp | 4386579833 | 4409319020 | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
Problem
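`reshape(@view(data[1:n*n]), (n, n))` fails to compile on the GPU. `@view` creates a `SubArray`, which has no specialized `reshape` method on the device, so the call falls back to Base's generic `_reshape`. On a size mismatch, that path calls `_throw_dmrs`, which builds the `DimensionMismatch` error message through string interpolation. String construction of this kind is not supported in GPU code, so the kernel fails to compile even when the sizes match.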
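The same kind of generic fallback can be observed on the CPU, where string interpolation is allowed. Below is a minimal sketch, assuming a recent Julia 1.x with no GPU or CUDA.jl needed. Reshaping a view whose length matches produces a lazy wrapper around the `SubArray`. A mismatched size throws the `DimensionMismatch` whose interpolated message is the part that, on the device, comes from `_throw_dmrs`. The sizes (`n = 3` and a 10-element `data`) are illustrative. The exact wrapper type and error wording may differ between Julia versions.

```julia
# CPU-only illustration (assumption: recent Julia 1.x); sizes are hypothetical.
n = 3
data = collect(1:10)          # 10 elements, more than n*n
v = @view data[1:n*n]         # SubArray over the first 9 elements

# Matching size: the generic path wraps the view lazily instead of copying.
A = reshape(v, (n, n))
println(typeof(A))            # typically a Base.ReshapedArray wrapping the SubArray
println(A[1, 2])              # column-major layout: A[1, 2] == v[4]

# Mismatched size: the generic path throws a DimensionMismatch whose message
# is built by string interpolation, which device code cannot compile.
err = try
    reshape(v, (n + 1, n + 1))
    nothing
catch e
    e
end
println(err isa DimensionMismatch)
```

On the GPU the mismatch branch is still compiled into the kernel even when it never runs. This is why the PR title describes a device override for `_throw_dmrs` rather than a change to the calling code.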