Your PR no longer requires formatting changes. Thank you for your contribution!
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:

| Coverage diff | main | #744 | +/- |
|---|---|---|---|
| Coverage | 82.01% | 82.28% | +0.27% |
| Files | 62 | 62 | |
| Lines | 2874 | 2874 | |
| Hits | 2357 | 2365 | +8 |
| Misses | 517 | 509 | -8 |

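The Codecov figures are internally consistent: line coverage is hits / lines, misses are lines minus hits, and the reported percentages match truncating (rounding down) to two decimals, which I am assuming is how Codecov rounds here (ordinary rounding would give 82.29% for #744). A minimal Python sketch under that assumption reproduces the numbers; `coverage_bp` and `summarize` are hypothetical helper names, and working in integer basis points avoids floating-point surprises:

```python
def coverage_bp(hits: int, lines: int) -> int:
    """Line coverage in basis points (hundredths of a percent), truncated (assumed Codecov rule)."""
    return hits * 10_000 // lines


def summarize(label: str, hits: int, lines: int) -> str:
    """One-line summary in the same terms as the Codecov diff."""
    bp = coverage_bp(hits, lines)
    misses = lines - hits
    return f"{label}: {bp / 100:.2f}% ({hits} hits, {misses} misses of {lines} lines)"


base = coverage_bp(2357, 2874)  # main
head = coverage_bp(2365, 2874)  # PR #744
print(summarize("main", 2357, 2874))
print(summarize("#744", 2365, 2874))
print(f"delta: {(head - base) / 100:+.2f}%")
```

Since the line count is unchanged (2874), the whole +0.27% comes from 8 lines moving from misses to hits.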
Metal Benchmarks
| Benchmark suite | Current: 7a3fb2c | Previous: 1d2f000 | Ratio (current / previous) |
|---|---|---|---|
| latency/precompile | 25740449000 ns | 25549419083 ns | 1.01 |
| latency/ttfp | 2383024000 ns | 2346831687.5 ns | 1.02 |
| latency/import | 1450508000 ns | 1427666042 ns | 1.02 |
| integration/metaldevrt | 836084 ns | 877750 ns | 0.95 |
| integration/byval/slices=1 | 1575688 ns | 1568625 ns | 1.00 |
| integration/byval/slices=3 | 21063916.5 ns | 8402792 ns | 2.51 |
| integration/byval/reference | 1570583 ns | 1559958 ns | 1.01 |
| integration/byval/slices=2 | 2714000 ns | 2629875 ns | 1.03 |
| kernel/indexing | 465791 ns | 627417 ns | 0.74 |
| kernel/indexing_checked | 480333 ns | 608750 ns | 0.79 |
| kernel/launch | 11583 ns | 12667 ns | 0.91 |
| kernel/rand | 533000 ns | 576167 ns | 0.93 |
| array/construct | 6292 ns | 6500 ns | 0.97 |
| array/broadcast | 522459 ns | 606708 ns | 0.86 |
| array/random/randn/Float32 | 1015541 ns | 1011104 ns | 1.00 |
| array/random/randn!/Float32 | 708625 ns | 753875 ns | 0.94 |
| array/random/rand!/Int64 | 540125 ns | 548708 ns | 0.98 |
| array/random/rand!/Float32 | 535750 ns | 586208.5 ns | 0.91 |
| array/random/rand/Int64 | 883896 ns | 789709 ns | 1.12 |
| array/random/rand/Float32 | 805375 ns | 645000 ns | 1.25 |
| array/accumulate/Int64/1d | 1301354 ns | 1260667 ns | 1.03 |
| array/accumulate/Int64/dims=1 | 1843583 ns | 1859104.5 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 2228875 ns | 2179083 ns | 1.02 |
| array/accumulate/Int64/dims=1L | 12088292 ns | 11673271 ns | 1.04 |
| array/accumulate/Int64/dims=2L | 10061833 ns | 9628146 ns | 1.05 |
| array/accumulate/Float32/1d | 1066000 ns | 1121395.5 ns | 0.95 |
| array/accumulate/Float32/dims=1 | 1577708.5 ns | 1571667 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 2003166 ns | 1889459 ns | 1.06 |
| array/accumulate/Float32/dims=1L | 10307833 ns | 9834209 ns | 1.05 |
| array/accumulate/Float32/dims=2L | 7442125 ns | 7249666.5 ns | 1.03 |
| array/reductions/reduce/Int64/1d | 1292750 ns | 1386875 ns | 0.93 |
| array/reductions/reduce/Int64/dims=1 | 1116375 ns | 1117250 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 1153167 ns | 1152958 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 2039291 ns | 2013209 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2L | 3941000 ns | 4244083 ns | 0.93 |
| array/reductions/reduce/Float32/1d | 751020.5 ns | 988750 ns | 0.76 |
| array/reductions/reduce/Float32/dims=1 | 806667 ns | 843520.5 ns | 0.96 |
| array/reductions/reduce/Float32/dims=2 | 836000 ns | 857917 ns | 0.97 |
| array/reductions/reduce/Float32/dims=1L | 1331604 ns | 1326625 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 1811333.5 ns | 1810667 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 1311125 ns | 1356437.5 ns | 0.97 |
| array/reductions/mapreduce/Int64/dims=1 | 1111917 ns | 1102166.5 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2 | 1156874.5 ns | 1149750 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 1924146 ns | 1988375 ns | 0.97 |
| array/reductions/mapreduce/Int64/dims=2L | 3639125 ns | 3626916 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 786917 ns | 1055917 ns | 0.75 |
| array/reductions/mapreduce/Float32/dims=1 | 799125 ns | 847396 ns | 0.94 |
| array/reductions/mapreduce/Float32/dims=2 | 841750 ns | 860979.5 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1L | 1326000 ns | 1333042 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 1808708 ns | 1898125 ns | 0.95 |
| array/private/copyto!/gpu_to_gpu | 550083 ns | 633020.5 ns | 0.87 |
| array/private/copyto!/cpu_to_gpu | 702291.5 ns | 804354.5 ns | 0.87 |
| array/private/copyto!/gpu_to_cpu | 688459 ns | 816000 ns | 0.84 |
| array/private/iteration/findall/int | 1560208.5 ns | 1581312.5 ns | 0.99 |
| array/private/iteration/findall/bool | 1463542 ns | 1404916.5 ns | 1.04 |
| array/private/iteration/findfirst/int | 2075208 ns | 2075167 ns | 1.00 |
| array/private/iteration/findfirst/bool | 2009584 ns | 2048750 ns | 0.98 |
| array/private/iteration/scalar | 3491687.5 ns | 4526479 ns | 0.77 |
| array/private/iteration/logical | 2644208.5 ns | 2693625 ns | 0.98 |
| array/private/iteration/findmin/1d | 2523459 ns | 2518041 ns | 1.00 |
| array/private/iteration/findmin/2d | 1842229 ns | 1820229.5 ns | 1.01 |
| array/private/copy | 817604.5 ns | 568854 ns | 1.44 |
| array/shared/copyto!/gpu_to_gpu | 84792 ns | 84291 ns | 1.01 |
| array/shared/copyto!/cpu_to_gpu | 82875 ns | 82875 ns | 1.00 |
| array/shared/copyto!/gpu_to_cpu | 82687.5 ns | 83000 ns | 1.00 |
| array/shared/iteration/findall/int | 1565458 ns | 1585854.5 ns | 0.99 |
| array/shared/iteration/findall/bool | 1471437.5 ns | 1421875 ns | 1.03 |
| array/shared/iteration/findfirst/int | 1701708 ns | 1654709 ns | 1.03 |
| array/shared/iteration/findfirst/bool | 1629083 ns | 1643542 ns | 0.99 |
| array/shared/iteration/scalar | 201542 ns | 210375 ns | 0.96 |
| array/shared/iteration/logical | 2363459 ns | 2297959 ns | 1.03 |
| array/shared/iteration/findmin/1d | 2166125 ns | 2134229 ns | 1.01 |
| array/shared/iteration/findmin/2d | 1833666 ns | 1806042 ns | 1.02 |
| array/shared/copy | 215333 ns | 241812 ns | 0.89 |
| array/permutedims/4d | 2478834 ns | 2395583 ns | 1.03 |
| array/permutedims/2d | 1187187.5 ns | 1158833 ns | 1.02 |
| array/permutedims/3d | 1768084 ns | 1686541 ns | 1.05 |
| metal/synchronization/stream | 19125 ns | 19583 ns | 0.98 |
| metal/synchronization/context | 19708 ns | 20291 ns | 0.97 |

This comment was automatically generated by workflow using github-action-benchmark.
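The Ratio column is current / previous, so values above 1 mean this PR is slower and values below 1 mean it is faster. A minimal Python sketch, using a handful of rows copied from the benchmark table and a hypothetical alert threshold of 2.0 (an assumption for illustration, not necessarily what the workflow is configured with), shows how a regression like `integration/byval/slices=3` stands out:

```python
# A few rows copied from the benchmark table: (benchmark, current ns, previous ns).
ROWS = [
    ("integration/byval/slices=3", 21063916.5, 8402792),
    ("array/private/copy", 817604.5, 568854),
    ("array/random/rand/Float32", 805375, 645000),
    ("kernel/indexing", 465791, 627417),
    ("array/shared/copyto!/cpu_to_gpu", 82875, 82875),
]

THRESHOLD = 2.0  # hypothetical alert threshold: flag anything at least 2x slower


def ratio(current: float, previous: float) -> float:
    """Current / previous: > 1 means the PR is slower, < 1 means it is faster."""
    return current / previous


def regressions(rows, threshold: float = THRESHOLD) -> list:
    """Names of benchmarks whose ratio meets or exceeds the threshold."""
    return [name for name, cur, prev in rows if ratio(cur, prev) >= threshold]


for name, cur, prev in ROWS:
    print(f"{name:<40} {ratio(cur, prev):.2f}")
print("flagged:", regressions(ROWS))
```

With these rows only `integration/byval/slices=3` (2.51) crosses 2.0; `array/private/copy` (1.44) and `array/random/rand/Float32` (1.25) are slower but stay under that threshold.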
From Section 6.9.2 of the Metal Shading Language Specification: […]
Would you mind renaming the device functions to … instead of …?
Hmm, I think I got these mixed up, which might be why my tests turned out differently than I expected. I will revisit this.
Depends on JuliaGPU/GPUCompiler.jl#766