
GPU accelerated solvers #492

Draft
ErikQQY wants to merge 9 commits into master from qqy/gpu


ErikQQY (Member) commented May 20, 2026

Some TODOs:

  • Finalize the APIs
  • Add support for other FIRK solvers
  • Fix NonlinearSolve compatibility
  • Fix SparseMatrixColorings compatibility
  • Fix LinearSolve compatibility

github-actions (Bot, Contributor) commented May 20, 2026

Benchmark Results


| Benchmark | master | 71eac75... | master / 71eac75... |
|---|---|---|---|
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK2() | 0.645 ± 0.0094 s | 0.645 ± 0.009 s | 1 ± 0.02 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK3() | 13.3 ± 1 ms | 13.5 ± 1 ms | 0.989 ± 0.11 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK4() | 2.97 ± 0.53 ms | 2.95 ± 0.54 ms | 1.01 ± 0.26 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK5() | 3.52 ± 0.63 ms | 3.52 ± 0.64 ms | 1 ± 0.26 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK6() | 1.68 ± 0.35 ms | 1.7 ± 0.35 ms | 0.986 ± 0.29 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 1.69 ± 0.61 ms | 1.68 ± 0.61 ms | 1.01 ± 0.52 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 3.3 ± 1.1 ms | 3.32 ± 1.1 ms | 0.993 ± 0.47 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 0.0425 ± 0.0089 s | 0.0427 ± 0.013 s | 0.995 ± 0.36 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 0.0691 ± 0.021 s | 0.0682 ± 0.023 s | 1.01 ± 0.46 |
| Simple Pendulum/IIP/Shooting(Tsit5()) | 0.249 ± 0.087 ms | 0.254 ± 0.086 ms | 0.979 ± 0.47 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK2() | 0.81 ± 0.021 s | 0.821 ± 0.014 s | 0.986 ± 0.031 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK3() | 17.8 ± 6 ms | 17.3 ± 6.3 ms | 1.03 ± 0.51 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK4() | 3.52 ± 0.3 ms | 3.55 ± 0.2 ms | 0.992 ± 0.1 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK5() | 4.29 ± 0.36 ms | 4.32 ± 0.35 ms | 0.993 ± 0.11 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK6() | 2.01 ± 0.47 ms | 2.04 ± 0.49 ms | 0.988 ± 0.33 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 3.88 ± 3.2 ms | 3.99 ± 3.2 ms | 0.973 ± 1.1 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 7.26 ± 6.8 ms | 7.55 ± 6.4 ms | 0.962 ± 1.2 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 0.103 ± 0.0097 s | 0.115 ± 0.022 s | 0.897 ± 0.19 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 0.156 ± 0.016 s | 0.159 ± 0.012 s | 0.982 ± 0.12 |
| Simple Pendulum/OOP/Shooting(Tsit5()) | 0.637 ± 0.11 ms | 0.651 ± 0.14 ms | 0.977 ± 0.27 |
| time_to_load | 6.27 ± 0.061 s | 6.24 ± 0.043 s | 1 ± 0.012 |

### Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR. Go to "Actions" -> "Benchmark a pull request" -> [the most recent run] -> "Artifacts" (at the bottom).

github-actions (Bot, Contributor) commented Jun 16, 2026

Benchmark Results (Julia v1.11)

Time benchmarks

| Benchmark | master | 5e3bc7c... | master / 5e3bc7c... |
|---|---|---|---|
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK2() | 0.664 ± 0.0096 s | 0.655 ± 0.0068 s | 1.01 ± 0.018 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK3() | 13.3 ± 0.38 ms | 13.3 ± 0.38 ms | 0.997 ± 0.041 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK4() | 2.84 ± 0.069 ms | 2.83 ± 0.096 ms | 1 ± 0.042 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK5() | 3.51 ± 0.13 ms | 3.53 ± 0.18 ms | 0.993 ± 0.061 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK6() | 1.68 ± 0.068 ms | 1.68 ± 0.076 ms | 0.998 ± 0.061 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 1.62 ± 0.11 ms | 1.64 ± 0.11 ms | 0.984 ± 0.095 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 3.24 ± 0.18 ms | 3.31 ± 0.19 ms | 0.978 ± 0.078 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 0.043 ± 0.0041 s | 0.0429 ± 0.0035 s | 1 ± 0.13 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 0.0668 ± 0.0022 s | 0.0661 ± 0.0022 s | 1.01 ± 0.047 |
| Simple Pendulum/IIP/Shooting(Tsit5()) | 0.182 ± 0.084 ms | 0.181 ± 0.085 ms | 1 ± 0.66 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK2() | 0.816 ± 0.0031 s | 0.87 ± 0.0078 s | 0.939 ± 0.0091 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK3() | 22 ± 6.7 ms | 18.1 ± 7.4 ms | 1.22 ± 0.62 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK4() | 3.54 ± 0.22 ms | 3.52 ± 0.16 ms | 1.01 ± 0.078 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK5() | 4.23 ± 0.23 ms | 4.26 ± 0.27 ms | 0.993 ± 0.082 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK6() | 1.99 ± 0.13 ms | 1.99 ± 0.14 ms | 1 ± 0.097 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 3.72 ± 0.68 ms | 3.83 ± 0.85 ms | 0.97 ± 0.28 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 7.08 ± 6.6 ms | 7.12 ± 6.7 ms | 0.995 ± 1.3 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 0.106 ± 0.0045 s | 0.101 ± 0.0047 s | 1.06 ± 0.066 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 0.162 ± 0.011 s | 0.155 ± 0.007 s | 1.05 ± 0.086 |
| Simple Pendulum/OOP/Shooting(Tsit5()) | 0.642 ± 0.047 ms | 0.648 ± 0.079 ms | 0.991 ± 0.14 |
| time_to_load | 7.02 ± 0.082 s | 6.97 ± 0.048 s | 1.01 ± 0.014 |
Memory benchmarks

| Benchmark | master | 5e3bc7c... | master / 5e3bc7c... |
|---|---|---|---|
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK2() | 0.389 M allocs: 0.0442 GB | 0.389 M allocs: 0.0442 GB | 1 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK3() | 0.0436 M allocs: 4.87 MB | 0.0436 M allocs: 4.87 MB | 1 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK4() | 15.7 k allocs: 1.64 MB | 15.7 k allocs: 1.64 MB | 1 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK5() | 22.1 k allocs: 2.03 MB | 22.1 k allocs: 2.03 MB | 1 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK6() | 12.8 k allocs: 1.04 MB | 12.8 k allocs: 1.04 MB | 1 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 25.5 k allocs: 1.82 MB | 25.5 k allocs: 1.82 MB | 1 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 0.0491 M allocs: 3.39 MB | 0.0491 M allocs: 3.39 MB | 1 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 0.553 M allocs: 0.0535 GB | 0.553 M allocs: 0.0535 GB | 1 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 0.833 M allocs: 0.0778 GB | 0.833 M allocs: 0.0778 GB | 1 |
| Simple Pendulum/IIP/Shooting(Tsit5()) | 4.63 k allocs: 0.224 MB | 4.63 k allocs: 0.224 MB | 1 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK2() | 0.89 M allocs: 0.984 GB | 0.89 M allocs: 0.984 GB | 1 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK3() | 0.0932 M allocs: 24.8 MB | 0.0932 M allocs: 24.8 MB | 1 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK4() | 0.0324 M allocs: 3.95 MB | 0.0324 M allocs: 3.95 MB | 1 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK5() | 0.045 M allocs: 4.98 MB | 0.045 M allocs: 4.98 MB | 1 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK6() | 25.3 k allocs: 2.17 MB | 25.3 k allocs: 2.17 MB | 1 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 0.142 M allocs: 10.2 MB | 0.142 M allocs: 10.2 MB | 1 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 0.266 M allocs: 18.7 MB | 0.266 M allocs: 18.7 MB | 1 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 2.52 M allocs: 0.279 GB | 2.52 M allocs: 0.279 GB | 1 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 3.82 M allocs: 0.404 GB | 3.82 M allocs: 0.404 GB | 1 |
| Simple Pendulum/OOP/Shooting(Tsit5()) | 0.0373 M allocs: 1.69 MB | 0.0373 M allocs: 1.69 MB | 1 |
| time_to_load | 0.159 k allocs: 11.2 kB | 0.159 k allocs: 11.2 kB | 1 |
