Extend FFTBasedPoissonSolver to work on AMDGPU #4593
Conversation
function plan_backward_transform(A::ROCArray, ::Union{Bounded, Periodic}, dims, planner_flag)
    length(dims) == 0 && return nothing
    return AMDGPU.rocFFT.plan_bfft!(A, dims)

This is plan_bfft!, not plan_ifft!. Do we need to add a normalisation or something?
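The normalisation question can be checked on the CPU, where FFTW's bfft differs from ifft only by a factor of N (a minimal sketch, assuming FFTW.jl is available; rocFFT follows the same unnormalized-backward convention):

```julia
using FFTW  # CPU reference implementation of the same FFT conventions

x = rand(ComplexF64, 8)
N = length(x)

# bfft is the unnormalized backward transform: ifft(x) == bfft(x) / N
@assert ifft(x) ≈ bfft(x) / N

# so a forward-backward round trip without normalization is scaled by N
@assert bfft(fft(x)) ≈ N .* x
```

So plan_bfft! alone leaves the result scaled by N, and a 1/N normalization has to be applied somewhere.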
With the current modifications I can create an FFTBasedPoissonSolver with a uniformly-spaced grid.

julia> using Oceananigans, AMDGPU
julia> grid = RectilinearGrid(GPU(AMDGPU.ROCBackend()), size=(8, 8, 8), extent=(1, 2, 3))
8×8×8 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on ROCGPU with 3×3×3 halo
├── Periodic x ∈ [0.0, 1.0) regularly spaced with Δx=0.125
├── Periodic y ∈ [0.0, 2.0) regularly spaced with Δy=0.25
└── Bounded z ∈ [-3.0, 0.0] regularly spaced with Δz=0.375
julia> pressure_solver = Oceananigans.Solvers.FFTBasedPoissonSolver(grid)
FFTBasedPoissonSolver on GPU{ROCBackend}:
├── grid: 8×8×8 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on ROCGPU with 3×3×3 halo
├── storage: ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}
├── buffer: ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}
└── transforms:
├── forward: Oceananigans.Solvers.DiscreteTransform, Oceananigans.Solvers.DiscreteTransform
└── backward: Oceananigans.Solvers.DiscreteTransform, Oceananigans.Solvers.DiscreteTransform
julia> model = NonhydrostaticModel(; grid, pressure_solver)
NonhydrostaticModel{GPU, RectilinearGrid}(time = 0 seconds, iteration = 0)
├── grid: 8×8×8 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on ROCGPU with 3×3×3 halo
├── timestepper: RungeKutta3TimeStepper
├── advection scheme: Centered(order=2)
├── tracers: ()
├── closure: Nothing
├── buoyancy: Nothing
└── coriolis: Nothing
julia> simulation = Simulation(model, Δt=10, stop_iteration=3)
Simulation of NonhydrostaticModel{GPU, RectilinearGrid}(time = 0 seconds, iteration = 0)
├── Next time step: 10 seconds
├── Elapsed wall time: 0 seconds
├── Wall time per iteration: NaN days
├── Stop time: Inf days
├── Stop iteration: 3.0
├── Wall time limit: Inf
├── Minimum relative step: 0.0
├── Callbacks: OrderedDict with 4 entries:
│ ├── stop_time_exceeded => Callback of stop_time_exceeded on IterationInterval(1)
│ ├── stop_iteration_exceeded => Callback of stop_iteration_exceeded on IterationInterval(1)
│ ├── wall_time_limit_exceeded => Callback of wall_time_limit_exceeded on IterationInterval(1)
│ └── nan_checker => Callback of NaNChecker for u on IterationInterval(100)
├── Output writers: OrderedDict with no entries
└── Diagnostics: OrderedDict with no entries
julia> run!(simulation)
[ Info: Initializing simulation...
[ Info: ... simulation initialization complete (416.478 μs)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (3.048 ms).
[ Info: Simulation is stopping after running for 6.118 ms.
[ Info: Model iteration 3 equals or exceeds stop iteration 3.

With AMDGPU, scalar operations need

We have a generic

This is ready to review.
@michel2323, I feel that with AMDGPU starting to get some use, we should try to merge #4499. What do you think? If we wait too long we may get some conflicts.

Can you please test this? An empirical test should reveal whether this code is correct, and will also show what the normalization needs to be (if the documentation is insufficient, which would not surprise me).

@glwagner I continuously rebased, so it shouldn't be too far off. However, while the majority of tests pass, there are still a few that fail. I'll do another pass tomorrow morning and give you a short summary. I might need some help.
Yes, I wanted an idea of how to test. I know that

julia> Nx, Ny = 3, 4
(3, 4)
julia> a = rand(Nx, Ny)
3×4 Matrix{Float64}:
0.453795 0.778266 0.499822 0.749407
0.646126 0.855627 0.00391171 0.158878
0.307381 0.898621 0.453264 0.604466
julia> ah = fft(a)
3×4 Matrix{ComplexF64}:
6.40956+0.0im 0.450305-1.01976im -1.68096+0.0im 0.450305+1.01976im
0.517153+0.518913im -0.64285-0.21592im -0.0206029-0.327336im 0.0544652-1.14911im
0.517153-0.518913im 0.0544652+1.14911im -0.0206029+0.327336im -0.64285+0.21592im
julia> ifft(ah)
3×4 Matrix{ComplexF64}:
0.453795+0.0im 0.778266+0.0im 0.499822+0.0im 0.749407+0.0im
0.646126+0.0im 0.855627+0.0im 0.00391171+0.0im 0.158878+0.0im
0.307381+0.0im 0.898621+0.0im 0.453264+0.0im 0.604466+0.0im
julia> bfft(ah) / (Nx * Ny)
3×4 Matrix{ComplexF64}:
0.453795+0.0im 0.778266+0.0im 0.499822+0.0im 0.749407+0.0im
0.646126+0.0im 0.855627+0.0im 0.00391171+0.0im 0.158878+0.0im
 0.307381+0.0im  0.898621+0.0im    0.453264+0.0im  0.604466+0.0im

But I don't know exactly how to incorporate this function into the plans so that it's the same as elsewhere in the pressure solver.
Hm... If I understand correctly, we need to manually apply the scaling after the inverse transform. What is the best way to do that? The normalization should be N = prod(size(A, d) for d in dims):

function backward_transform!(x)
    p * x         # in-place inverse FFT
    @. x = x / N  # in-place normalization
    return x
end
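A CPU analogue of this closure can be sanity-checked with FFTW's in-place plans (a sketch under the assumption that rocFFT plans follow the same `p * x` in-place application convention):

```julia
using FFTW

A = rand(ComplexF64, 4, 4)
dims = 1:2
p = plan_bfft!(copy(A), dims)        # unnormalized in-place backward plan
N = prod(size(A, d) for d in dims)

function backward_transform!(x)
    p * x         # in-place unnormalized backward FFT
    @. x = x / N  # in-place normalization
    return x
end

B = fft(A)                          # forward transform (out of place)
@assert backward_transform!(B) ≈ A  # normalized backward recovers A
```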
What prevents you from using that exact code?

I tried and failed. Perhaps I need help?
I tried doing this in the AMD extension:

function plan_backward_transform(A::ROCArray, ::Union{Bounded, Periodic}, dims, planner_flag)
    length(dims) == 0 && return nothing
    p = AMDGPU.rocFFT.plan_bfft!(A, dims)
    N = prod(size(A, d) for d in dims)
    function backward_transform!(x, y)
        @. x = p * y  # in-place inverse FFT
        @. x = x / N  # in-place normalization
        return x
    end
    return backward_transform!
end

but I got an error related to binary operations... The pressure_solver is constructed, but while time stepping I get:

julia> using Oceananigans, AMDGPU
[ Info: Precompiling OceananigansAMDGPUExt [8da284e0-aa90-5603-9506-0dba59ddb112]
julia> grid = RectilinearGrid(GPU(AMDGPU.ROCBackend()), size=(16, 8, 4), x=(0, 16), y=(0, 1), z=[0, 1, 2, 3, 4])
16×8×4 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on ROCGPU with 3×3×3 halo
├── Periodic x ∈ [0.0, 16.0) regularly spaced with Δx=1.0
├── Periodic y ∈ [0.0, 1.0) regularly spaced with Δy=0.125
└── Bounded z ∈ [0.0, 4.0] variably spaced with min(Δz)=1.0, max(Δz)=1.0
julia> pressure_solver = Oceananigans.Solvers.FFTBasedPoissonSolver(grid)
FFTBasedPoissonSolver on GPU{ROCBackend}:
├── grid: 16×8×4 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on ROCGPU with 3×3×3 halo
├── storage: ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}
├── buffer: ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}
└── transforms:
├── forward: Oceananigans.Solvers.DiscreteTransform
└── backward: Oceananigans.Solvers.DiscreteTransform
julia> model = NonhydrostaticModel(; grid, pressure_solver)
NonhydrostaticModel{GPU, RectilinearGrid}(time = 0 seconds, iteration = 0)
├── grid: 16×8×4 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on ROCGPU with 3×3×3 halo
├── timestepper: RungeKutta3TimeStepper
├── advection scheme: Centered(order=2)
├── tracers: ()
├── closure: Nothing
├── buoyancy: Nothing
└── coriolis: Nothing
julia> simulation = Simulation(model, Δt=10, stop_iteration=3)
Simulation of NonhydrostaticModel{GPU, RectilinearGrid}(time = 0 seconds, iteration = 0)
├── Next time step: 10 seconds
├── Elapsed wall time: 0 seconds
├── Wall time per iteration: NaN days
├── Stop time: Inf days
├── Stop iteration: 3.0
├── Wall time limit: Inf
├── Minimum relative step: 0.0
├── Callbacks: OrderedDict with 4 entries:
│ ├── stop_time_exceeded => Callback of stop_time_exceeded on IterationInterval(1)
│ ├── stop_iteration_exceeded => Callback of stop_iteration_exceeded on IterationInterval(1)
│ ├── wall_time_limit_exceeded => Callback of wall_time_limit_exceeded on IterationInterval(1)
│ └── nan_checker => Callback of NaNChecker for u on IterationInterval(100)
├── Output writers: OrderedDict with no entries
└── Diagnostics: OrderedDict with no entries
julia> run!(simulation)
[ Info: Initializing simulation...
[ Info: ... simulation initialization complete (4.818 seconds)
[ Info: Executing initial time step...
ERROR: MethodError: no method matching *(::OceananigansAMDGPUExt.var"#backward_transform!#3"{…}, ::ROCArray{…})
Closest candidates are:
*(::Any, ::Any, ::Any, ::Oceananigans.Grids.AbstractGrid, ::Any, ::Any, ::Number, ::Oceananigans.AbstractOperations.BinaryOperation)
@ Oceananigans ~/Oceananigans.jl/src/AbstractOperations/binary_operations.jl:91
*(::Any, ::Any, ::Any, ::Oceananigans.Grids.AbstractGrid, ::Any, ::Any, ::Oceananigans.AbstractOperations.BinaryOperation, ::Oceananigans.AbstractOperations.BinaryOperation)
@ Oceananigans ~/Oceananigans.jl/src/AbstractOperations/binary_operations.jl:70
*(::Any, ::Any, ::Any, ::Oceananigans.Grids.AbstractGrid, ::Any, ::Any, ::Oceananigans.AbstractOperations.BinaryOperation, ::Oceananigans.Fields.AbstractField)
@ Oceananigans ~/Oceananigans.jl/src/AbstractOperations/binary_operations.jl:76
...
Stacktrace:
[1] apply_transform!(A::ROCArray{…}, B::ROCArray{…}, plan::Function, ::Nothing)
@ Oceananigans.Solvers ~/Oceananigans.jl/src/Solvers/discrete_transforms.jl:142
[2] (::Oceananigans.Solvers.DiscreteTransform{…})(A::ROCArray{…}, buffer::ROCArray{…})
@ Oceananigans.Solvers ~/Oceananigans.jl/src/Solvers/discrete_transforms.jl:118
[3] solve!(ϕ::Field{…}, solver::Oceananigans.Solvers.FFTBasedPoissonSolver{…}, b::ROCArray{…}, m::Int64)
@ Oceananigans.Solvers ~/Oceananigans.jl/src/Solvers/fft_based_poisson_solver.jl:119
[4] solve!(ϕ::Field{…}, solver::Oceananigans.Solvers.FFTBasedPoissonSolver{…})
@ Oceananigans.Solvers ~/Oceananigans.jl/src/Solvers/fft_based_poisson_solver.jl:96
[5] solve_for_pressure!(pressure::Field{…}, solver::Oceananigans.Solvers.FFTBasedPoissonSolver{…}, Δt::Float64, args::@NamedTuple{…})
@ Oceananigans.Models.NonhydrostaticModels ~/Oceananigans.jl/src/Models/NonhydrostaticModels/solve_for_pressure.jl:85
[6] compute_pressure_correction!(model::NonhydrostaticModel{…}, Δt::Float64)
@ Oceananigans.Models.NonhydrostaticModels ~/Oceananigans.jl/src/Models/NonhydrostaticModels/pressure_correction.jl:13
[7] time_step!(model::NonhydrostaticModel{…}, Δt::Float64; callbacks::Tuple{})
@ Oceananigans.TimeSteppers ~/Oceananigans.jl/src/TimeSteppers/runge_kutta_3.jl:114
[8] time_step!(sim::Simulation{…})
@ Oceananigans.Simulations ~/Oceananigans.jl/src/Simulations/run.jl:149
[9] run!(sim::Simulation{…}; pickup::Bool)
@ Oceananigans.Simulations ~/Oceananigans.jl/src/Simulations/run.jl:105
[10] run!(sim::Simulation{…})
@ Oceananigans.Simulations ~/Oceananigans.jl/src/Simulations/run.jl:92
[11] top-level scope
@ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.

The stack trace shows that the error originates from here: Oceananigans.jl/src/Solvers/discrete_transforms.jl, lines 141 to 144 in 3489204. You need to extend
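Since apply_transform! invokes the plan as plan * A, a plain closure has no matching * method, which is exactly the MethodError above. One way around this (a hypothetical sketch; NormalizedBackwardPlan is not an existing Oceananigans type) is to wrap the unnormalized plan and the factor N in a callable struct and extend Base.:*:

```julia
# Hypothetical wrapper: pairs an unnormalized in-place backward FFT plan
# with the normalization factor N.
struct NormalizedBackwardPlan{P, T}
    plan :: P
    N    :: T
end

# apply_transform! calls `plan * A`, so extend * for the wrapper:
function Base.:*(p::NormalizedBackwardPlan, A)
    p.plan * A  # in-place unnormalized backward FFT
    A ./= p.N   # apply the 1/N normalization in place
    return A
end
```

plan_backward_transform could then return NormalizedBackwardPlan(AMDGPU.rocFFT.plan_bfft!(A, dims), N) instead of a closure.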
although the

Oceananigans.jl/src/Solvers/discrete_transforms.jl, lines 8 to 17 in 3489204
Yes. I am confused, though, because the Fourier transforms don't include a normalisation factor, but then it seems the DiscreteTransform methods were called. (Are the Fourier transforms <: DiscreteTransform?) Perhaps I lack an understanding of how the transforms work in general, and that's where the essence of my difficulty lies... :(
Hm... I seem to be getting an error that plan_ifft! is not defined? |
I just came across this. The FFT stuff in the AMDGPU extension is simply broken: NumericalEarth/Breeze.jl#369 (comment). Oceananigans shouldn't call
Makes sense, so we don't have to figure out how to do this translation from ifft to bfft ourselves.

New errors, but the undefined

Remaining errors are the method error with

Bump 🙂
for grid in (regular_grid, vertically_stretched_grid)
    cg_kw = (maxiter=10, reltol=1e-7, abstol=1e-7, preconditioner=nothing)
    pressure_solvers = (Oceananigans.Solvers.ConjugateGradientPoissonSolver(grid; cg_kw...),
                        Oceananigans.Solvers.FFTBasedPoissonSolver(grid))

This is not a valid solver for the vertically stretched grid, and is why we are getting an error.
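For the vertically stretched grid, Oceananigans provides FourierTridiagonalPoissonSolver, which transforms the regular directions and solves a tridiagonal system in the stretched one. A sketch of picking the solver per test grid (the uniform_z flag is a hypothetical stand-in for however the test distinguishes the two grids):

```julia
# Sketch: choose a Poisson solver appropriate to each test grid, so the
# FFT-based solver is only used where spacings are uniform.
using Oceananigans.Solvers: FFTBasedPoissonSolver, FourierTridiagonalPoissonSolver

poisson_solver(grid, uniform_z::Bool) =
    uniform_z ? FFTBasedPoissonSolver(grid) : FourierTridiagonalPoissonSolver(grid)

# e.g. in the test loop:
# solver = poisson_solver(regular_grid, true)
# solver = poisson_solver(vertically_stretched_grid, false)
```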
AMD GPU tests are now passing 🥳
Codecov Report

❌ Patch coverage is

@@ Coverage Diff @@
##             main    #4593       +/-   ##
===========================================
+ Coverage   33.33%   73.04%   +39.71%
===========================================
  Files         387      395        +8
  Lines       21506    22011      +505
===========================================
+ Hits         7170    16079     +8909
+ Misses      14336     5932     -8404

Flags with carried forward coverage won't be shown.
This PR adds an FFTBasedPoissonSolver method for AMDGPU.
Closes #4591