Skip to content

Apply GPU optimizations to TLSPH#1139

Draft
efaulhaber wants to merge 19 commits intotrixi-framework:mainfrom
efaulhaber:tlsph-gpu-performance
Draft

Apply GPU optimizations to TLSPH#1139
efaulhaber wants to merge 19 commits intotrixi-framework:mainfrom
efaulhaber:tlsph-gpu-performance

Conversation

@efaulhaber
Copy link
Copy Markdown
Member

@efaulhaber efaulhaber commented Apr 13, 2026

Based on trixi-framework/PointNeighbors.jl#154. Tests will pass once PointNeighbors 0.6.6 is released.

@efaulhaber efaulhaber force-pushed the tlsph-gpu-performance branch from 006d4f5 to 7518a8c Compare April 14, 2026 10:26
@efaulhaber efaulhaber marked this pull request as ready for review April 14, 2026 15:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors Total Lagrangian SPH (TLSPH) neighbor interactions to better match GPU-friendly execution patterns (per-particle threading, reduced memory traffic, fewer repeated loads), following the newer PointNeighbors neighbor-iteration approach.

Changes:

  • Refactor TLSPH deformation gradient and RHS assembly to use per-particle @threaded loops with foreach_neighbor, accumulating into Refs to reduce global writes.
  • Optimize penalty force and viscosity kernels for GPU performance (preload deformation gradients, use div_fast, and use smoothing_kernel_unsafe after cutoff filtering).
  • Add a SIMD-based fast path for extracting 2×2 matrices (extract_smatrix) and wire in the SIMD dependency; update tests’ mock systems accordingly.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/systems/tlsph_system.jl Updates TLSPH test mock system to a concrete struct (GPU-friendly) and adjusts numeric literals.
test/schemes/structure/total_lagrangian_sph/rhs.jl Updates RHS tests’ mock system layout and adds deformation_gradient stub required by new RHS path.
src/schemes/structure/total_lagrangian_sph/viscosity.jl Threads deformation gradient through viscosity path and applies div_fast in hot divisions.
src/schemes/structure/total_lagrangian_sph/system.jl Reworks deformation gradient assembly into per-particle neighbor loops with reduced memory writes.
src/schemes/structure/total_lagrangian_sph/rhs.jl Reworks RHS assembly similarly; passes deformation gradients into penalty/viscosity for fewer loads.
src/schemes/structure/total_lagrangian_sph/penalty_force.jl Converts penalty force to an in-place accumulator API and switches to unsafe kernel + fast divisions.
src/schemes/boundary/wall_boundary/system.jl Adds smoothing_kernel_unsafe specialization for wall boundary systems.
src/general/neighborhood_search.jl Adds foreach_neighbor wrapper around PointNeighbors neighbor iteration.
src/general/abstract_system.jl Adds extract_smatrix Val-specialization and a SIMD 2D fast path.
src/TrixiParticles.jl Imports SIMD module for use in extract_smatrix.
Project.toml Adds SIMD dependency + compat; minor reordering of weakdeps/extensions entries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/general/abstract_system.jl Outdated
Comment thread src/schemes/structure/total_lagrangian_sph/viscosity.jl Outdated
Comment on lines +70 to +71
# The compiler is smart enough to optimize this away if no penalty force is used
F_b = @inbounds deformation_gradient(system, neighbor)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be placed here in order to be optimized by the compiler?

Suggested change
# The compiler is smart enough to optimize this away if no penalty force is used
F_b = @inbounds deformation_gradient(system, neighbor)
# F_b is used here only for the penalty force. The compiler is smart enough to optimize this away if no penalty force is used.
F_b = @inbounds deformation_gradient(system, neighbor)

Does this have to be placed here in order to be optimized by the compiler?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, yes. Moving this into the penalty force makes it slower when penalty force is used. Having it here does not make it slower when no penalty force is used.

Comment thread src/schemes/structure/total_lagrangian_sph/system.jl
@efaulhaber efaulhaber requested a review from svchb April 14, 2026 15:42
eps_sum = (F_a + F_b) * initial_pos_diff - 2 * current_pos_diff
delta_sum = dot(eps_sum, current_pos_diff) * inv_current_distance
eps_a = F_a * initial_pos_diff - current_pos_diff
eps_b = -(F_b * initial_pos_diff - current_pos_diff)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the same as the line that is replaced? Why now with a minus for eps_b?

Copy link
Copy Markdown
Member Author

@efaulhaber efaulhaber Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The new version is the correct epsilon from the paper, but then the delta is missing a minus.
I'm surprised that the tests are not failing because of this, only the validation. Apparently, this incorrect penalty force and be countered by a small time step. I added another oscillating beam example test with more penalty force and a tightly tuned CFL as a regression test, which correctly catches this error.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 74.71264% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.26%. Comparing base (379db88) to head (d5766f0).

Files with missing lines Patch % Lines
...es/structure/total_lagrangian_sph/penalty_force.jl 14.28% 12 Missing ⚠️
...chemes/structure/total_lagrangian_sph/viscosity.jl 16.66% 5 Missing ⚠️
src/schemes/boundary/wall_boundary/system.jl 0.00% 3 Missing ⚠️
src/general/abstract_system.jl 84.61% 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (379db88) and HEAD (d5766f0). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (379db88) HEAD (d5766f0)
total 1 0
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1139       +/-   ##
===========================================
- Coverage   89.06%   67.26%   -21.81%     
===========================================
  Files         128      128               
  Lines        9868     9857       -11     
===========================================
- Hits         8789     6630     -2159     
- Misses       1079     3227     +2148     
Flag Coverage Δ
total ?
unit 67.26% <74.71%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@efaulhaber
Copy link
Copy Markdown
Member Author

/run-gpu-tests

@efaulhaber
Copy link
Copy Markdown
Member Author

/run-gpu-tests

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/schemes/structure/total_lagrangian_sph/system.jl Outdated
Comment thread src/general/neighborhood_search.jl
Comment thread src/schemes/structure/total_lagrangian_sph/system.jl Outdated
@efaulhaber efaulhaber marked this pull request as draft April 15, 2026 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants