Apply GPU optimizations to TLSPH by efaulhaber · Pull Request #1139 · trixi-framework/TrixiParticles.jl

efaulhaber · 2026-04-13T10:51:53Z

Based on trixi-framework/PointNeighbors.jl#154. Tests will pass once PointNeighbors 0.6.6 is released.

Copilot

Pull request overview

This PR refactors Total Lagrangian SPH (TLSPH) neighbor interactions to better match GPU-friendly execution patterns (per-particle threading, reduced memory traffic, fewer repeated loads), following the newer PointNeighbors neighbor-iteration approach.

Changes:

- Refactor TLSPH deformation gradient and RHS assembly to use per-particle `@threaded` loops with `foreach_neighbor`, accumulating into `Ref`s to reduce global writes.
- Optimize penalty force and viscosity kernels for GPU performance (preload deformation gradients, use `div_fast`, and use `smoothing_kernel_unsafe` after cutoff filtering).
- Add a SIMD-based fast path for extracting 2×2 matrices (`extract_smatrix`) and wire in the SIMD dependency; update tests’ mock systems accordingly.
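
The per-particle accumulation pattern described in the first item can be illustrated with a minimal, self-contained sketch. It is not the PR's code: it uses `Base.Threads.@threads` instead of TrixiParticles' `@threaded`, a plain local variable instead of a `Ref`, and a hypothetical `neighbors` vector-of-vectors instead of the PointNeighbors.jl neighbor iteration.

```julia
using Base.Threads: @threads

# Hypothetical stand-in for a neighbor list: neighbors[i] holds the indices of
# all neighbors of particle i. This is NOT the PointNeighbors.jl data structure.
function accumulate_per_particle!(out, values, neighbors)
    @threads for i in eachindex(out)
        acc = zero(eltype(out))           # local accumulator for particle i
        for j in neighbors[i]
            acc += values[j] - values[i]  # placeholder pairwise contribution
        end
        out[i] = acc                      # a single global write per particle
    end
    return out
end

values = [1.0, 2.0, 4.0]
neighbors = [[2, 3], [1, 3], [1, 2]]
out = accumulate_per_particle!(zeros(3), values, neighbors)
```

Because each iteration writes only to `out[i]`, no two threads write to the same entry, and the inner loop touches global memory only for reads.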

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| test/systems/tlsph_system.jl | Updates TLSPH test mock system to a concrete struct (GPU-friendly) and adjusts numeric literals. |
| test/schemes/structure/total_lagrangian_sph/rhs.jl | Updates RHS tests’ mock system layout and adds `deformation_gradient` stub required by new RHS path. |
| src/schemes/structure/total_lagrangian_sph/viscosity.jl | Threads deformation gradient through viscosity path and applies `div_fast` in hot divisions. |
| src/schemes/structure/total_lagrangian_sph/system.jl | Reworks deformation gradient assembly into per-particle neighbor loops with reduced memory writes. |
| src/schemes/structure/total_lagrangian_sph/rhs.jl | Reworks RHS assembly similarly; passes deformation gradients into penalty/viscosity for fewer loads. |
| src/schemes/structure/total_lagrangian_sph/penalty_force.jl | Converts penalty force to an in-place accumulator API and switches to unsafe kernel + fast divisions. |
| src/schemes/boundary/wall_boundary/system.jl | Adds `smoothing_kernel_unsafe` specialization for wall boundary systems. |
| src/general/neighborhood_search.jl | Adds `foreach_neighbor` wrapper around PointNeighbors neighbor iteration. |
| src/general/abstract_system.jl | Adds `extract_smatrix` Val-specialization and a SIMD 2D fast path. |
| src/TrixiParticles.jl | Imports SIMD module for use in `extract_smatrix`. |
| Project.toml | Adds SIMD dependency + compat; minor reordering of weakdeps/extensions entries. |

LasNikas · 2026-04-14T15:33:41Z

+            # The compiler is smart enough to optimize this away if no penalty force is used
+            F_b = @inbounds deformation_gradient(system, neighbor)


Does this have to be placed here in order to be optimized by the compiler?

Suggested change

-            # The compiler is smart enough to optimize this away if no penalty force is used
-            F_b = @inbounds deformation_gradient(system, neighbor)
+            # F_b is used here only for the penalty force. The compiler is smart enough to optimize this away if no penalty force is used.
+            F_b = @inbounds deformation_gradient(system, neighbor)

efaulhaber · 2026-04-14

> Does this have to be placed here in order to be optimized by the compiler?

Unfortunately, yes. Moving this load into the penalty force makes it slower when the penalty force is used. Keeping it here does not make it slower when no penalty force is used.
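
The mechanism behind this claim can be sketched in isolation. The example below is a hedged illustration, not TrixiParticles code: `NoPenalty`, `SimplePenalty`, `matvec`, `penalty`, and `interact` are hypothetical names, and a 2×2 deformation gradient is stored as a column-major `NTuple{4, Float64}` to keep the block dependency-free. When the penalty type ignores `F_b` and the call is inlined, the unconditional load has no remaining use, so the compiler is free to drop it. Whether it actually does so depends on inlining and the Julia/LLVM version, which can be inspected with `@code_llvm`.

```julia
struct NoPenalty end          # hypothetical: penalty force disabled
struct SimplePenalty          # hypothetical: penalty force enabled
    alpha::Float64
end

# 2x2 matrix (column-major tuple) times a 2-vector (tuple)
@inline matvec(F, x) = (F[1] * x[1] + F[3] * x[2], F[2] * x[1] + F[4] * x[2])

# F_b is never read here, so after inlining the caller's load is dead code
@inline penalty(::NoPenalty, F_b, x) = zero(eltype(x))
@inline penalty(p::SimplePenalty, F_b, x) = p.alpha * sum(matvec(F_b, x) .* x)

function interact(p, F, neighbor, x)
    F_b = @inbounds F[neighbor]   # loaded unconditionally, as in the PR snippet
    return penalty(p, F_b, x)
end

F = [(1.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 2.0)]
x = (1.0, 2.0)
without_penalty = interact(NoPenalty(), F, 2, x)
with_penalty = interact(SimplePenalty(0.5), F, 2, x)
```

Selecting the behavior through the type of `p` means the decision is made at compile time, so neither variant needs a runtime branch.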

svchb · 2026-04-15T09:03:44Z

-    eps_sum = (F_a + F_b) * initial_pos_diff - 2 * current_pos_diff
-    delta_sum = dot(eps_sum, current_pos_diff) * inv_current_distance
+    eps_a = F_a * initial_pos_diff - current_pos_diff
+    eps_b = -(F_b * initial_pos_diff - current_pos_diff)


This is not the same as the line that is replaced? Why now with a minus for eps_b?

efaulhaber · 2026-04-15

You're right. The new version is the correct epsilon from the paper, but then the delta is missing a minus.
I'm surprised that the tests are not failing because of this, only the validation. Apparently, this incorrect penalty force can be countered by a small time step. I added another oscillating beam example test with more penalty force and a tightly tuned CFL as a regression test, which correctly catches this error.
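
The relation between the removed and the added lines can be written out directly from the diff. In the sketch below, $X_{ab}$ stands for `initial_pos_diff` and $x_{ab}$ for `current_pos_diff`; reading `inv_current_distance` as $1 / \lVert x_{ab} \rVert$ is an assumption based on its name. Only the algebra visible in the diff is used; the sign convention of the final force is not shown in this thread.

```latex
\begin{aligned}
\varepsilon_{\mathrm{sum}} &= (F_a + F_b)\, X_{ab} - 2\, x_{ab}
  && \text{(removed line)} \\
\varepsilon_a &= F_a X_{ab} - x_{ab},
\qquad
\varepsilon_b = -\left(F_b X_{ab} - x_{ab}\right)
  && \text{(added lines)} \\
\varepsilon_a - \varepsilon_b
  &= F_a X_{ab} - x_{ab} + F_b X_{ab} - x_{ab}
   = \varepsilon_{\mathrm{sum}} \\
\delta_{\mathrm{sum}}
  &= \frac{\varepsilon_{\mathrm{sum}} \cdot x_{ab}}{\lVert x_{ab} \rVert}
   = \frac{(\varepsilon_a - \varepsilon_b) \cdot x_{ab}}{\lVert x_{ab} \rVert}
\end{aligned}
```

So the old `eps_sum` equals the difference $\varepsilon_a - \varepsilon_b$, not the sum, which is consistent with the reply that the delta needs a minus sign once `eps_b` carries its own minus.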

codecov · 2026-04-15T11:06:22Z

Codecov Report

❌ Patch coverage is 74.71264% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.26%. Comparing base (379db88) to head (d5766f0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...es/structure/total_lagrangian_sph/penalty_force.jl | 14.28% | 12 Missing ⚠️ |
| ...chemes/structure/total_lagrangian_sph/viscosity.jl | 16.66% | 5 Missing ⚠️ |
| src/schemes/boundary/wall_boundary/system.jl | 0.00% | 3 Missing ⚠️ |
| src/general/abstract_system.jl | 84.61% | 2 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (379db88) and HEAD (d5766f0): HEAD has 1 upload less than BASE.

| Flag | BASE (379db88) | HEAD (d5766f0) |
|---|---|---|
| total | 1 | 0 |

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1139       +/-   ##
===========================================
- Coverage   89.06%   67.26%   -21.81%     
===========================================
  Files         128      128               
  Lines        9868     9857       -11     
===========================================
- Hits         8789     6630     -2159     
- Misses       1079     3227     +2148

| Flag | Coverage Δ | |
|---|---|---|
| total | `?` | |
| unit | `67.26% <74.71%> (+0.08%)` | ⬆️ |


efaulhaber · 2026-04-15T14:52:35Z

/run-gpu-tests

efaulhaber · 2026-04-15T15:02:46Z

/run-gpu-tests

Copilot

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

efaulhaber self-assigned this Apr 13, 2026

efaulhaber added performance gpu labels Apr 13, 2026

efaulhaber mentioned this pull request Apr 13, 2026 in "3x Speedup on GPUs: Checklist #1131" (open, 7 tasks)

efaulhaber force-pushed the tlsph-gpu-performance branch from 69f3560 to 006d4f5 Compare April 13, 2026 15:17

efaulhaber added 3 commits April 14, 2026 12:23

Improve performance of TLSPH RHS

b9b9395

Optimize deformation gradient

bcdc27d

Use new foreach_neighbor_unsafe

7518a8c

efaulhaber force-pushed the tlsph-gpu-performance branch from 006d4f5 to 7518a8c Compare April 14, 2026 10:26

efaulhaber added 3 commits April 14, 2026 12:28

Remove PR dependencies

d6e5e8f

Fix

30e5260

Add comments to extract_smatrix

ac173e1

efaulhaber marked this pull request as ready for review April 14, 2026 15:16

efaulhaber added 2 commits April 14, 2026 17:17

Fix unit tests

32479b4

Reformat

ec0de67

efaulhaber requested a review from Copilot April 14, 2026 15:19

Copilot started reviewing on behalf of efaulhaber April 14, 2026 15:19 View session

Copilot AI reviewed Apr 14, 2026

Comment thread src/general/abstract_system.jl Outdated

Comment thread src/schemes/structure/total_lagrangian_sph/viscosity.jl Outdated

Fix

ed3d88c

LasNikas reviewed Apr 14, 2026

Merge branch 'main' into tlsph-gpu-performance

f18eaf6

efaulhaber requested a review from svchb April 14, 2026 15:42

svchb requested changes Apr 15, 2026

efaulhaber added 5 commits April 15, 2026 11:56

Fix penalty force and add regression test

d4b97d1

Fix allocations

e431d8b

Add warning and separate aligned function for the vloada extract_smatrix

f7215bb

Fix unit tests

088df40

Reformat

6908456

Fix deformation gradient

706149d

Add GPU tests

d5766f0

efaulhaber requested review from LasNikas, Copilot and svchb April 15, 2026 16:10

Copilot started reviewing on behalf of efaulhaber April 15, 2026 16:11 View session

Copilot AI reviewed Apr 15, 2026

Comment thread src/schemes/structure/total_lagrangian_sph/system.jl Outdated

Comment thread src/general/neighborhood_search.jl

Comment thread src/schemes/structure/total_lagrangian_sph/system.jl Outdated

efaulhaber added 2 commits April 15, 2026 18:18

Fix aligned extract_smatrix calls

80d7424

Merge branch 'tlsph-gpu-performance' of github.com:efaulhaber/TrixiPa…

92ed24e

…rticles.jl into tlsph-gpu-performance

efaulhaber marked this pull request as draft April 15, 2026 17:01

efaulhaber wants to merge 19 commits into trixi-framework:main from efaulhaber:tlsph-gpu-performance

4 participants
