Change output format + fix bugs and parallelize #5

Open
bestouff wants to merge 1 commit into igarciad:master from bestouff:full3d

bestouff (Contributor) commented on Apr 5, 2026

The simulation never actually computed anything: all 4 step functions (simulateSTEP1-4) took Grid3D and Ground3D parameters by value instead of by reference. Every call deep-copied the grids, computed tendencies into the copies, and discarded them on return. The output was always just the initial sounding data with zero gridRslow, regardless of how many timesteps were run.
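A minimal sketch of the bug class described above. It assumes a hypothetical, cut-down `Grid3D` (the real struct is not shown in this PR) and hypothetical `stepByValue`/`stepByReference` functions standing in for `simulateSTEP1-4`:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the project's Grid3D: just a flat array of
// tendencies. The real type holds many more fields.
struct Grid3D {
    std::vector<float> tendency;
};

// Bug pattern: the grid is taken by value, so the function writes into a
// private copy that is destroyed on return. The caller's grid is unchanged.
void stepByValue(Grid3D grid) {
    for (float& t : grid.tendency) t += 1.0f;
}

// Fix: take the grid by non-const reference so the writes reach the caller.
void stepByReference(Grid3D& grid) {
    for (float& t : grid.tendency) t += 1.0f;
}
```

Calling `stepByValue(a)` compiles cleanly and leaves `a` untouched, which is why the original code ran without errors while producing only the initial sounding data.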

Bug fixes:

  • Change all step function signatures to take Grid3D&/Ground3D& by reference, with std::ref() wrappers in std::thread constructors
  • Fix W vertical pressure gradient parenthesization: the closing paren of the theta average was misplaced, so dPi multiplied only the k-1 term instead of the full (k + k-1) average. This produced an unbalanced tendency of ~994 m/s² that led to NaNs at step 37 (previously masked by the pass-by-value bug, which silently discarded all results)
  • Enable multithreading: restore hardware_concurrency() instead of the unconditional numThreads=1, and replace the VLA with std::vector<std::thread>
  • Add the #include needed for std::ref
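To illustrate the parenthesization bug, here is a hypothetical sketch of a vertical pressure-gradient term written in a common Exner-function form, dW/dt = -cp · θ_avg · Δπ / dz with θ_avg = ½(θ_k + θ_k-1). The project's actual expression, variable names, and sign conventions are not shown in this PR; `wPressureGradientBuggy`/`wPressureGradientFixed` and their parameters are assumptions for illustration only:

```cpp
#include <cassert>
#include <cmath>

// Specific heat of dry air at constant pressure, J/(kg K).
constexpr float cp = 1004.0f;

// Buggy grouping: the closing paren sits after dPi instead of after
// theta_km1, so dPi scales only the k-1 theta while theta_k is added unscaled.
float wPressureGradientBuggy(float theta_k, float theta_km1, float dPi, float dz) {
    return -cp * 0.5f * (theta_k + theta_km1 * dPi) / dz;
}

// Fixed grouping: average the two theta levels first, then multiply by dPi.
float wPressureGradientFixed(float theta_k, float theta_km1, float dPi, float dz) {
    return -cp * 0.5f * (theta_k + theta_km1) * dPi / dz;
}
```

With dPi = 0 (no Exner difference between levels), the fixed form correctly gives zero, while the buggy form still returns a large spurious tendency from the unscaled θ_k term. The magnitudes here are illustrative and are not meant to reproduce the ~994 m/s² figure above.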

Performance optimizations:

  • Add sq() inline helper, replace ~40 pow(x, 2.0f) calls with sq(x)
  • Precompute reciprocals: inv_gridSizeI/J, Kx/Ky/Kz_over_gridSize*
  • Horner's method for the gamma polynomial in STEP3 (previously 5 pow() calls)
  • Build with -O2 and remove *.h from the g++ command line
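A sketch of the first two optimizations listed above: an `sq()` helper and a coefficient reciprocal hoisted out of the loop. The `laplacian1D` function and the name `Kx_over_gridSizeI2` are hypothetical; they only mirror the style of the PR's `Kx_over_gridSize*` names, and the real loops are three-dimensional:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Square helper: one multiply instead of a general pow(x, 2.0f) call.
inline float sq(float x) { return x * x; }

// Hypothetical 1D diffusion term illustrating reciprocal hoisting: the
// coefficient Kx / gridSizeI^2 is computed once, outside the loop, and the
// inner loop only multiplies.
void laplacian1D(const std::vector<float>& f, std::vector<float>& out,
                 float Kx, float gridSizeI) {
    const float Kx_over_gridSizeI2 = Kx / sq(gridSizeI); // hoisted out of the loop
    for (std::size_t i = 1; i + 1 < f.size(); ++i) {
        out[i] = Kx_over_gridSizeI2 * (f[i + 1] - 2.0f * f[i] + f[i - 1]);
    }
}
```

Multiplying by a precomputed factor is generally not bit-identical to dividing inside the loop, so small last-digit differences compared with the pow()/division version are expected.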

Result: master takes 49s to compute nothing; this version takes 1s to run the actual physics simulation (16 threads, -O2).

EvoPulseGaming commented:

Really cool to see this. I remember naively trying to port this to Unreal Engine many years ago.

Parallelize all 4 simulation steps using std::thread, splitting work
by k-slabs (STEP1-3) or j-rows (STEP4). Each step spawns
hardware_concurrency() threads, with std::ref() wrappers to pass
Grid3D/Ground3D by reference through std::thread constructors.
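A minimal sketch of the k-slab threading pattern described above, assuming a hypothetical flat `Grid3D` and a hypothetical per-slab worker `stepSlab` (the real step functions and grid layout are not shown here):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical cut-down grid: nz horizontal slabs of nx*ny cells each.
struct Grid3D {
    std::size_t nx, ny, nz;
    std::vector<float> data; // index = (k * ny + j) * nx + i
};

// Work for one thread: processes slabs k in [kBegin, kEnd).
// std::thread copies its arguments, so a Grid3D& parameter only binds
// if the caller passes std::ref(grid).
void stepSlab(Grid3D& grid, std::size_t kBegin, std::size_t kEnd) {
    const std::size_t slab = grid.nx * grid.ny;
    for (std::size_t k = kBegin; k < kEnd; ++k)
        for (std::size_t c = 0; c < slab; ++c)
            grid.data[k * slab + c] += static_cast<float>(k);
}

void stepParallel(Grid3D& grid) {
    // hardware_concurrency() may return 0 when unknown; fall back to 1.
    std::size_t numThreads = std::max(1u, std::thread::hardware_concurrency());
    numThreads = std::min(numThreads, grid.nz); // avoid threads with no slabs
    std::vector<std::thread> threads;           // instead of a VLA
    threads.reserve(numThreads);
    for (std::size_t t = 0; t < numThreads; ++t) {
        // Split [0, nz) into contiguous, nearly equal k-ranges.
        const std::size_t kBegin = grid.nz * t / numThreads;
        const std::size_t kEnd = grid.nz * (t + 1) / numThreads;
        threads.emplace_back(stepSlab, std::ref(grid), kBegin, kEnd);
    }
    for (std::thread& th : threads) th.join();
}
```

Each thread writes only its own k-range, so no locking is needed. A real stencil that reads k-1/k+1 neighbours stays race-free as long as it reads from arrays that no thread writes during that step. On older GCC/glibc toolchains, link with -pthread.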

Scalar optimizations:
- Add sq() inline helper, replace ~20 powf(x,2) with sq(x) or x*x
- Precompute reciprocals: inv_gridSizeI/J, Kx/Ky/Kz_over_gridSize*,
  cmax2 — hoisted out of inner loops
- Horner's method for gamma polynomial in STEP4 (was 5 powf() calls)
- Build with -O2, remove *.h from g++ command line
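A sketch of the Horner's-method rewrite mentioned above. The coefficients `c0`..`c5` below are hypothetical placeholders, since the actual gamma polynomial is not given in this PR:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical coefficients for a 5th-degree polynomial.
constexpr float c0 = 1.0f, c1 = -0.5f, c2 = 0.25f,
                c3 = -0.125f, c4 = 0.0625f, c5 = -0.03125f;

// pow()-based form: one general power call per higher-order term.
float polyNaive(float x) {
    return c0 + c1 * x + c2 * std::pow(x, 2.0f) + c3 * std::pow(x, 3.0f)
         + c4 * std::pow(x, 4.0f) + c5 * std::pow(x, 5.0f);
}

// Horner's method: the same polynomial as nested multiply-adds,
// 5 multiplies and 5 additions with no pow() calls.
float polyHorner(float x) {
    return c0 + x * (c1 + x * (c2 + x * (c3 + x * (c4 + x * c5))));
}
```

The two forms agree mathematically but round differently, which fits the "last-digit differences" noted in the verification paragraph.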

Verified: output matches upstream/master to within floating-point noise
(last-digit differences from powf->multiply substitution). Tested with
1000 timesteps, no NaN.
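One way to implement the check described above ("matches to within floating-point noise, no NaN") is a sketch like the following. The function name `matchesWithinNoise` and the default tolerances are assumptions; the PR does not say how the comparison was actually done:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Returns true if both arrays have the same size, every element is finite
// (rejects NaN and Inf), and each pair satisfies
//   |a - b| <= absTol + relTol * max(|a|, |b|).
bool matchesWithinNoise(const std::vector<float>& a, const std::vector<float>& b,
                        float relTol = 1e-5f, float absTol = 1e-6f) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (!std::isfinite(a[i]) || !std::isfinite(b[i])) return false;
        const float diff = std::fabs(a[i] - b[i]);
        const float scale = std::fmax(std::fabs(a[i]), std::fabs(b[i]));
        if (diff > absTol + relTol * scale) return false;
    }
    return true;
}
```

A combined relative and absolute tolerance handles both large fields (such as theta, around 300 K) and near-zero fields (such as tendencies) without one fixed threshold being too strict for one and too loose for the other.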

Timing: upstream/master 46s -> this version 2s wall-clock (16 threads).
bestouff changed the title from "Chang eoutput format + fix bugs and parallelize" to "Change output format + fix bugs and parallelize" on Apr 5, 2026