Change output format + fix bugs and parallelize by bestouff · Pull Request #5 · igarciad/weather_simulation

bestouff · 2026-04-05T18:07:39Z

The simulation never actually computed anything: all 4 step functions (simulateSTEP1-4) took Grid3D and Ground3D parameters by value instead of by reference. Every call deep-copied the grids, computed tendencies into the copies, and discarded them on return. The output was always just the initial sounding data with zero gridRslow, regardless of how many timesteps were run.
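The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not code from the repository: `Grid`, `stepByValue`, `stepByReference`, and the two `run...` helpers are hypothetical names standing in for Grid3D and the simulateSTEP functions.

```cpp
#include <vector>

// Hypothetical stand-in for the simulation's Grid3D (the real type is not shown in this PR text).
struct Grid {
    std::vector<float> cells;
};

// By value: the caller's grid is deep-copied, the copy is updated,
// and the copy is destroyed when the function returns.
void stepByValue(Grid g) {
    for (float& c : g.cells) c += 1.0f;
}

// By reference: the caller's grid is updated in place.
void stepByReference(Grid& g) {
    for (float& c : g.cells) c += 1.0f;
}

// Runs `steps` iterations with the by-value step and returns the first cell.
float runByValue(int steps) {
    Grid g{std::vector<float>(4, 0.0f)};
    for (int s = 0; s < steps; ++s) stepByValue(g);
    return g.cells[0];
}

// Same loop, but with the by-reference step.
float runByReference(int steps) {
    Grid g{std::vector<float>(4, 0.0f)};
    for (int s = 0; s < steps; ++s) stepByReference(g);
    return g.cells[0];
}
```

With the by-value signature the grid is unchanged no matter how many steps run, which matches the symptom of the output always equalling the initial data.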

Bug fixes:

Change all step function signatures to take Grid3D&/Ground3D& by reference, with std::ref() wrappers in std::thread constructors
Fix W vertical pressure gradient parenthesization: the closing paren of the theta average was misplaced, causing dPi to only multiply the k-1 term instead of the full (k + k-1) average. This produced a ~994 m/s/s unbalanced tendency that caused NaN at step 37 (previously masked by the pass-by-value bug silently discarding all results)
Enable multithreading: restore hardware_concurrency() instead of unconditional numThreads=1, replace the VLA with std::vector<std::thread>
Add #include <functional> for std::ref
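To illustrate the parenthesization fix, here is a sketch of the general shape of such a bug. It assumes the term is an average of theta at levels k and k-1 multiplied by dPi; the function names are hypothetical and the real expression in the repository likely carries additional factors.

```cpp
// Buggy shape: the closing parenthesis comes too late, so dPi scales
// only the k-1 term and thetaK is added unscaled.
float pressureGradientBuggy(float thetaK, float thetaKm1, float dPi) {
    return 0.5f * (thetaK + thetaKm1 * dPi);
}

// Fixed shape: dPi multiplies the full (k + k-1) average.
float pressureGradientFixed(float thetaK, float thetaKm1, float dPi) {
    return 0.5f * (thetaK + thetaKm1) * dPi;
}
```

With a zero pressure difference the fixed form returns zero, while the buggy form still returns half of thetaK, which is the kind of unbalanced residual the description refers to.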

Performance optimizations:

Add sq() inline helper, replace ~40 pow(x, 2.0f) calls with sq(x)
Precompute reciprocals: inv_gridSizeI/J, Kx/Ky/Kz_over_gridSize*
Horner's method for gamma polynomial in STEP3 (was 5 pow() calls)
Build with -O2, remove *.h from g++ command line
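The scalar optimizations listed above can be sketched as follows. This assumes a degree-5 polynomial (consistent with "5 pow() calls", but not confirmed by the source); the coefficients are supplied by the caller and are not those of the simulation's gamma polynomial. `polyPow` and `polyHorner` are hypothetical names.

```cpp
#include <array>
#include <cmath>

// Squares a value with a single multiply instead of a pow() call.
inline float sq(float x) { return x * x; }

// Naive evaluation of c[0] + c[1]*x + ... + c[5]*x^5 using five pow() calls.
float polyPow(const std::array<float, 6>& c, float x) {
    return c[0]
         + c[1] * std::pow(x, 1.0f)
         + c[2] * std::pow(x, 2.0f)
         + c[3] * std::pow(x, 3.0f)
         + c[4] * std::pow(x, 4.0f)
         + c[5] * std::pow(x, 5.0f);
}

// Horner's method: the same polynomial with five multiplies and five adds.
float polyHorner(const std::array<float, 6>& c, float x) {
    float result = c[5];
    for (int i = 4; i >= 0; --i) {
        result = result * x + c[i];
    }
    return result;
}
```

The two evaluations can differ in the last digits because the order of floating-point operations changes.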

Result: master takes 49s to compute nothing; this version takes 1s to run the actual physics simulation (16 threads, -O2).

EvoPulseGaming · 2026-04-05T18:15:39Z

Really cool to see this. I remember naively trying to get this ported to unreal engine many years ago.

Parallelize all 4 simulation steps using std::thread, splitting work by k-slabs (STEP1-3) or j-rows (STEP4). Each step spawns hardware_concurrency() threads, with std::ref() wrappers to pass Grid3D/Ground3D by reference through std::thread constructors.

Scalar optimizations:

- Add sq() inline helper, replace ~20 powf(x,2) with sq(x) or x*x
- Precompute reciprocals: inv_gridSizeI/J, Kx/Ky/Kz_over_gridSize*, cmax2 (hoisted out of inner loops)
- Horner's method for gamma polynomial in STEP4 (was 5 powf() calls)
- Build with -O2, remove *.h from g++ command line

Verified: output matches upstream/master to within floating-point noise (last-digit differences from powf->multiply substitution). Tested with 1000 timesteps, no NaN.

Timing: upstream/master 46s -> this version 2s wall-clock (16 threads).
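A minimal sketch of the k-slab splitting described above, under stated assumptions: `Grid`, `stepSlabs`, and `parallelStep` are hypothetical names, the grid layout is invented for illustration, and the per-cell update is a placeholder rather than the actual physics. The parts taken from the description are std::vector<std::thread>, hardware_concurrency(), and std::ref().

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for Grid3D: nk slabs of slabSize cells each.
struct Grid {
    int nk;
    int slabSize;
    std::vector<float> cells;
};

// Worker: updates slabs [kBegin, kEnd). Ranges handed to different
// threads are disjoint, so no locking is needed for this update.
void stepSlabs(Grid& g, int kBegin, int kEnd) {
    for (int k = kBegin; k < kEnd; ++k) {
        for (int i = 0; i < g.slabSize; ++i) {
            g.cells[k * g.slabSize + i] += 1.0f;  // placeholder update
        }
    }
}

// Splits the k range across threads and waits for all of them to finish.
void parallelStep(Grid& g) {
    // hardware_concurrency() may return 0 when the count is unknown.
    int hw = static_cast<int>(std::thread::hardware_concurrency());
    int numThreads = std::max(1, std::min(hw, g.nk));
    int chunk = (g.nk + numThreads - 1) / numThreads;  // ceiling division

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        int kBegin = t * chunk;
        int kEnd = std::min(g.nk, kBegin + chunk);
        if (kBegin >= kEnd) break;
        // std::thread copies its arguments; std::ref passes g by reference.
        workers.emplace_back(stepSlabs, std::ref(g), kBegin, kEnd);
    }
    for (std::thread& w : workers) w.join();
}
```

Without std::ref(), std::thread would try to pass a copy of the grid, which does not compile against a `Grid&` parameter. On most toolchains this needs to be built with thread support enabled (for example `-pthread` with g++).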

bestouff changed the title ~~Chang eoutput format + fix bugs and parallelize~~ Change output format + fix bugs and parallelize Apr 5, 2026

bestouff force-pushed the full3d branch from b5a3167 to bfa7463 April 5, 2026 18:27

bestouff wants to merge 1 commit into igarciad:master from bestouff:full3d
