Change output format + fix bugs and parallelize #5
Open
bestouff wants to merge 1 commit into igarciad:master
Really cool to see this. I remember naively trying to get this ported to Unreal Engine many years ago.
Parallelize all 4 simulation steps using std::thread, splitting work by k-slabs (STEP1-3) or j-rows (STEP4). Each step spawns hardware_concurrency() threads, with std::ref() wrappers to pass Grid3D/Ground3D by reference through std::thread constructors.

Scalar optimizations:
- Add sq() inline helper, replace ~20 powf(x,2) with sq(x) or x*x
- Precompute reciprocals: inv_gridSizeI/J, Kx/Ky/Kz_over_gridSize*, cmax2; all hoisted out of inner loops
- Horner's method for gamma polynomial in STEP4 (was 5 powf() calls)
- Build with -O2, remove *.h from g++ command line

Verified: output matches upstream/master to within floating-point noise (last-digit differences from powf->multiply substitution). Tested with 1000 timesteps, no NaN.

Timing: upstream/master 46s -> this version 2s wall-clock (16 threads).
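For readers who want to see the shape of the slab-splitting approach described in the commit message, here is a minimal sketch. It is not the code from this PR: `SlabGrid`, `stepSlab`, `parallelStep`, and `horner` are hypothetical names, the grid layout is an assumption, and the per-cell work is a placeholder rather than the real physics. It shows one worker per k-slab, the grid passed through `std::thread` via `std::ref()`, an `sq()` helper in place of `powf(x, 2)`, and Horner evaluation of a polynomial with made-up coefficients.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for Grid3D: a flat nx*ny*nz array of floats.
struct SlabGrid {
    int nx, ny, nz;
    std::vector<float> data;
    SlabGrid(int nx_, int ny_, int nz_)
        : nx(nx_), ny(ny_), nz(nz_),
          data(static_cast<std::size_t>(nx_) * ny_ * nz_, 0.0f) {}
    float& at(int i, int j, int k) {
        assert(i >= 0 && i < nx && j >= 0 && j < ny && k >= 0 && k < nz);
        return data[(static_cast<std::size_t>(k) * ny + j) * nx + i];
    }
};

// Cheap replacement for powf(x, 2).
inline float sq(float x) { return x * x; }

// Horner's method: evaluates c[0] + c[1]*x + c[2]*x^2 + ... without powf().
inline float horner(const std::vector<float>& c, float x) {
    float result = 0.0f;
    for (std::size_t i = c.size(); i-- > 0;) result = result * x + c[i];
    return result;
}

// Placeholder work on levels k in [kBegin, kEnd). Each thread writes only
// to its own slab, so no locking is needed in this sketch.
void stepSlab(SlabGrid& g, int kBegin, int kEnd) {
    for (int k = kBegin; k < kEnd; ++k)
        for (int j = 0; j < g.ny; ++j)
            for (int i = 0; i < g.nx; ++i)
                g.at(i, j, k) = sq(static_cast<float>(k));
}

// Split the k range into contiguous slabs, one per hardware thread.
void parallelStep(SlabGrid& g) {
    int n = static_cast<int>(std::thread::hardware_concurrency());
    if (n < 1) n = 1;  // hardware_concurrency() may return 0
    const int chunk = (g.nz + n - 1) / n;
    std::vector<std::thread> workers;
    for (int t = 0; t < n; ++t) {
        const int kBegin = std::min(t * chunk, g.nz);
        const int kEnd = std::min(kBegin + chunk, g.nz);
        if (kBegin >= kEnd) break;
        // std::ref is required: std::thread copies its arguments otherwise.
        workers.emplace_back(stepSlab, std::ref(g), kBegin, kEnd);
    }
    for (std::thread& w : workers) w.join();
}
```

Calling `parallelStep(g)` on a `SlabGrid g(4, 3, 10)` fills every cell at level k with k squared. A real step that reads neighbouring k levels would also need the threads to read from one buffer and write to another, which this sketch does not model.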
The simulation never actually computed anything: all 4 step functions (simulateSTEP1-4) took Grid3D and Ground3D parameters by value instead of by reference. Every call deep-copied the grids, computed tendencies into the copies, and discarded them on return. The output was always just the initial sounding data with zero gridRslow, regardless of how many timesteps were run.
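The pass-by-value problem described above can be reproduced in a few lines. This is an illustrative sketch only: `DemoGrid`, `stepByValue`, and `stepByReference` are hypothetical names, not the actual simulateSTEP signatures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Grid3D.
struct DemoGrid {
    std::vector<float> cells;
};

// Buggy form: g is a deep copy, so the update is discarded on return.
void stepByValue(DemoGrid g) {
    for (float& c : g.cells) c += 1.0f;
}

// Fixed form: g aliases the caller's grid, so the update persists.
void stepByReference(DemoGrid& g) {
    assert(!g.cells.empty());
    for (float& c : g.cells) c += 1.0f;
}
```

After `stepByValue(grid)` the caller's cells are unchanged; after `stepByReference(grid)` each cell has been incremented. Both versions compile without warnings, which is how a bug like this can go unnoticed.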
Bug fixes:
Performance optimizations:
Result: master takes 49s to compute nothing; this version takes 1s to run the actual physics simulation (16 threads, -O2).