
feat: Add new scatter strategies for initial mesh redistribution #4033

Open
bd713 wants to merge 3 commits into develop from feat/bd713/meshScatter


bd713 (Contributor) commented Apr 21, 2026

Problem

The current strategy for scattering the mesh initially loaded by rank 0 uses vtkRedistributeDataSetFilter (kd-tree partitioning). For large meshes (tens of millions of cells) redistributed across 1000+ ranks, this step can take several minutes.

Recent example: redistributing a 136M-cell mesh to 1024 ranks takes about 17 minutes (1020 s) with the current kd-tree approach.

Solution

This PR adds partitioning strategies that bypass the vtkRedistributeDataSetFilter bottleneck. The strategy is selected with the new scatterMethod XML attribute.
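As an illustration of how the attribute value might select a strategy, here is a minimal C++ sketch. The names `ScatterMethod` and `parseScatterMethod` are hypothetical and do not reflect the actual GEOS implementation; only the four accepted values (kdtree, contiguous, cartesian, rcb) and the kdtree default come from this PR.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical enum mirroring the scatterMethod values listed in this PR.
enum class ScatterMethod { kdtree, contiguous, cartesian, rcb };

// Maps the XML attribute string to a strategy (hypothetical helper).
// An empty value falls back to the default, kdtree; unknown values are rejected.
ScatterMethod parseScatterMethod(const std::string& value)
{
  if (value.empty() || value == "kdtree") return ScatterMethod::kdtree;
  if (value == "contiguous") return ScatterMethod::contiguous;
  if (value == "cartesian") return ScatterMethod::cartesian;
  if (value == "rcb") return ScatterMethod::rcb;
  throw std::invalid_argument("unknown scatterMethod: " + value);
}
```

Rejecting unknown strings, rather than silently falling back to kdtree, keeps a typo in the input file from quietly selecting the slow path.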

Available strategies

| Method | Description | Comment |
| --- | --- | --- |
| kdtree | VTK built-in kd-tree (default) | Preserves spatial locality, but can be slow; the current version can lose tetrahedral or pyramid cells |
| contiguous | Contiguous blocks of cell indices | Fast, but does not guarantee spatial locality |
| cartesian | Regular Cartesian grid using the -x/-y/-z partition counts | Does not guarantee load balance if the mesh is irregular |
| rcb | Recursive Coordinate Bisection along the longest bounding-box axis | Spatial locality and load balance |
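The rcb and cartesian rows can be illustrated with a serial C++ sketch that works on cell centroids and returns a destination rank per cell. This is not the code in this PR: `Point`, `rcbPartition`, and `cartesianPartition` are hypothetical names, each cell is assumed to be represented by its centroid, and the Cartesian rank numbering (x fastest) is an assumption. The RCB variant splits the cell count in proportion to the number of ranks on each side, so the rank count need not be a power of two, and it uses `std::nth_element` to find each split in linear time instead of sorting.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical centroid type: each cell is represented by one point.
using Point = std::array<double, 3>;

// Assigns ranks [rankBegin, rankBegin + nRanks) to the cells listed in ids[begin, end).
void rcb(const std::vector<Point>& centroids, std::vector<std::size_t>& ids,
         std::size_t begin, std::size_t end, int rankBegin, int nRanks,
         std::vector<int>& owner)
{
  if (nRanks == 1) {
    for (std::size_t i = begin; i < end; ++i) owner[ids[i]] = rankBegin;
    return;
  }
  if (begin == end) return;  // fewer cells than ranks: nothing left to split

  // Bounding box of the current subset, then pick its longest axis.
  Point lo = centroids[ids[begin]];
  Point hi = lo;
  for (std::size_t i = begin; i < end; ++i) {
    for (int d = 0; d < 3; ++d) {
      lo[d] = std::min(lo[d], centroids[ids[i]][d]);
      hi[d] = std::max(hi[d], centroids[ids[i]][d]);
    }
  }
  int axis = 0;
  for (int d = 1; d < 3; ++d) {
    if (hi[d] - lo[d] > hi[axis] - lo[axis]) axis = d;
  }

  // Split the cells in proportion to the ranks on each side (handles odd rank counts).
  const int nLeft = nRanks / 2;
  const std::size_t mid = begin + (end - begin) * static_cast<std::size_t>(nLeft)
                                      / static_cast<std::size_t>(nRanks);

  // Partial selection: ids[begin, mid) ends up holding the cells with the smallest
  // coordinates along the chosen axis; ids[mid, end) holds the rest.
  std::nth_element(ids.begin() + begin, ids.begin() + mid, ids.begin() + end,
                   [&](std::size_t a, std::size_t b) { return centroids[a][axis] < centroids[b][axis]; });

  rcb(centroids, ids, begin, mid, rankBegin, nLeft, owner);
  rcb(centroids, ids, mid, end, rankBegin + nLeft, nRanks - nLeft, owner);
}

// Recursive coordinate bisection: returns the destination rank of every cell.
std::vector<int> rcbPartition(const std::vector<Point>& centroids, int nRanks)
{
  assert(nRanks > 0);
  std::vector<std::size_t> ids(centroids.size());
  std::iota(ids.begin(), ids.end(), std::size_t{0});
  std::vector<int> owner(centroids.size(), -1);
  rcb(centroids, ids, 0, ids.size(), 0, nRanks, owner);
  return owner;
}

// Cartesian grid: the global box [lo, hi] is cut into n[0] x n[1] x n[2] equal boxes
// and each centroid goes to the box containing it.
std::vector<int> cartesianPartition(const std::vector<Point>& centroids, const Point& lo,
                                    const Point& hi, const std::array<int, 3>& n)
{
  std::vector<int> owner(centroids.size());
  for (std::size_t c = 0; c < centroids.size(); ++c) {
    std::array<int, 3> idx{};
    for (int d = 0; d < 3; ++d) {
      const double extent = hi[d] - lo[d];
      const int i = extent > 0 ? static_cast<int>((centroids[c][d] - lo[d]) / extent * n[d]) : 0;
      idx[d] = std::clamp(i, 0, n[d] - 1);  // points on the upper face go to the last slab
    }
    owner[c] = idx[0] + n[0] * (idx[1] + n[1] * idx[2]);
  }
  return owner;
}
```

Because each RCB split divides the cell count in proportion to the ranks on each side, per-rank counts stay nearly equal regardless of cell density. The Cartesian boxes have fixed geometry, so their cell counts follow the mesh density, which is the load-balance caveat noted in the table.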

Remark

The resulting partitions can be further refined with ParMETIS or PT-Scotch by setting partitionRefinement > 0.

Benchmark

==============================================================================================================
SUMMARY (1024 ranks, 135901500 cells)
==============================================================================================================
Method        | Time (s)  | Cells       | Status  | Min cells | Max cells | Imbalance   | BBox Overlap
--------------------------------------------------------------------------------------------------------------
KdTree        | 1020.194  | 135901494   | LOSS    | 126714    | 139764    | 1.053103    | 1.086       
Contiguous    |   15.822  | 135901500   | OK      | 132716    | 132717    | 1.000005    | 2.861       
Cartesian     |   22.005  | 135901500   | OK      | 116157    | 152266    | 1.147304    | 1.095       
RCB           |   34.289  | 135901500   | OK      | 132716    | 132717    | 1.000005    | 1.116       
--------------------------------------------------------------------------------------------------------------

Speedup relative to each method:
  KdTree        64.48x slower than fastest
  Contiguous    1.00x slower than fastest <-- fastest
  Cartesian     1.39x slower than fastest
  RCB           2.17x slower than fastest

Best load balance: Contiguous (imbalance ratio: 1.000005x)
Best compactness:  KdTree (bbox overlap: 1.086)
==============================================================================================================

Rebaseline

A rebaseline is needed: all failures are caused by the new scatterMethod keyword.

bd713 self-assigned this Apr 21, 2026
bd713 added the type: feature label Apr 21, 2026
bd713 requested a review from victorapm April 21, 2026 03:35
victorapm (Contributor) commented:

That looks great, Bertrand. Even though contiguous is the fastest, we might want to have RCB as default since bbox values look much better and the additional cost is not too bad. cc @castelletto1

bd713 added the ci: run CUDA builds and ci: run integrated tests labels Apr 27, 2026
bd713 marked this pull request as ready for review April 27, 2026 17:09
bd713 added the flag: requires rebaseline and ci: run code coverage labels Apr 27, 2026

Labels

- ci: run code coverage
- ci: run CUDA builds
- ci: run integrated tests
- flag: requires rebaseline
- type: feature
