[WIP] [LGR] flow: parallel LGR INIT output from gathered simulator transmissibilities by arturcastiel · Pull Request #7211 · OPM/opm-simulators

arturcastiel · 2026-06-30T15:31:08Z

A parallel (MPI) flow run with LGRs currently cannot write a correct INIT file: the output path queries transmissibilities on the global refined grid, which the coarse per-rank globalTrans_ cannot answer, and the run dies. This PR makes the parallel LGR INIT output (TRANX/TRANY/TRANZ + all NNC arrays) correct, by reusing the transmissibilities the simulation itself already computed in parallel.

Approach. Each rank walks its interior leaf cells and records every connection it owns from its own (distributed) simulator transmissibilities, keyed by level-Cartesian indices — same-level connections as (level, min, max), level-crossing ones as (smaller level, its index, larger level, its index). These keys are geometrically canonical: identical on the distributed grid and the I/O rank's global view. A one-shot gatherv brings the records to the I/O rank, where the existing output walk over the global grid looks the values up directly (new helper gatherLgrOutputTrans in opm/simulators/flow/LgrOutputTransGather.hpp). Same-level rank-boundary connections arrive twice with identical values; level-crossing ones are contributed exactly once, from the smaller-level side. Nothing is recomputed anywhere; there is no global-property switch, and no whole-grid transmissibility object is stored — in parallel LGR runs the writer receives none at all. Serial runs and parallel runs without LGRs are unchanged.
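The keying scheme above can be sketched as follows. This is a minimal illustration that assumes a connection endpoint is just a (level, level-Cartesian index) pair; the names `Endpoint`, `sameLevelKey`, `crossLevelKey` and `contributesCrossLevel` are hypothetical and are not the helpers in LgrOutputTransGather.hpp. The point is that both sides of a face build the same key regardless of visiting order, and that only the smaller-level side contributes a level-crossing record.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical connection endpoint: refinement level plus level-Cartesian index.
struct Endpoint {
    int level;
    int index;
};

// (level, min index, max index)
using SameLevelKey = std::array<int, 3>;
// (smaller level, its index, larger level, its index)
using CrossLevelKey = std::array<int, 4>;

// Ordering the two indices makes the key independent of which side of the face is visited first.
SameLevelKey sameLevelKey(Endpoint a, Endpoint b)
{
    if (a.level != b.level)
        throw std::invalid_argument("sameLevelKey: endpoints are on different levels");
    return {a.level, std::min(a.index, b.index), std::max(a.index, b.index)};
}

// The smaller-level endpoint always comes first, so either side would build the same key.
CrossLevelKey crossLevelKey(Endpoint a, Endpoint b)
{
    if (a.level == b.level)
        throw std::invalid_argument("crossLevelKey: endpoints are on the same level");
    if (a.level > b.level)
        std::swap(a, b);
    return {a.level, a.index, b.level, b.index};
}

// Level-crossing connections are contributed only from the smaller-level side,
// so each one reaches the I/O rank exactly once.
bool contributesCrossLevel(Endpoint mine, Endpoint other)
{
    return mine.level < other.level;
}
```

Same-level connections are deliberately not filtered this way in the sketch, matching the description: both owners send them, and the duplicates carry identical values.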

Testing. New regression compareParallelInitSim_flow+SPE1CASE1_CARFIN: the existing run-parallel-regressionTest.sh driver gained a backward-compatible -m init mode (dry run, EGRID+INIT compare, serial vs 4-rank parallel), registered through a new optional COMPARE_MODE parameter of add_test_compare_parallel_simulation, on the existing SPE1CASE1_CARFIN deck from opm-tests. That deck has two 2×2×3 LGRs (12 host cells each) on a 10×10×3 grid, and the run log confirms the LGR host cells land on multiple ranks (rank 0 marks 9 of the 24 host cells during local refinement), so the compare exercises rank-boundary-straddling LGRs.

Scope. INIT output only. Summary (LW*/LC*/LB*), restart write, restart read, and RFT for parallel LGR runs are separate work.

arturcastiel · 2026-06-30T15:32:36Z

jenkins build this please

arturcastiel · 2026-07-01T06:22:28Z

jenkins build this please

arturcastiel · 2026-07-01T06:31:33Z

jenkins build this please

arturcastiel · 2026-07-01T06:57:24Z

jenkins build this please

arturcastiel · 2026-07-01T07:10:45Z

jenkins build this please

arturcastiel · 2026-07-01T07:25:39Z

@akva2 did I screw something up? I cannot get jenkins going

aritorto · 2026-07-01T07:31:49Z

+                if (simulator.vanguard().grid().comm().rank() == 0) {
+                    // Parallel LGR: the output path (computeTrans_) indexes trans on the GLOBAL refined
+                    // (equil) grid, which the coarse per-rank globalTrans_ cannot answer (out_of_range).
+                    // Decide on equilGrid().maxLevel() (the GLOBAL refinement) -- NOT grid().maxLevel(),


Just a comment; I haven't looked at all the changes in depth, only glanced at them.

grid.maxLevel() coincides across ranks, with potentially empty level grids on ranks that do not contain any refined cells. maxLevel is determined by the length of data_ or distributed_data_ (CpGrid::currentData()); see https://github.com/OPM/opm-grid/blob/5e213cf68d07a973fb2a9c2a6bdaea55a6f61af2/opm/grid/cpgrid/CpGrid.cpp#L767-L775

From a quick look at tests/cpgrid/lgr, I couldn't find a test that proves this, but I'm 99% sure it works like that. In any case, in the simulators equilGrid holds the global view of the grid and might be needed here for other reasons.

If I find time, I'll add a tiny test, just to double-check grid.maxLevel() behavior on a distributed grid. I'll keep you updated!

Thank you, this is more of a proof of concept. Let me know if you think I am heading in the right direction.

@aritorto new comments will be highly appreciated.

I added a few lines to an existing test (OPM/opm-grid#1046) to illustrate the behavior of grid.maxLevel() on a distributed grid. I hope this helps!

I'll try to take a deeper look at some point; for now, this was only about the comment on grid.maxLevel().

Nice that parallel output for LGRs is moving forward : )

@aritorto parallel lgr is moving forward just like brazil in the world cup.

Thanks a lot @aritorto, you're right, and this was really helpful.

I traced it through CpGrid::maxLevel() (just currentData().size() - 2) and refineAndUpdateGrid: the level grids are pushed one per entry of cells_per_dim_vec (the global LGR spec), unconditionally (a rank that owns no refined cells still gets the empty level grids), and nothing prunes them afterwards. So grid().maxLevel() really is identical on every rank, and your opm-grid#1046 test makes that explicit. My earlier comment claiming it "would be 0 on rank 0" was simply wrong.

I've reworded the comment: it no longer makes that claim. It now justifies gating on equilGrid().maxLevel() as the global (undistributed) view: the authoritative refinement depth, and the same grid the refined transmissibility is built over. I kept equilGrid() for clarity/consistency rather than switching to grid().maxLevel(), since they're equivalent here.
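To make the traced argument concrete, here is a toy model of the behaviour described above (hypothetical names, not the opm-grid API). It assumes only what the trace states: the level storage holds level 0, one entry per LGR in the global spec, and the leaf view, and maxLevel() is the entry count minus two. Because the entry count is fixed by the global spec rather than by what a rank owns, every rank reports the same value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a refined grid's level storage on one rank (hypothetical; not the CpGrid API).
// Entries after refinement: [level 0, level 1 .. level nLgr, leaf view].
struct ToyRefinedGrid {
    std::vector<std::size_t> cellsPerEntry;

    // Same arithmetic as described for CpGrid::maxLevel(): number of entries minus two.
    int maxLevel() const { return static_cast<int>(cellsPerEntry.size()) - 2; }
};

// Pushes one level grid per entry of the global LGR spec, unconditionally: an LGR of which
// this rank owns no refined cells still gets an (empty) entry. Nothing is pruned afterwards.
ToyRefinedGrid refineOnRank(std::size_t level0Cells,
                            const std::vector<std::size_t>& ownedRefinedCellsPerLgr)
{
    ToyRefinedGrid grid;
    grid.cellsPerEntry.push_back(level0Cells);
    for (std::size_t owned : ownedRefinedCellsPerLgr)
        grid.cellsPerEntry.push_back(owned); // may be 0 on ranks without refined cells
    grid.cellsPerEntry.push_back(0);         // leaf view; its size is not modelled here
    return grid;
}
```

The toy only models the post-refinement state; it says nothing about how an unrefined grid reports maxLevel().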

akva2 · 2026-07-01T08:11:56Z

jenkins build this please

arturcastiel · 2026-07-01T09:04:57Z

jenkins build this please

blattms

Nice start. I only took a very short look. I hope my comments help a bit.

Please note that the current code does the computation on the root rank, because globalTrans_ is used for partitioning. Hence it is already there and reusing it is cheap.

For parallel LGRs, recomputing it on the root rank might not be the most efficient way as we could hopefully also do this in parallel.

Hence, in the long run, we can hopefully compute the values in parallel or reuse the ones the simulator has already computed. Of course, we would then need to gather these and maybe skip some of them. That might be complicated, too.

blattms · 2026-07-01T10:05:42Z

+    // Refined transmissibility over the global (equil) refined leaf grid — built lazily on the I/O rank
+    // for parallel LGR INIT output (TRANS/NNC). Separate from globalTrans_ (kept coarse, for domain
+    // decomposition). See refinedGlobalTransmissibility(); reset by releaseGlobalTransmissibilities().
+    std::unique_ptr<TransmissibilityType> refinedGlobalTrans_;


As this is only used once, maybe we should not store it?
(globalTrans_ exists because it is already computed before partitioning, and is later reused for INIT.)

It might even be possible to overwrite/reset globalTrans_.

Resolved by making the question moot: with the gather-based reuse there is no whole-grid refined transmissibility any more — the member, the method, and the global-props switch it needed are all gone (the vanguard headers are back to their previous state). In parallel LGR runs the writer receives no whole-grid transmissibility object at all; the coarse globalTrans_ keeps its one job (partitioning + non-LGR parallel INIT) untouched.

blattms · 2026-07-01T10:19:01Z

@@ -0,0 +1,469 @@
+#!/bin/bash


Do you really need an extra script for this? What is different from the existing one?

A data file for a test can easily be stored somewhere else or you could use one from opm-tests.

Not really, fair point; this is noise from my agents, and it will be fixed.

Fair, and done — the bespoke script is removed entirely. Coverage is restored with existing infrastructure instead:

- run-parallel-regressionTest.sh gained a backward-compatible -m init mode (dry run, EGRID+INIT compare); the default is unchanged for every existing caller.
- add_test_compare_parallel_simulation gained an optional COMPARE_MODE parameter.
- The test reuses the existing SPE1CASE1_CARFIN deck from opm-tests (already used by the spe1case1_carfin smoke tests); no new deck, no new script.

blattms · 2026-07-01T10:23:28Z

+--   +-------+-------+-------+
+--   | (1,2) | (2,2) | (3,2) |
+--   +-------+-------+-------+
+--   |(1,1)  | (2,1) | (3,1) | <- LGR1 host (INJ)


Is this setup used because larger LGRs spanning multiple ranks do not work?

In general I would prefer an LGR that gets split between ranks, since that would test more. As this is run with 4 ranks, maybe use 3-4 host cells?

I can add another test to check that.

The replacement deck covers this: SPE1CASE1_CARFIN has two 2×2×3 LGRs (12 host cells each) on a 10×10×3 grid, and the test run confirms they really do get split between ranks. From the 4-rank run's log: the level-0 load balance owns 78/69/72/81 cells per rank, and during local refinement rank 0 marks 9 of the 24 LGR host cells ("9 elements have been marked (in 0 rank)" vs 24 on the global view). Since each LGR has 12 host cells, no LGR fits inside rank 0 — at least one straddles a rank boundary, which is exactly the case you wanted exercised. The serial-vs-parallel compare passes with that split.
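The straddling argument in this reply generalizes: if the number of LGR host cells marked on one rank cannot be written as a sum of whole-LGR host-cell counts, that rank must hold part of at least one LGR. A minimal sketch of that check (a 0/1 subset-sum reachability test; the function names are hypothetical and not part of the test driver):

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// True if `marked` host cells on one rank could come from that rank owning only whole LGRs,
// i.e. `marked` is a subset sum of the per-LGR host-cell counts (0/1 knapsack reachability).
bool explainableByWholeLgrs(int marked, const std::vector<int>& hostCellsPerLgr)
{
    if (marked < 0)
        return false;
    std::vector<bool> reachable(static_cast<std::size_t>(marked) + 1, false);
    reachable[0] = true;
    for (int cells : hostCellsPerLgr) {
        if (cells <= 0)
            continue; // host-cell counts are expected to be positive
        // Iterate downwards so each LGR is used at most once.
        for (int v = marked; v >= cells; --v) {
            if (reachable[static_cast<std::size_t>(v - cells)])
                reachable[static_cast<std::size_t>(v)] = true;
        }
    }
    return reachable[static_cast<std::size_t>(marked)];
}

// If the count cannot be explained by whole LGRs, at least one LGR straddles a rank boundary.
bool provesStraddlingLgr(int marked, const std::vector<int>& hostCellsPerLgr)
{
    return !explainableByWholeLgrs(marked, hostCellsPerLgr);
}
```

With two 12-host-cell LGRs, a count of 9 proves a split, while 0, 12 or 24 would not. The converse does not hold: a rank marking 12 host cells could still own 6 from each LGR, so a subset-sum match proves nothing either way.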

arturcastiel · 2026-07-02T08:06:16Z

jenkins build this please

arturcastiel · 2026-07-02T11:44:18Z

@blattms Thanks — you were right, and this is now implemented rather than deferred. The branch no longer recomputes anything on the root rank: the values the simulator already computed in parallel (its own distributed transmissibilities) are reused for the INIT output. Each rank walks its interior cells and contributes its connections keyed by (level, level-Cartesian index) — a key that is identical on the distributed grid and the I/O rank's global view — and a one-shot gatherv brings them to the I/O rank, where the existing output walk just looks values up. Rank-boundary duplicates are the "skip some" you predicted: same-level ones arrive twice with identical values (benign), level-crossing ones are contributed only from the smaller-level side, exactly once.

What remains on the I/O rank is the topology walk over the global grid and the file write itself — the single-writer floor shared by all output paths (the I/O rank already holds the global grid + global field properties for every INIT static and for EGRID). Per-connection work on the root drops from compute-everything to index-and-copy-everything.

Validated with the serial-vs-parallel INIT/EGRID regression added in this PR (all TRAN* and NNC arrays compare against serial).
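The gather and duplicate handling described in this comment can be sketched without MPI. MPI_Gatherv takes per-rank receive counts and receive displacements; the displacements are the exclusive prefix sum of the counts. The record layout (`TransRecord`) and both function names are hypothetical; the actual layout used by gatherLgrOutputTrans may differ. Treating a duplicate with a different value as an error is an assumption of this sketch, based on the statement that rank-boundary duplicates carry identical values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical flattened record for a same-level connection: (level, min, max) key plus value.
struct TransRecord {
    std::array<int, 3> key;
    double trans;
};

// Receive displacements for an MPI_Gatherv-style gather: exclusive prefix sum of per-rank counts.
std::vector<int> gatherDisplacements(const std::vector<int>& counts)
{
    std::vector<int> displs(counts.size(), 0);
    for (std::size_t r = 1; r < counts.size(); ++r)
        displs[r] = displs[r - 1] + counts[r - 1];
    return displs;
}

// Build the I/O rank's lookup map from the concatenated receive buffer.
// Same-level rank-boundary connections arrive twice; the first copy is kept,
// and a duplicate with a differing value is reported as an error.
std::map<std::array<int, 3>, double> mergeGathered(const std::vector<TransRecord>& received)
{
    std::map<std::array<int, 3>, double> lookup;
    for (const TransRecord& rec : received) {
        const auto [it, inserted] = lookup.emplace(rec.key, rec.trans);
        if (!inserted && it->second != rec.trans)
            throw std::runtime_error("duplicate connection with differing transmissibility");
    }
    return lookup;
}
```

Keeping either copy is equally valid when the values agree; the equality check only guards the stated invariant.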

arturcastiel · 2026-07-02T12:52:25Z

jenkins build this please

…ties

A parallel run with LGRs cannot write a correct INIT file: the output path queries transmissibilities on the global refined grid, which the coarse per-rank globalTrans_ cannot answer. Fix it by reusing the values the simulation itself already computed in parallel.

Each rank walks its interior leaf cells and records every connection it owns from its own (distributed) simulator transmissibilities, keyed by level-Cartesian indices: same-level connections as (level, min, max), level-crossing ones as (smaller level, its index, larger level, its index). The keys are geometrically canonical -- identical on the distributed grid and the I/O rank's global view -- so the existing output walk (computeTrans_ / exportNncStructure_) looks the values up directly; a missing key is a hard error.

The records are gathered on the I/O rank once, with a plain counts/displacements gatherv (new helper gatherLgrOutputTrans in LgrOutputTransGather.hpp). Same-level rank-boundary connections arrive from both owner ranks with identical values (either record is equally valid); level-crossing connections are contributed exactly once, from the smaller-level side.

The local transmissibilities are finished before the INIT extract in this branch; finishTransmissibilities() is idempotent, so the later call is a no-op.

Nothing is recomputed on the I/O rank, there is no global-property switch, and no whole-grid transmissibility object is stored -- in parallel LGR runs the writer receives none at all. Serial runs and parallel runs without LGRs are unchanged: with empty gathered maps the writer queries the whole-grid transmissibility object exactly as before.
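The writer-side dispatch in this commit message (empty gathered maps fall back to the whole-grid object; otherwise a missing key is a hard error) can be sketched as below. `outputTrans`, `GatheredTrans` and `WholeGridTrans` are hypothetical; in particular, the real whole-grid object is not queried by these keys, and the `std::function` is only a stand-in for it.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>

using SameLevelKey = std::array<int, 3>;
using GatheredTrans = std::map<SameLevelKey, double>;
// Stand-in for the existing whole-grid transmissibility object (hypothetical signature).
using WholeGridTrans = std::function<double(const SameLevelKey&)>;

double outputTrans(const GatheredTrans& gathered, const SameLevelKey& key,
                   const WholeGridTrans& wholeGrid)
{
    // Serial runs and parallel runs without LGRs: nothing was gathered, use the old path.
    if (gathered.empty()) {
        if (!wholeGrid)
            throw std::logic_error("no transmissibility source available");
        return wholeGrid(key);
    }
    // Parallel LGR runs: every connection the output walk visits must have been gathered.
    const auto it = gathered.find(key);
    if (it == gathered.end())
        throw std::out_of_range("connection missing from gathered transmissibilities");
    return it->second;
}
```

Failing loudly on a missing key, rather than writing a default value, keeps an indexing mismatch between the distributed and global views from silently producing a wrong INIT file.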

Add a serial-vs-parallel regression for the parallel LGR INIT output using existing infrastructure:

- run-parallel-regressionTest.sh gains a backward-compatible "-m <mode>" flag (default "summary" keeps the current behaviour for every existing caller; "init" does a dry run and compares EGRID+INIT only, ignoring the parallel-only MPI_RANK keyword).
- add_test_compare_parallel_simulation gains an optional COMPARE_MODE parameter that forwards "-m init" and names the test compareParallelInitSim_<sim>+<case>.
- The test is registered against the existing SPE1CASE1_CARFIN deck (opm-tests/lgr), which has two 12-host-cell LGRs on a 10x10x3 grid. Under the default 4-rank partition the LGR host cells land on multiple ranks (rank 0 marks 9 of the 24 host cells during local refinement), so the compare exercises rank-boundary-straddling LGRs. The registration sits before the opm_set_test_driver switch to run-comparison.sh so it picks up the run-parallel-regressionTest.sh driver that understands "-m".

arturcastiel · 2026-07-02T13:00:45Z

jenkins build this please

arturcastiel changed the title ~~flow: refined transmissibility for parallel LGR INIT output~~ [LGR] flow: refined transmissibility for parallel LGR INIT output Jun 30, 2026

arturcastiel added the manual:enhancement This is an enhancement/improvent that needs to be documented in the manual label Jun 30, 2026

arturcastiel changed the title ~~[LGR] flow: refined transmissibility for parallel LGR INIT output~~ [WIP] [LGR] flow: refined transmissibility for parallel LGR INIT output Jul 1, 2026

aritorto reviewed Jul 1, 2026


blattms reviewed Jul 1, 2026


arturcastiel changed the title ~~[WIP] [LGR] flow: refined transmissibility for parallel LGR INIT output~~ [WIP] [LGR] flow: parallel LGR INIT output from gathered simulator transmissibilities Jul 2, 2026

arturcastiel force-pushed the lgr-init-refined-trans branch from f1387a1 to 091f743 July 2, 2026 12:50

arturcastiel added 2 commits July 2, 2026 12:54

arturcastiel force-pushed the lgr-init-refined-trans branch from 091f743 to 1d926f7 July 2, 2026 12:58

4 participants
