398 add run rhime run rhime multisector and a cli entry point enhancement #420
- Adds `run_rhime(...)` for standard single-sector RHIME inversions.
- Adds `run_rhime_multisector(...)` for shared-basis multi-sector RHIME inversions.
- Adds `openghg-inversions run-rhime ...` and `openghg-inversions run-rhime-multisector ...` CLI entry points for config-file driven runs.
- Adds lightweight RHIME result/spec dataclasses and a RHIME config template.
- Adds direct modern `InversionOutput` construction for the standard RHIME path.
- Adds sector-aware diagnostic output for multi-sector runs.

The new RHIME runners reuse the existing data preparation and component-based PyMC model pieces, but do not route public modern behavior through `fixedbasisMCMC` or the legacy `inferpymc` output adapter.
c0f5ed6 to 5c52e09
Pull request overview
This PR adds modern public RHIME runner APIs and an installed CLI entry point, enabling config-driven RHIME runs without routing through the legacy fixedbasisMCMC path.
Changes:
- Introduces `run_rhime(...)` and `run_rhime_multisector(...)` runners returning a modern `RhimeResult` with specs, canonical inputs, and `InferenceData`.
- Adds `openghg-inversions` console script with `run-rhime` and `run-rhime-multisector` subcommands.
- Updates data prep / postprocessing glue for `region`-dimension traces and improves obs-error metadata handling; adds tests and a RHIME config template.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `openghg_inversions/rhime.py` | New modern RHIME runners, config normalization/validation, sampling, and output writing. |
| `openghg_inversions/models/rhime.py` | New public RHIME model builders (single-sector + shared-basis multisector). |
| `openghg_inversions/cli.py` | New argparse-based CLI wired to RHIME runners. |
| `openghg_inversions/config/templates/rhime_template.ini` | New RHIME config template supporting modern parameter names. |
| `openghg_inversions/postprocessing/make_outputs.py` | Supports `region`-dimension traces when computing flux stats. |
| `openghg_inversions/postprocessing/inversion_output.py` | Deserialization updated to support `region` dimension as well as legacy `nx`. |
| `openghg_inversions/inversion_data/get_data.py` | Ensures obs error variables carry consistent `long_name`/`units` attrs. |
| `openghg_inversions/inversion_inputs.py` | Accepts integer `min_error` values as numeric scalars. |
| `pyproject.toml` | Adds `openghg-inversions` script entry point; includes config templates in package data. |
| `README.md` | Documents new Python and CLI RHIME entry points. |
| `openghg_inversions/__init__.py` | Adds package `__init__`. |
| `tests/test_rhime.py` | New tests covering model builders, config normalization, API/CLI smoke tests. |
| `tests/test_get_data.py` | Tightens regression assertions around obs error `long_name` behavior and formatting. |
| `tests/test_inversion_inputs.py` | Adds coverage for integer `min_error` handling. |
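To make the `openghg_inversions/cli.py` row concrete, here is a minimal sketch of an argparse CLI with the two subcommands named in this PR. It is not the code from the PR: the argument names `config`, `start_date`, and `end_date`, and the choice to make the config path positional, are assumptions for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per RHIME runner."""
    parser = argparse.ArgumentParser(prog="openghg-inversions")
    subparsers = parser.add_subparsers(dest="command", required=True)

    for name in ("run-rhime", "run-rhime-multisector"):
        sub = subparsers.add_parser(name, help=f"{name} from a config file")
        # Hypothetical arguments: the real CLI's argument names may differ.
        sub.add_argument("config", help="path to an ini config file")
        sub.add_argument("start_date", nargs="?", default=None)
        sub.add_argument("end_date", nargs="?", default=None)
    return parser


# Parse an example command line without touching sys.argv.
args = build_parser().parse_args(
    ["run-rhime", "rhime.ini", "2019-01-01", "2019-02-01"]
)
print(args.command, args.config, args.start_date, args.end_date)
```

Using one subparser per runner keeps the two run paths separate while sharing a single installed command name.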
Pull request overview
This PR introduces a modern, public RHIME execution pathway (single-sector and shared-basis multi-sector) with a first-class Python API and an installed CLI entry point, without routing through the legacy fixedbasisMCMC / inferpymc adapter path.
Changes:
- Added modern RHIME runners (`run_rhime`, `run_rhime_multisector`) plus lightweight result/spec dataclasses and multisector diagnostics.
- Added `openghg-inversions` CLI with `run-rhime` and `run-rhime-multisector` subcommands, plus a RHIME config template distributed in the package.
- Updated postprocessing / IO compatibility for both legacy `nx` and modern `region` basis dimension naming, and expanded tests around RHIME + data error metadata.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `tests/test_rhime.py` | New unit/smoke tests for RHIME model builders, parameter normalization/validation, API smoke runs, and CLI plumbing. |
| `tests/test_inversion_inputs.py` | Adds regression coverage for accepting integer `min_error`. |
| `tests/test_get_data.py` | Tightens assertions around observation error metadata (`long_name`) and applies formatting cleanups. |
| `README.md` | Documents the new RHIME Python and CLI entry points and config template location. |
| `pyproject.toml` | Adds console script entry point and packages the RHIME template ini file. |
| `openghg_inversions/rhime.py` | Core implementation of modern RHIME runners, config normalization, sampling, and output writing. |
| `openghg_inversions/postprocessing/make_outputs.py` | Adjusts stats chunking to handle `region`-based traces as well as legacy `nx`. |
| `openghg_inversions/postprocessing/inversion_output.py` | Makes DataTree deserialization robust to `nx` vs `region` basis dims. |
| `openghg_inversions/models/rhime.py` | Adds modern RHIME PyMC model builder(s), including multisector shared-basis variant. |
| `openghg_inversions/inversion_inputs.py` | Makes `min_error` accept integer scalars and treats numeric scalars more robustly. |
| `openghg_inversions/inversion_data/get_data.py` | Normalizes/propagates `long_name` + `units` for obs error components. |
| `openghg_inversions/config/templates/rhime_template.ini` | Adds a new RHIME config template preferring `flux_sources`. |
| `openghg_inversions/cli.py` | Implements the `openghg-inversions` CLI and RHIME subcommands. |
| `openghg_inversions/__init__.py` | Adds package initializer docstring. |
| `CHANGELOG.md` | Records the addition of modern RHIME runners + CLI + template. |
Summary
Closes #398.
This PR adds modern RHIME public runners alongside the legacy `fixedbasisMCMC` path:
- `run_rhime(...)` for standard single-sector RHIME inversions.
- `run_rhime_multisector(...)` for shared-basis multi-sector RHIME inversions.
- `openghg-inversions run-rhime ...` and `openghg-inversions run-rhime-multisector ...` CLI entry points for config-file driven runs.
- Lightweight RHIME result/spec dataclasses and a RHIME config template.
- Direct modern `InversionOutput` construction for the standard RHIME path.
- Sector-aware diagnostic output for multi-sector runs.

The new RHIME runners reuse the existing data preparation and component-based PyMC model pieces, but do not route public modern behavior through `fixedbasisMCMC` or the legacy `inferpymc` output adapter.
Notes
Multi-sector RHIME currently supports shared basis regions across sectors, but each sector keeps its own flux field and state vector. This PR therefore creates only modern sector diagnostics for multi-sector output; full PARIS/basic postprocessing for multi-sector runs should be handled in a follow-up with a sector-aware output adapter.
This PR intentionally does not add tracer support, 6 km support, or a full config system redesign.
This PR does not add full PARIS outputs for the multi-sector model, just a proof-of-concept output including posterior flux means.
Notes for reviewers
The main new code is in the top-level `rhime.py`, which contains inversion run paths parallel to `fixedbasisMCMC`.
The code in `models/rhime.py` somewhat duplicates the code in `hbmcmc/inversion_pymc.py`.
In general, this is a fairly messy PR, but I wanted to get something I could test on real data. There are further PRs on the milestone for multi-sector models (https://github.com/openghg/openghg_inversions/milestone/9) that will clean up the code.
How to test on real data
Setting up `slurm.sh`
The new `run_rhime` function runs via a CLI entry point, `openghg-inversions run-rhime`, instead of a script like `run_hbmcmc.py`.
My own sbatch command used variables like `CONFIG_FILE` and `RUN_ROOT`; you do not need to do this, it is just what I did. You can add start and end dates as positional arguments, as with `run_hbmcmc.py`.
To get this to work, you might need to update your environment. I'm using `uv`, so after I checked out this branch, I ran `uv sync`.
The part of my `slurm.sh` that activates this venv refers to `REPO_DIR`, which for me is `REPO_DIR=/user/work/bm13805/openghg_inversions`, the directory where I checked out the branch and ran `uv sync`.
Note that once you have activated your venv, the command `openghg-inversions` will be available. You can check this by calling the command after you sync and activate your venv.
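One way to confirm that the console script is visible from the active environment is to look it up on `PATH`. The sketch below uses only the standard library; the helper name `find_cli` is hypothetical and not part of the package.

```python
import shutil
from typing import Optional


def find_cli(command: str = "openghg-inversions") -> Optional[str]:
    """Return the full path of `command` if it is on PATH, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_cli()
    if path is None:
        print("openghg-inversions not found; is the venv synced and activated?")
    else:
        print(f"openghg-inversions found at {path}")
```

If the lookup returns `None` inside a batch job, the venv was most likely not activated in that job's shell.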
config file
The new `run_rhime` function (which is called by `openghg-inversions run-rhime`) accepts a subset of what `run_hbmcmc.py` accepts. This is to try to keep the first implementation small(ish).
The default output format is a netCDF saved from the `InversionOutput` object. You can also use `output_format = "paris"` (the full set of options is "none", "inv_out", "basic", and "paris"; "inv_out" is the default).
Instead of passing `reparameterise_log_normal=True`, you should add `reparameterise=True` to the prior dictionary. This is supported by `run_hbmcmc.py` and `fixedbasisMCMC` too, and `reparameterise_log_normal` just adds the `reparameterise` key to the prior args dictionary.
You need to remove `mcmc_type` from your ini file (although perhaps we should add this back with the new options).
Other deprecated arguments include `calculate_min_error`, which was deprecated in favour of just passing the method name (e.g. "residual", "percentile") to `min_error`.
If any arguments are not accepted, you'll get an error message saying which ones.
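The config changes described above can be summarised as a small translation step. The following is a hedged sketch, not code from this PR: `modernise_params` is a hypothetical helper, the prior entry names `xprior` and `bcprior` are assumptions, and the example values (including the `mcmc_type` value and the prior keys `pdf`, `mu`, `sigma`) are illustrative only. It also assumes that `calculate_min_error` previously held the method name.

```python
from copy import deepcopy

# Options quoted in the PR description; "inv_out" is the stated default.
OUTPUT_FORMATS = ("none", "inv_out", "basic", "paris")


def modernise_params(params: dict, prior_keys=("xprior", "bcprior")) -> dict:
    """Translate legacy-style options into the equivalents described above.

    `params` is a plain dict of config values. `prior_keys` names the entries
    that hold prior-argument dicts (an assumption made for this sketch).
    """
    out = deepcopy(params)  # leave the caller's dict untouched

    # mcmc_type has to be removed from the ini file.
    out.pop("mcmc_type", None)

    # reparameterise_log_normal=True becomes reparameterise=True on the priors.
    if out.pop("reparameterise_log_normal", False):
        for key in prior_keys:
            if isinstance(out.get(key), dict):
                out[key]["reparameterise"] = True

    # calculate_min_error is replaced by passing the method name to min_error.
    method = out.pop("calculate_min_error", None)
    if method is not None:
        out["min_error"] = method

    out.setdefault("output_format", "inv_out")
    if out["output_format"] not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")
    return out


# Illustrative legacy-style parameters (values are made up for the example).
legacy = {
    "xprior": {"pdf": "lognormal", "mu": 1.0, "sigma": 1.0},
    "reparameterise_log_normal": True,
    "calculate_min_error": "residual",
    "mcmc_type": "fixed_basis",
}
modern = modernise_params(legacy)
print(modern)
```

The sketch adds `reparameterise` to every listed prior without inspecting its distribution, which mirrors the description that the legacy flag "just adds the `reparameterise` key".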
Multi-sector inversions
For multi-sector inversions, use the command `openghg-inversions run-rhime-multisector` in your slurm sbatch script.
The same rules for configs apply, except that you can't specify an output format, and you can specify different priors for different sectors using a dictionary from sector names to prior args dicts.
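As an illustration of the "dictionary from sector names to prior args dicts" shape, here is a hypothetical example with a small parser. The sector names, prior keys, and values are invented, and parsing with `ast.literal_eval` is an assumption of this sketch rather than a statement about how the package reads config values. Nested dictionaries are easy to mistype, so the sketch turns a syntax error such as a missing closing brace into an explicit message.

```python
import ast

# Hypothetical config value: sector names mapped to prior-argument dicts.
raw_value = """{
    "sector_a": {"pdf": "lognormal", "mu": 1.0, "sigma": 1.0, "reparameterise": True},
    "sector_b": {"pdf": "truncatednormal", "mu": 1.0, "sigma": 0.5, "lower": 0.0},
}"""


def parse_sector_priors(text: str) -> dict:
    """Parse a dict literal and check it maps sector names to prior dicts."""
    try:
        value = ast.literal_eval(text)
    except SyntaxError as err:
        # A missing closing brace ends up here.
        raise ValueError(f"could not parse prior dictionary: {err.msg}") from err
    if not isinstance(value, dict) or not all(
        isinstance(prior, dict) for prior in value.values()
    ):
        raise ValueError("expected a dict from sector names to prior args dicts")
    return value


sector_priors = parse_sector_priors(raw_value)
for sector, prior in sector_priors.items():
    print(sector, prior["pdf"])
```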
I used the following as a test:
Note that you need to be careful to close the outer braces; I forgot a `}` and got a weird error message (there is an issue to track this now).
Testing
- `pytest tests/test_rhime.py`
- `pytest tests/test_get_data.py -k "add_averaging_error or add_obs_error"`
- `pytest tests/test_inversion_inputs.py` focused cases
- `ruff check` / `ruff format --check` on touched files

Repository-wide ruff still reports pre-existing unrelated lint/format issues.
- `CHANGELOG.md` file
- `requirements.txt`