Create "data plan" before inversion #423

@brendan-m-murphy

Add a feature that takes a config (e.g. an .ini file) and replicates the data gathering process using only search, possibly with checks that data actually exists for the times you need.

The result can be inspected before running the inversion, and the plan can be used as input to the inversion, which will use the exact data specified in the plan.
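As a rough sketch of the first step, the snippet below reads a request from an .ini string and checks whether a search result overlaps the requested period. The section name `inversion`, the keys, and the helpers `read_request` and `covers` are all hypothetical and only illustrate the idea; the real config layout and the search call in openghg are not shown here.

```python
import configparser
from datetime import date

# Hypothetical config layout, for illustration only.
EXAMPLE_INI = """
[inversion]
species = ch4
sites = MHD, TAC
start_date = 2019-01-01
end_date = 2019-02-01
"""


def read_request(text):
    """Parse the (assumed) config keys needed to build search queries."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["inversion"]
    return {
        "species": section["species"],
        "sites": [s.strip() for s in section["sites"].split(",")],
        "start": date.fromisoformat(section["start_date"]),
        "end": date.fromisoformat(section["end_date"]),
    }


def covers(found_start, found_end, start, end):
    """True if a search result overlaps the requested period [start, end)."""
    return found_start < end and found_end > start
```

A check like `covers` only needs the date range reported by search, so it should not require loading any data.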

In the future, perhaps we can run filtering on this plan, and maybe create data availability plots.

This feature would have a few benefits:

  • Knowing what data you have, so you can catch problems before launching a job (e.g. finding out that you need more metadata). Making the plan should only take a few minutes and need little compute or memory.
  • Reproducibility: we probably need to record object stores, UUIDs, and versions in the plan. We could also record other metadata (version and/or commit hash of openghg and openghg_inversions).
  • Assuming there is a defined format, you could modify the plan manually and still have a reproducible inversion. (Maybe we can add a feature here for using old data, e.g. for fluxes and BC, with some way to specify how the dates are updated.)
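The package versions mentioned under reproducibility could be collected with the standard library alone. This is a minimal sketch; `package_versions` is a hypothetical helper, and recording a commit hash would need something extra that is not shown.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Map each distribution name to its installed version string."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Record the gap rather than failing, so the plan can still be written.
            found[name] = "not installed"
    return found
```

For example, `package_versions(["openghg", "openghg_inversions"])` would give a small dict that can be stored alongside the plan.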

To implement this, we should first create an internal representation of a "data plan" that can be used as input to the data gathering stage in inversion_inputs.py.
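One possible shape for that internal representation is sketched below. Every class and field name here is an assumption made for illustration, not an existing part of openghg or openghg_inversions; the point is only that the plan is plain data that can be written to a file, edited, and read back unchanged.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class PlanEntry:
    """One piece of data the inversion should use (hypothetical fields)."""

    data_type: str   # e.g. "obs", "footprint", "flux", "boundary_conditions"
    store: str       # object store the data was found in
    uuid: str        # UUID reported by search
    version: str     # data version to pin
    start_date: str  # ISO date, start of the period to use
    end_date: str    # ISO date, end of the period to use


@dataclass
class DataPlan:
    """A list of entries plus the software versions used to build it."""

    entries: list[PlanEntry] = field(default_factory=list)
    software_versions: dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict converts nested dataclasses to plain dicts recursively.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> DataPlan:
        raw = json.loads(text)
        entries = [PlanEntry(**entry) for entry in raw["entries"]]
        return cls(entries=entries, software_versions=raw["software_versions"])
```

JSON is used here only because it is in the standard library; any defined, human-editable format would serve the same purpose.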

This will be a bit tricky because the fallback logic for fluxes is mixed in with the code that actually retrieves the flux. Same for footprints that are matched to obs inlets. Using slices for obs inlet height is handled internally by the "retrieve" functions in openghg.
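One way to untangle this, sketched under the assumption that a fallback can be expressed as an ordered list of candidate queries, is to resolve the choice first and retrieve later. `resolve_first_available` and the `exists` callback are hypothetical; in practice `exists` would wrap a search call, and the current fallback logic may not reduce to this form.

```python
from typing import Callable, Optional, Sequence


def resolve_first_available(
    candidates: Sequence[dict],
    exists: Callable[[dict], bool],
) -> Optional[dict]:
    """Return the first candidate query that search can satisfy, or None.

    Nothing is retrieved here, so the chosen query can be written to the plan.
    """
    for query in candidates:
        if exists(query):
            return query
    return None
```

The retrieval step would then take the resolved query from the plan and would not need any fallback logic of its own.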

This means a full solution will probably require hacks or changes to openghg.
