When evaluating any forecast model against point observations of surface temperature, we need a lapse-rate correction (e.g. ~6.5 K/km) to account for the difference between the actual height of the observing platform and the height of the forecast grid.
The spatiotemporal interpolations happen far into the pipeline, so after thinking about it for a bit I'm inclined to advise against a preprocessing approach. The correction should occur after interpolation but before metric computation. I also want to integrate it in a way that lets other operations share the same stage of processing, such as weighting (if this turns out to be the best place in the DAG for it).
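As a rough illustration of the correction described above, here is a minimal sketch. The function name `lapse_rate_correct`, its parameters, and the sign convention (adjusting the forecast value to the station's height) are assumptions for illustration, not existing ExtremeWeatherBench API. It also assumes temperatures in kelvin, heights in metres, and a constant lapse rate.

```python
# Hypothetical helper, not part of ExtremeWeatherBench.
STANDARD_LAPSE_RATE_K_PER_M = 6.5 / 1000.0  # ~6.5 K/km, as quoted above


def lapse_rate_correct(
    forecast_temp_k,
    grid_height_m,
    station_height_m,
    lapse_rate_k_per_m=STANDARD_LAPSE_RATE_K_PER_M,
):
    """Adjust a forecast temperature from the grid height to the station height.

    Temperature is assumed to fall with height at a constant lapse rate, so a
    grid cell that sits above the station yields a warmer corrected value.
    Uses only arithmetic operators, so array-like inputs that broadcast
    together should also work.
    """
    return forecast_temp_k + lapse_rate_k_per_m * (grid_height_m - station_height_m)


# Grid cell 500 m above the station: the forecast is warmed by about 3.25 K.
corrected = lapse_rate_correct(280.0, grid_height_m=1500.0, station_height_m=1000.0)
print(corrected)
```

Whether the forecast or the observation should be adjusted, and whether a fixed lapse rate is good enough, are open design choices; the sketch only fixes one convention so the arithmetic is concrete.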
```python
valid_data = (
    inputs.maybe_subset_variables(
        data,
        variables=input_data.variables,
        source_module=source_module,
    )
    .pipe(
        lambda ds: input_data.subset_data_to_case(ds, case_metadata, **kwargs)
    )
    .pipe(input_data.maybe_convert_to_dataset)
    .pipe(input_data.add_source_to_dataset_attrs)
    .pipe(
        lambda ds: derived.maybe_derive_variables(
            ds,
            variables=input_data.variables,
            case_metadata=case_metadata,
            **kwargs,
        )
    )
)
```
Two options for where it could go in the pipeline:

1. After `maybe_derive_variables()`

   I think this is the easiest option: add a drop-in `postprocess()` step right after it in the pipeline. I would do some more rigorous testing to confirm that differing patterns work as expected. Earthmover's recent blog post using Hypothesis is a great reference; I would love to spend some time building this out, though that level of rigor isn't critical.

   The only risk I see is the DAG becoming significantly larger, which could slow down computation.

2. Replace `maybe_derive_variables()` with a `postprocess()` handler

   This might be the more elegant long-term solution, paired with a rework of `maybe_derive_variables()` toward a stricter, more invariant-based workflow (e.g. requiring a derived variable to return only a DataArray).
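To make the drop-in idea concrete, here is a minimal sketch of what a `postprocess()` step could look like. Only the name `postprocess()` comes from the proposal above; its signature, the `steps` argument, and the example steps `height_correction` and `apply_weight` are assumptions. A plain dict stands in for the dataset so the example runs without xarray.

```python
from functools import reduce
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def postprocess(data: T, steps: Iterable[Callable[[T], T]] = ()) -> T:
    """Apply each post-processing step in order; with no steps it is a no-op."""
    return reduce(lambda acc, step: step(acc), steps, data)


# Hypothetical steps operating on a dict stand-in for an interpolated dataset.
def height_correction(record: dict) -> dict:
    """Lapse-rate adjust the temperature using ~6.5 K/km (0.0065 K/m)."""
    delta_m = record["grid_height_m"] - record["station_height_m"]
    return {**record, "temperature_k": record["temperature_k"] + 0.0065 * delta_m}


def apply_weight(record: dict) -> dict:
    """Placeholder for another operation sharing this stage, e.g. weighting."""
    return {**record, "weight": 0.5}


record = {"temperature_k": 280.0, "grid_height_m": 1500.0, "station_height_m": 1000.0}
result = postprocess(record, steps=[height_correction, apply_weight])
print(result)
```

Because xarray's `Dataset.pipe(func, *args, **kwargs)` calls `func(ds, *args, **kwargs)`, a function with this shape could in principle be appended as `.pipe(postprocess, steps=[...])` after the `maybe_derive_variables()` step. Keeping the steps as an ordered list is one way to let other operations share the stage without each adding its own `.pipe()` call.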
The `valid_data` pipeline snippet quoted above is from ExtremeWeatherBench/src/extremeweatherbench/evaluate.py, lines 885 to 904 in 2f37abd.