Disaggregate and re-aggregate inputs with regional scope of legacy (134) zones #102
kodiobika wants to merge 23 commits into
…to-zone aggregation
… for non-transmission files
Thanks @stuartcohen8. Just checking again on
@kodiobika yes you're correct that aggfunc should be sum for that parameter. The water volumes are normalized by capacity because when they are used in water availability constraints, they are multiplied by the amount of PSH capacity that gets built to get a total water volume. The 'per year' convention is because water availability is generally characterized as an annual quantity of water available. So the units are right, and just to confirm you should sum to aggregate and use geosize to disaggregate. In reality it's more complicated based on basin dynamics, but that's beyond the scope of what we've done here so far.
Got it, thanks again!
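For readers following the exchange above, here is a minimal sketch of "use geosize to disaggregate, sum to aggregate". It is pandas-free and is not the ReEDS implementation: the zone names, county names, areas, and values are all hypothetical, and it assumes geosize means splitting a legacy-zone value across its counties in proportion to county area.

```python
from collections import defaultdict

# Hypothetical legacy-zone values (e.g. a water requirement per zone)
legacy_values = {"p1": 100.0, "p2": 60.0}
# Hypothetical county areas keyed by (legacy zone, county); assumed "geosize" weights
county_area = {("p1", "c1"): 30.0, ("p1", "c2"): 10.0, ("p2", "c3"): 20.0}
# Hypothetical mapping from county to the zones used in a given run
county2zone = {"c1": "zA", "c2": "zB", "c3": "zB"}


def disaggregate_by_size(values, sizes):
    """Split each legacy-zone value across its counties in proportion to size."""
    zone_total = defaultdict(float)
    for (zone, _county), size in sizes.items():
        zone_total[zone] += size
    return {
        county: values[zone] * size / zone_total[zone]
        for (zone, county), size in sizes.items()
    }


def aggregate_sum(county_values, mapping):
    """Sum county values into the run zones given by mapping."""
    out = defaultdict(float)
    for county, value in county_values.items():
        out[mapping[county]] += value
    return dict(out)


county_values = disaggregate_by_size(legacy_values, county_area)
zone_values = aggregate_sum(county_values, county2zone)
print(zone_values)  # {'zA': 75.0, 'zB': 85.0}
```

With these made-up numbers the total (160.0) is the same before and after the round trip, which is the property a size-weighted split followed by a sum is meant to preserve.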
patrickbrown4 left a comment
This is great! I really like the new structure. I'll defer to Stuart and Vincent on the hydro/water parts, but the rest looks good to me.
I think your approach to aggregate_regions.py makes sense; rather than picking through and removing the outdated code blocks now, we can move ahead with this PR and #95 in parallel, and then once they're both in, just remove the whole script.
# Get legacy zone-to-county allocation factors for disagg_variable
disagg_data = reeds.io.get_disagg_data(
    os.path.dirname(inputs_case),
    disagg_variable
)
In the next PR (or whatever one adds support for GSw_ZoneSet = z90), we should double check what happens if GSw_ZoneSet = z90 and GSw_Region = r/NJ.NY_NYC. NY_NYC is a subset of p127, and it's not immediately clear from copy_files.write_disagg_data_files() what would happen if you're disaggregating/reaggregating with a subset of a z134 zone. I think it depends on whether you're using the full county2zone or the run-specific county2zone that only includes the zones in that run. I'd just want to make sure that the entire capacity (or whatever) of p127 doesn't end up in NY_NYC, since the other parts of p127 (NY_E and NY_W) aren't in the run. (Same idea as #23, but here for sub-z134 runs.)
I don't think we support sub-z134 runs now, so we can wait to test it once that capability works; just flagging since it came to mind.
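To make the concern concrete, here is a small sketch of the two possibilities. It does not reproduce copy_files.write_disagg_data_files(); the county names, equal weights, and the capacity value are hypothetical, and the only point is how the choice of county set changes the normalization.

```python
def allocate(zone_value, county_weights, county2zone):
    """Split one legacy-zone value across the given counties, then sum by run zone."""
    total = sum(county_weights.values())
    out = {}
    for county, weight in county_weights.items():
        zone = county2zone[county]
        out[zone] = out.get(zone, 0.0) + zone_value * weight / total
    return out


# Hypothetical: p127 has 900 units of capacity and three equally weighted counties
p127_capacity = 900.0
full_weights = {"c_nyc": 1.0, "c_east": 1.0, "c_west": 1.0}
full_county2zone = {"c_nyc": "NY_NYC", "c_east": "NY_E", "c_west": "NY_W"}
run_zones = {"NY_NYC"}

# Option A: disaggregate with the full county set, then keep only zones in the run
full_alloc = allocate(p127_capacity, full_weights, full_county2zone)
option_a = {z: v for z, v in full_alloc.items() if z in run_zones}

# Option B: drop out-of-run counties first, so the weights renormalize to NY_NYC alone
run_weights = {
    c: w for c, w in full_weights.items() if full_county2zone[c] in run_zones
}
option_b = allocate(p127_capacity, run_weights, full_county2zone)

print(option_a)  # {'NY_NYC': 300.0}
print(option_b)  # {'NY_NYC': 900.0}
```

Option B is the failure mode described above: all of p127 lands in NY_NYC because the counties in NY_E and NY_W were removed before the shares were computed.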
Co-authored-by: Patrick Brown <25125211+patrickbrown4@users.noreply.github.com>
… into ko/agg_disagg_refactor
Summary
The two major goals of this PR are to:
- … reeds/input_processing/aggregate_regions.py
Technical details
- … reeds/spatial.py for disaggregating from the legacy 134 zones to counties and aggregating from counties to the zones corresponding to a given run.
- … reeds/input_processing/copy_files.py.
Additional changes
- … reeds/input_processing/runfiles.csv and make it clearer which files are actually being aggregated and disaggregated, I set the aggfunc and disaggfunc for all files that are not read from the repo (i.e., files that are created after reeds/input_processing/aggregate_regions.py) to ignore.
- … reeds/input_processing/runfiles.csv, I updated the disaggfunc for unapp_water_sea_distr.csv and water_req_psh_10h_1_51.csv from geosize to uniform and the aggfunc for water_req_psh_10h_1_51.csv from mean to sum.
- As part of testing this change I realized that there are duplicate rows in inputs/hydro/SeaCapAdj_hy.csv so I deleted those. (@jvcarag will address this in a follow-up PR)
Issues resolved
Part of #16
Validation, testing, and comparison report(s)
- Essentially zero change for the Pacific case: results-main_Pacific,test_Pacific.pptx
- There are small changes for the NYVT_mixed case: results-0519_Main_NYVT_mixed,0519_AggDisagg_NYVT_mixed.pptx
  - … inputs.gdx files (see the gdxdiffs.ipynb notebook below), the differences stem from the cap_hyd_szn_adj, water_req_psh, and watsa_temp parameters.
    - cap_hyd_szn_adj is different because doing the disaggregation and aggregation ends up removing the duplicate rows from inputs/hydro/SeaCapAdj_hy.csv (see "Additional changes"), so for the groups with duplicates, the values in the parameter go from 2 to 1.
    - water_req_psh and watsa_temp are different for only the county-level zones of this case because the disaggfunc for their corresponding files was changed from geosize to uniform.
- There are small changes for the USA_defaults case: results-0518_Main_USA_defaults,0518_AggDisagg_USA_defaults.pptx
  - … inputs.gdx files (see the gdxdiffs.ipynb notebook below), the differences almost all stem from the aggregated zones (z28 and z122). Because we now disaggregate to counties before re-aggregating to these zones, inputs where the aggfunc is mean now represent a weighted average of legacy zonal values (where weights correspond to the number of counties in each legacy zone) rather than a simple average of legacy zonal values, which results in different values for these aggregated zones. The only difference not related to the aggregated zones is in the hyd_add_upg_cap parameter, where p108 and p69 no longer have values. This is because we now disaggregate the corresponding file to counties first according to the hydroexist disaggfunc, and these legacy zones have no existing hydro.
- There are small changes for the USA_decarb case: results-0519_Main_USA_decarb,0519_AggDisagg_USA_decarb.pptx

This is a notebook looking at the differences in inputs.gdx for the USA_defaults and NYVT_mixed cases: gdxdiffs.ipynb. Aside from the differences explained above, there are only rounding-error differences.
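The change in mean-aggregated inputs for the aggregated zones can be illustrated with a short sketch. The legacy zones, county counts, and values below are hypothetical, and the sketch assumes each county simply inherits its legacy zone's value before the county-level mean is taken.

```python
# Hypothetical legacy-zone values for an input whose aggfunc is mean
legacy_value = {"p1": 10.0, "p2": 40.0}
# Hypothetical county membership: p1 has three counties, p2 has one,
# and both legacy zones fall inside the same aggregated run zone
county2legacy = {"c1": "p1", "c2": "p1", "c3": "p1", "c4": "p2"}

# Before: simple average of the legacy zonal values
simple_mean = sum(legacy_value.values()) / len(legacy_value)

# After: each county inherits its legacy zone's value (assumed),
# then the aggregated zone takes the mean over counties
county_values = [legacy_value[zone] for zone in county2legacy.values()]
county_weighted_mean = sum(county_values) / len(county_values)

print(simple_mean)           # 25.0
print(county_weighted_mean)  # 17.5
```

The second number is the average of the legacy values weighted by county count (3 x 10 + 1 x 40, divided by 4), which is why legacy zones with more counties pull the aggregated value toward their own.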
Did you use LLM tools (chatbot or copilot) in the preparation of this PR? If so, describe how
No