Disaggregate and re-aggregate inputs with regional scope of legacy (134) zones by kodiobika · Pull Request #102 · ReEDS-Model/ReEDS

kodiobika · 2026-05-20T21:28:32Z

Summary

The two major goals of this PR are to:

Allow the input files that we only have legacy zonal data for (i.e., data corresponding to the current default 134 zones) to be incorporated in ReEDS runs using new/custom zones by first disaggregating those inputs to the county level and then re-aggregating them to the zone level (using the zones for any given run).
Mostly deprecate reeds/input_processing/aggregate_regions.py

Technical details

Adds functions to reeds/spatial.py for disaggregating from the legacy 134 zones to counties and aggregating from counties to the zones corresponding to a given run.
For all files except the transmission files (which will be addressed in a follow-up PR), aggregation and disaggregation now take place in reeds/input_processing/copy_files.py.
Disaggregation and aggregation both take place in the same run now (rather than disaggregation only happening in sub-BA runs and aggregation only happening in super-BA runs)
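
As a rough illustration of the two-step flow described in the list above, here is a minimal sketch in plain Python. The mappings, the allocation factors, and the function names (`disaggregate`, `aggregate_sum`) are hypothetical and are not the functions added to reeds/spatial.py. The sketch only assumes that allocation factors sum to 1 within each legacy zone.

```python
from collections import defaultdict

# Hypothetical mappings for illustration only (not taken from the ReEDS repo)
county2legacy = {"c1": "p1", "c2": "p1", "c3": "p2"}  # county -> legacy zone
county2zone = {"c1": "zA", "c2": "zB", "c3": "zB"}    # county -> zone used in this run
# Assumed allocation factors: share of the legacy zone's value given to each county
alloc = {"c1": 0.25, "c2": 0.75, "c3": 1.0}


def disaggregate(legacy_values, county2legacy, alloc):
    """Split each legacy-zone value across its counties using allocation factors."""
    return {
        county: legacy_values[zone] * alloc[county]
        for county, zone in county2legacy.items()
        if zone in legacy_values
    }


def aggregate_sum(county_values, county2zone):
    """Sum county values into the zones used for the run."""
    out = defaultdict(float)
    for county, value in county_values.items():
        out[county2zone[county]] += value
    return dict(out)


legacy = {"p1": 100.0, "p2": 40.0}
county = disaggregate(legacy, county2legacy, alloc)
zones = aggregate_sum(county, county2zone)
print(county)
print(zones)
```

Because the factors sum to 1 within each legacy zone, the total across the run zones matches the total across the legacy zones, even though the run zones cut across legacy-zone boundaries.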

Additional changes

To clean up reeds/input_processing/runfiles.csv and make it clearer which files are actually being aggregated and disaggregated, I set the aggfunc and disaggfunc for all files that are not read from the repo (i.e., files that are created after reeds/input_processing/aggregate_regions.py) to ignore.
In reeds/input_processing/runfiles.csv, I updated the disaggfunc for unapp_water_sea_distr.csv ~~and water_req_psh_10h_1_51.csv~~ from geosize to uniform and the aggfunc for water_req_psh_10h_1_51.csv from mean to sum.
~~As part of testing this change I realized that there are duplicate rows in inputs/hydro/SeaCapAdj_hy.csv so I deleted those.~~ (@jvcarag will address this in a follow-up PR)

Issues resolved

Part of #16

Validation, testing, and comparison report(s)

Essentially zero change for the Pacific case:
results-main_Pacific,test_Pacific.pptx
There are small changes for the NYVT_mixed case: results-0519_Main_NYVT_mixed,0519_AggDisagg_NYVT_mixed.pptx
- Based on the inputs.gdx files (see the gdxdiffs.ipynb notebook below), the differences stem from the cap_hyd_szn_adj, water_req_psh, and watsa_temp parameters. cap_hyd_szn_adj is different because doing the disaggregation and aggregation ends up removing the duplicate rows from inputs/hydro/SeaCapAdj_hy.csv (see "Additional changes"), so for the groups with duplicates, the values in the parameter go from 2 to 1. water_req_psh and watsa_temp are different for only the county-level zones of this case because the disaggfunc for their corresponding files was changed from geosize to uniform.
There are small changes for the USA_defaults case:
results-0518_Main_USA_defaults,0518_AggDisagg_USA_defaults.pptx
- Based on the inputs.gdx files (see the gdxdiffs.ipynb notebook below), the differences almost all stem from the aggregated zones (z28 and z122). Because we now disaggregate to counties before re-aggregating to these zones, inputs where the aggfunc is mean now represent a weighted average of legacy zonal values (where weights correspond to the number of counties in each legacy zone) rather than a simple average of legacy zonal values, which results in different values for these aggregated zones. The only difference not related to the aggregated zones is in the hyd_add_upg_cap parameter, where p108 and p69 no longer have values. This is because we now disaggregate the corresponding file to counties first according to the hydroexist disaggfunc, and these legacy zones have no existing hydro.
There are small changes for the USA_decarb case:
results-0519_Main_USA_decarb,0519_AggDisagg_USA_decarb.pptx
This is a notebook looking at the differences in inputs.gdx for the USA_defaults and NYVT_mixed cases:
gdxdiffs.ipynb. Aside from the differences explained above, there are only rounding-error differences.
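
The USA_defaults explanation above (a mean over counties versus a mean over legacy zones) can be checked with a small worked example. The zone and county names are hypothetical, and the sketch assumes a uniform disaggfunc in which each county inherits its legacy zone's value.

```python
# Two hypothetical legacy zones that are merged into one aggregated zone
legacy_values = {"pA": 10.0, "pB": 20.0}
# pA has three counties, pB has one (made-up membership for illustration)
county2legacy = {"c1": "pA", "c2": "pA", "c3": "pA", "c4": "pB"}

# Previous behavior: simple mean of the legacy zonal values
simple_mean = sum(legacy_values.values()) / len(legacy_values)

# New behavior: disaggregate to counties first (assumed uniform), then take the mean
county_values = {c: legacy_values[z] for c, z in county2legacy.items()}
county_weighted_mean = sum(county_values.values()) / len(county_values)

print(simple_mean, county_weighted_mean)
```

The second value is pulled toward the legacy zone with more counties, which is the kind of shift reported for the aggregated zones.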

Checklist for author

Details to double-check

Charge code provided to reviewers
Included comparison reports for appropriate test cases
Code formatting standardized
Reusable functions used where possible instead of copy/pasted code

General information to guide review

Zero impact on results of default case
No large data file(s) added/modified
No substantive impact on runtime for full-US reference case
No substantive impact on folder size for full-US reference case
No change to process flow (runreeds.py, reeds/core/solve/solve.py)
No change to code organization
No change to package requirements (environment.yml or Project.toml)

Did you use LLM tools (chatbot or copilot) in the preparation of this PR? If so, describe how

No

Tag points of contact here if you would like additional review of the relevant parts of the model

Water cooling: @jcarag or @stuartcohen8

Hydropower or PSH: @stuartcohen8

stuartcohen8 · 2026-05-21T22:01:11Z

unapp_water_sea_distr.csv is ok to be uniform, but water_req_psh_10h_1_51.csv should be reverted to geosize. The latter are water volumes, so they should be split up when disaggregating. We think there is a separate bug with SeaCapAdj_hy.csv because older files do not have duplicates and have values for the hydD technology. This was introduced somewhere in input processing for PR1692 in the internal repo. @jvcarag will make an issue to correct this file so that it has values of 1 for hydD and no duplicates. In practice, it will not affect most, if not all, solutions because hydD capacity is high cost and therefore rarely economic. But if there are prescribed hydD builds, they might have their capacity zeroed out.

kodiobika · 2026-05-21T22:18:52Z

Thanks @stuartcohen8. Just checking again on water_req_psh_10h_1_51.csv, I see in b_inputs.gms that the units for water_req_psh are Mgal/MW/yr (as opposed to Mgal). Is that not correct? And if it's actually Mgal, should the aggfunc for that file be updated from mean to sum? Everything else sounds good to me.

stuartcohen8 · 2026-05-21T22:25:49Z

@kodiobika yes you're correct that aggfunc should be sum for that parameter. The water volumes are normalized by capacity because when they are used in water availability constraints, they are multiplied by the amount of PSH capacity that gets built to get a total water volume. The 'per year' convention is because water availability is generally characterized as an annual quantity of water available. So the units are right, and just to confirm you should sum to aggregate and use geosize to disaggregate. In reality it's more complicated based on basin dynamics, but that's beyond the scope of what we've done here so far.
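
To make the pairing of disaggfunc and aggfunc concrete, here is a small sketch. It assumes that geosize splits a legacy value in proportion to county area and that uniform copies the legacy value to every county; the areas, the value, and the function names are made up for illustration and are not the ReEDS implementation.

```python
def disagg_geosize(value, areas):
    """Assumed geosize behavior: split the value in proportion to county area."""
    total_area = sum(areas.values())
    return {county: value * area / total_area for county, area in areas.items()}


def disagg_uniform(value, areas):
    """Assumed uniform behavior: every county gets the legacy zone's value."""
    return {county: value for county in areas}


areas = {"c1": 100.0, "c2": 300.0}  # hypothetical county areas
legacy_value = 120.0                # hypothetical legacy-zone value

geo = disagg_geosize(legacy_value, areas)
uni = disagg_uniform(legacy_value, areas)

geosize_then_sum = sum(geo.values())               # recovers the legacy value
uniform_then_sum = sum(uni.values())               # double counts the legacy value
uniform_then_mean = sum(uni.values()) / len(uni)   # recovers the legacy value

print(geosize_then_sum, uniform_then_sum, uniform_then_mean)
```

Under these assumptions, geosize pairs with sum and uniform pairs with mean when the goal is to recover the legacy value after a round trip; mixing uniform with sum scales the value by the number of counties.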

kodiobika · 2026-05-21T22:31:34Z

Got it, thanks again!

patrickbrown4

This is great! I really like the new structure. I'll defer to Stuart and Vincent on the hydro/water parts, but the rest looks good to me.

I think your approach to aggregate_regions.py makes sense; rather than picking through and removing the outdated code blocks now, we can move ahead with this PR and #95 in parallel, and then once they're both in, just remove the whole script.

patrickbrown4 · 2026-05-22T20:37:24Z

+    # Get legacy zone-to-county allocation factors for disagg_variable
+    disagg_data = reeds.io.get_disagg_data(
+        os.path.dirname(inputs_case),
+        disagg_variable
+    )


In the next PR (or whichever one adds support for GSw_ZoneSet = z90), we should double-check what happens if GSw_ZoneSet = z90 and GSw_Region = r/NJ.NY_NYC. NY_NYC is a subset of p127, and it's not immediately clear from copy_files.write_disagg_data_files() what would happen if you're disaggregating/reaggregating with a subset of a z134 zone. I think it depends on whether you're using the full county2zone or the run-specific county2zone that only includes the zones in that run. I'd just want to make sure that the entire capacity (or whatever) of p127 doesn't end up in NY_NYC, since the other parts of p127 (NY_E and NY_W) aren't in the run. (Same idea as #23, but here for sub-z134 runs.)

I don't think we support sub-z134 runs now, so we can wait to test it once that capability works; just flagging since it came to mind.
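
The concern can be illustrated with a toy example. The county names, allocation factors, and capacity value below are hypothetical, and option B is a possible failure mode to test for, not a description of what copy_files.write_disagg_data_files() currently does.

```python
legacy_value = 1000.0  # hypothetical capacity assigned to legacy zone p127

# Hypothetical counties of p127 with made-up allocation factors (sum to 1)
alloc = {"county_nyc": 0.5, "county_e": 0.3, "county_w": 0.2}
county2zone_full = {"county_nyc": "NY_NYC", "county_e": "NY_E", "county_w": "NY_W"}
run_zones = {"NY_NYC"}  # only this part of p127 is in the run


def aggregate(county_values, county2zone, zones):
    """Sum county values into zones, keeping only zones that are in the run."""
    out = {}
    for county, value in county_values.items():
        zone = county2zone[county]
        if zone in zones:
            out[zone] = out.get(zone, 0.0) + value
    return out


# Option A: disaggregate with the full mapping, then drop counties outside the run
county_full = {c: legacy_value * f for c, f in alloc.items()}
option_a = aggregate(county_full, county2zone_full, run_zones)

# Option B: restrict to run counties first and renormalize the factors among them
run_counties = [c for c, z in county2zone_full.items() if z in run_zones]
norm = sum(alloc[c] for c in run_counties)
county_run = {c: legacy_value * alloc[c] / norm for c in run_counties}
option_b = aggregate(county_run, county2zone_full, run_zones)

print(option_a, option_b)
```

In option A, NY_NYC receives only its share of p127. In option B, the renormalization pushes the whole legacy value into NY_NYC, which is the outcome the comment above wants to rule out.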

kodiobika added 8 commits May 18, 2026 12:25

Create functions for legacy zone-to-county disaggregation and county-to-zone aggregation

fa7eb03

Move agg/disagg processing from aggregate_regions.py to copy_files.py for non-transmission files

9666ee6

Ignore aggfunc/disaggfunc for postcopy files and fix bugs

928bc91

Merge branch 'main' into ko/agg_disagg_refactor

d01de77

Add function docstrings + cleanup

d10138c

Merge branch 'main' into ko/agg_disagg_refactor

a30882b

Merge branch 'main' into ko/agg_disagg_refactor

15f4bd1

Remove duplicate rows from SeaCapAdj_hy.csv

d49ce8d

github-actions Bot added data_changes model_changes labels May 20, 2026

kodiobika requested review from jvcarag, patrickbrown4 and stuartcohen8 May 20, 2026 21:58

kodiobika changed the title ~~Rewrite aggregation and disaggregation functionalities in reeds/spatial.py~~ Move most aggregation and disaggregation to input_processing/copy_files.py May 20, 2026

kodiobika self-assigned this May 20, 2026

Cleanup

1fe6627

kodiobika added input_processing and removed model_changes labels May 20, 2026

kodiobika changed the title ~~Move most aggregation and disaggregation to input_processing/copy_files.py~~ Disaggregate and re-aggregate inputs with legacy zonal data May 20, 2026

kodiobika changed the title ~~Disaggregate and re-aggregate inputs with legacy zonal data~~ Disaggregate and re-aggregate inputs with regional scope of legacy (134) zones May 20, 2026

patrickbrown4 mentioned this pull request May 20, 2026

Update conda environment #103

Draft

12 tasks

kodiobika added 3 commits May 22, 2026 09:03

Delete unused function

f1fd5e9

Revert changes to SeaCapAdj_hy.csv

d1c8ea0

For water_req_psh, revert disaggfunc to 'geosize' and change aggfunc to 'sum'

9720eec

github-actions Bot added the model_changes label May 22, 2026

Merge branch 'main' into ko/agg_disagg_refactor

e18f59a

patrickbrown4 approved these changes May 22, 2026

kodiobika and others added 10 commits May 22, 2026 19:18

Merge branch 'main' into ko/agg_disagg_refactor

8ad9a42

Merge branch 'main' into ko/agg_disagg_refactor

6f9e5e8

Delete rows from runfiles.csv with null filepaths

4e7ab69

Delete inputfiles variable and missing file check from aggregate_regions.py

abc5a7e

Update reeds/spatial.py

70ee77c

Co-authored-by: Patrick Brown <25125211+patrickbrown4@users.noreply.github.com>

Merge branch 'ko/agg_disagg_refactor' of github.com:ReEDS-Model/ReEDS into ko/agg_disagg_refactor

0931f99

Add Literal type indicator for disagg_variable

988dfdb

Treat 'state_lpf' as valid disagg variable

2f45e9c

Bugfix

3b75dc5

Set aggfunc for prescribed builds to 'ignore'

21dcbcd

kodiobika wants to merge 23 commits into main from ko/agg_disagg_refactor

3 participants
