Create an `inputs.h5` -> `inputs.gdx` pipeline to help move processing out of `b_inputs.gms` by patrickbrown4 · Pull Request #61 · ReEDS-Model/ReEDS

patrickbrown4 · 2026-04-26T19:24:41Z

Summary

This PR addresses #38 by creating a new inputs.h5 container and an h5_to_gdx.py script that feeds it into GAMS. This structure helps facilitate the exploration of non-GAMS approaches for ReEDS by letting input-processing calculations be moved out of b_inputs.gms without relying on hundreds of new .csv files for data transfer.

Technical details

Implementation notes

The interface to the new data structure is provided by a pair of new functions, reeds.io.write_input_to_h5() and reeds.io.read_input():
- For example, to write an input file and load it into GAMS, the old approach of:
```
co2_cap.round(0).to_csv(os.path.join(inputs_case, 'co2_cap.csv'))
```
  in python followed in GAMS by
```
parameter co2_cap(allt)      "--metric tons-- CO2 emissions cap used when Sw_AnnualCap is on"
/
$offlisting
$ondelim
$include inputs_case%ds%co2_cap.csv
$offdelim
$onlisting
/ ;
```
  is now
```
reeds.io.write_input_to_h5(
    co2_cap, 'co2_cap', inputs_case, gamstype='parameter',
    comment='--metric tons-- CO2 emissions cap used when Sw_AnnualCap is on',
)
```
  with the GAMS declaration/load happening automatically.
- Similarly, when loading input parameters, val_r = pd.read_csv(os.path.join(case,'inputs_case','val_r.csv'), header=None) is now val_r = reeds.io.read_input(case, 'r')
A new script, h5_to_gdx.py, runs as the last input_processing script and converts inputs.h5 to inputs_0.gdx
- It also writes b_declare_(sets|parameters).gms and b_load_(sets|parameters).gms files to facilitate reading the .gdx file in b_inputs.gms (I couldn't figure out how to get it to work with $declareAndLoad inputs_case%ds%inputs_0.gdx alone; it wasn't recognizing subsets for domain checking)
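As a rough illustration of the declare-then-load pattern described above, the sketch below builds GAMS declaration and `$load` lines from per-symbol metadata. It is not the code in `h5_to_gdx.py`; the `Symbol` dataclass, the helper names, and the exact text of the generated lines are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    """Hypothetical metadata record for one entry in inputs.h5."""
    name: str
    gamstype: str          # 'set' or 'parameter'
    domain: tuple = ()     # e.g. ('allt',); empty if no domain is given
    comment: str = ''


def declare_line(sym: Symbol) -> str:
    """Build a GAMS declaration such as: parameter co2_cap(allt) "..." ;"""
    domain = f"({','.join(sym.domain)})" if sym.domain else ''
    comment = f' "{sym.comment}"' if sym.comment else ''
    return f'{sym.gamstype} {sym.name}{domain}{comment} ;'


def load_line(sym: Symbol) -> str:
    """Build a $load statement; it reads from the file opened by $gdxin."""
    return f'$load {sym.name}'


symbols = [
    Symbol('co2_cap', 'parameter', ('allt',),
           '--metric tons-- CO2 emissions cap used when Sw_AnnualCap is on'),
]
declarations = [declare_line(s) for s in symbols]
loads = [load_line(s) for s in symbols]
print('\n'.join(declarations + loads))
```

Declaring each symbol with its domain before the load is what gives GAMS the information it needs to domain-check subsets against their parent sets.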
New guidelines:
- New aliases should be added to inputs/sets/_aliases.csv
- New sets should follow the guidelines in inputs/sets/README.md
- Use reeds.io.write_input_to_h5() when adding new inputs instead of writing them to a .csv file and loading explicitly in b_inputs.gms
- New input-defining code should happen in python (feel free to add new scripts to input_processing as necessary) instead of b_inputs.gms
I started moving some sets/parameters to the new structure as examples but didn't want to make this PR too huge. The rest can be distributed across smaller PRs.
- I'll open a new issue tracking next steps for transitioning to this new structure and moving more processing out of b_inputs.gms/e_report.gms once this PR is merged

Additional changes

Fixed bokeh processing of health impacts
Renamed sw.csv -> wst_surface.csv to avoid confusion with switches (which are abbreviated in python as sw)
Renamed gen_mandate_tech_list.csv -> nat_gen_tech_frac.csv to match its name in GAMS

Issues resolved

#38

Validation, testing, and comparison report(s)

Zero change for the Pacific test case:
results-v20260426_mainM0_Pacific,v20260426_inputsM1_Pacific.pptx

The only changes to inputs.gdx (aside from some minor parameter renaming) are due to rounding: diff_inputs-v20260426_mainM0_Pacific-v20260426_inputsM1_Pacific.gdx.zip

Near-zero change (rounding differences only) for the USA_defaults case: results-mainK0_USA_defaults,inputsK0_USA_defaults.pptx

20260522

Double checked, and still only rounding-error differences in the Pacific test case. I also made sure the MonteCarlo_LHS case still works.
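For anyone who wants to run a similar check on their own cases, a rounding-only comparison can be sketched as below. This is not the tool used to produce the diff files above; the function name, the tolerance, and the sample values are assumptions.

```python
import math


def non_rounding_diffs(old, new, abs_tol=0.5):
    """Return keys whose values differ by more than abs_tol, or that are
    present in only one of the two dicts."""
    diffs = []
    for key in sorted(set(old) | set(new)):
        if key not in old or key not in new:
            diffs.append(key)
        elif not math.isclose(old[key], new[key], rel_tol=0, abs_tol=abs_tol):
            diffs.append(key)
    return diffs


# Made-up values that differ only by rounding to whole numbers
old = {2030: 1234.4, 2040: 987.6}
new = {2030: 1234.0, 2040: 988.0}
print(non_rounding_diffs(old, new))
```

An empty list means every shared key agrees to within the tolerance and no key is missing from either side.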

Checklist for author

Details to double-check

Charge code provided to reviewers
Included comparison reports for appropriate test cases
Documentation updated if necessary
If input data added/modified:
- Units are specified
- New large data files handled with .h5 instead of .csv
Code formatting standardized
Reusable functions used where possible instead of copy/pasted code

General information to guide review

Zero impact on results of default case
No large data file(s) added/modified
No substantive impact on runtime for full-US reference case
No substantive impact on folder size for full-US reference case
No change to process flow (runbatch.py, d_solve_iterate.py)
No change to code organization
No change to package requirements (environment.yml or Project.toml)

Did you use LLM tools (chatbot or copilot) in the preparation of this PR? If so, describe how

No


wesleyjcole

I like how tidy this makes the change.

I did see that the comments are not printed in the b_declare_{}.gms files. Would that be easy to add? I don't know if it's important, but there might be an instance where having comments there would be helpful.

wesleyjcole · 2026-05-08T21:43:34Z

@@ -884,11 +905,20 @@ def main(reeds_path, inputs_case, agglevel, regions):
        'cap_cspns': True,
        'can_imports_capacity': True,
    }
+    gamstype = {
+        'pcat': 'set',


Just flagging that pcat is going away in #12.

I thought that was the case, but it's still used in b_inputs.gms; here's a snippet from that PR branch:

ReEDS/b_inputs.gms

Lines 1179 to 1208 in 06c948a

set prescriptivelink0(pcat,ii) "initial set of prescribed categories and their technologies - used in assigning prescribed builds"
/
$offlisting
$ondelim
$include inputs_case%ds%prescriptivelink0.csv
$offdelim
$onlisting
/ ;

*include non-numeraire CSPs and then exclude numeraire CSPs in ii dimension of
*prescriptivelink0(pcat,ii) set when Sw_WaterMain is ON
prescriptivelink0("csp-ws",ii)$[(csp1(ii) or csp2(ii) or csp3(ii) or csp4(ii))$Sw_WaterMain] = yes ;
prescriptivelink0("csp-ws",ii)$[csp(ii)$i_numeraire(ii)$Sw_WaterMain] = no ;

set prescriptivelink(pcat,i) "final set of prescribed categories and their technologies - used in the model" ;

prescriptivelink(pcat,i)$prescriptivelink0(pcat,i) = yes ;

alias(pcat,ppcat) ;

* active prescriptivelink for all techs not included in the table above
* but restrict out csp techs in this calculation - since they
* are indexed by a separate pcat (csp-ws) and have special considerations
prescriptivelink(pcat,i)$[sameas(pcat,i)$(not sum{ppcat, prescriptivelink(ppcat,i) })$(not csp1(i))] = yes ;
*only geo_hydro techs are considered to meet geothermal prescriptions
prescriptivelink(pcat,i)$[geo_extra(i)] = no ;

*upgrades have no prescriptions
prescriptivelink(pcat,i)$[upgrade(i)] = no ;

This has now been removed from #12, but I'm assuming #12 will come in after this PR, so we'll just need to address it when we merge the two together.

wesleyjcole · 2026-05-08T21:52:58Z

Why the underscore for aliases and pcat? None of the others use that convention.

Just to set them apart because they are not read directly into GAMS (explained in inputs/sets/README.md)

wesleyjcole · 2026-05-08T21:54:26Z

@@ -1,5 +1,5 @@
-o "once through",


Is it feasible to keep comments in the set files? I see you moved these to the ReadMe--it just seems more useful here than there, but keeping it in the file might be a hassle.

I'm not sure if gdxpds supports element-level comments, and it would add a lot of overhead to propagate them to the inputs.h5 file. To me it doesn't seem worth it for something that's only used in two sets. I also feel like README.md is a more suitable place for documentation than raw .csv files.

wesleyjcole · 2026-05-08T21:58:35Z

+- `_aliases.csv`: aliases (extra names for the same set) used in GAMS
+  - Aliases of primary sets should be added here
+  - Aliases of sets defined in `b_inputs.gms` (e.g., `h`→`hh`) should instead be defined in GAMS after the set definition
+- `_pcat.csv`: prescribed capacity categories
+  - The `pcat` set in GAMS (defined in `writecapdat.py`) includes the members of the `i` set; this file includes only the *extra* elements on top of the `i` set


I see here that these are special-case files. Is that why the underscore is used, as a flag that this won't exactly match the GAMS set?

Yeah that's right; I thought it would be confusing if all the csv files in inputs/sets match the sets used in the model except for these files.
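To make the `_pcat.csv` convention concrete, here is a small sketch of how the full `pcat` set could be assembled from the `i` set plus the extra elements. The function name and all element names are placeholders; the actual logic lives in `writecapdat.py` and may differ.

```python
def build_pcat(i_elements, extra_elements):
    """Return the members of i followed by extras, without duplicates.

    Order is preserved and names are compared case-insensitively,
    because GAMS treats set element labels as case-insensitive.
    """
    seen = set()
    pcat = []
    for element in list(i_elements) + list(extra_elements):
        key = element.lower()
        if key not in seen:
            seen.add(key)
            pcat.append(element)
    return pcat


# Placeholder element names, for illustration only
i_set = ['upv', 'wind-ons', 'battery']
extras = ['csp-ws', 'UPV']   # 'UPV' duplicates 'upv' and is dropped
print(build_pcat(i_set, extras))
```

Keeping only the extras in the file means the `i` members are never copied by hand, so the two sets cannot drift apart.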

Co-authored-by: Wesley Cole <49044852+wesleyjcole@users.noreply.github.com>


patrickbrown4 · 2026-05-08T23:22:54Z

I did see that the comments are not printed in the b_declare_{}.gms files. Would that be easy to add? I don't know if it's important, but there might be an instance where having comments there would be helpful.

I wasn't thinking of b_declare_{sets or parameters}.gms as files that people would actually want or need to look at; for me they're just a workaround to avoid domain violations for subsets when loading from a .gdx file. Do you think it's enough to have the comments in the inputs.gdx/inputs.h5 files instead? (In my mind, the code itself should ideally not be user facing, while these kinds of uniformly formatted data containers would be.)

patrickbrown4 · 2026-05-12T22:01:51Z

+$include b_declare_sets.gms
+$include b_declare_parameters.gms
+$gdxin inputs_case%ds%inputs_0.gdx
+$include b_load_sets.gms
+$include b_load_parameters.gms
+$gdxin


This approach could be simplified by using gams.transfer to write regular (not relaxed) sets, and then loading the whole file with $declareAndLoad. But since it would require an update to environment.yml and only affects the code in two places (these lines and h5_to_gdx.py), my preference is to do it in a followup PR and move ahead with the current approach for now, so we can start adopting the new input formalism sooner rather than later.

wesleyjcole · 2026-05-19T14:37:36Z


kodiobika

Looks great, thanks!

kodiobika · 2026-05-22T10:32:45Z

+    Returns:
+        pd.DataFrame
+    """
+    key = Path(name).stem


Just curious - are there any cases where name would be different from key? Or is this just in case someone passes a relative file path instead of just a name?

Yeah, the idea was just to standardize the input so the user can provide either name or {case}/inputs_case/{name}.csv or {name}.csv. I can't remember if there's a specific place where it's used that second way in the code though. If you think it's unnecessary/confusing to have that flexibility, we could just go with the direct name input instead.

Got it thanks, and no worries I think it's fine as-is
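The normalization discussed here can be checked in isolation. In this sketch, `to_key` is a hypothetical wrapper around the `Path(name).stem` line from the diff, and the case path is made up.

```python
from pathlib import Path


def to_key(name):
    """Reduce a bare name, a filename, or a full path to the bare key."""
    return Path(name).stem


for name in ['co2_cap', 'co2_cap.csv', 'runs/mycase/inputs_case/co2_cap.csv']:
    print(f'{name!r} -> {to_key(name)!r}')
```

One caveat: `.stem` strips only the final suffix, so a bare name that itself contains a dot (say `a.b`) would be shortened to `a`.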

kodiobika · 2026-05-22T10:55:41Z

+        print(f'{Path(h5path).name}: Wrote {key} from {calling_file}')
+
+
+def write_csv_to_h5(


Could you either rename this to write_csv_to_inputs_h5 or something similar (since it only ever writes to inputs.h5), or add an argument to specify an .h5 filepath? Also "copy" rather than "write" would be a bit clearer to me as the first word of the function name, but I think either's fine

Sounds good, I went with write_csv_to_inputs_h5() because it's not exactly a copy given the dtype/name changes, but I agree it's better to specify inputs_h5 instead of just h5

Co-authored-by: kodiobika <35176195+kodiobika@users.noreply.github.com>


patrickbrown4 added 26 commits April 15, 2026 11:18

copy .gov branch

e87cb07

Merge branch 'main' into pb/inputs

0660c5f

move scale_column() to reeds.output_calc to avoid propagating bokeh l…

edb0fc7

…ogging style

add read_input and write_csv_to_h5 to reeds.io; add h5_to_gdx.py

e6f039e

use write_csv_to_h5 for sets in copy_files instead of copying .csv files

8840ec9

standardize some set csv formats

9506d1e

input_processing: use read_input() to read quarters set

61befc0

start moving sets from b_inputs.gms to runfiles.csv

df4941e

fix comments for h5_to_gdx

23435c1

remote.download_remote_files(): print acknowledgment if local file is ok

4ade0f1

Merge branch 'main' into pb/inputs

007690e

b_inputs.gms: remove more csv set inputs; write ccseason to h5 from c…

353b1f0

…opy_files.py

write aliases in h5_to_gdx.py

1613bd6

move gen_mandate_tech_list.csv -> nat_gen_tech_frac.csv to match GAMS…

2e98528

… parameter

inputs.h5: allow for empty sets

ddd646f

move vintage definition to WriteHintage.py

1e83c9f

move spatial sets to inputs.h5

63f8551

move primary temporal sets to inputs.h5

107ef85

fix vintage bounds

f864a4b

move set comments to inputs/sets/README.md

a00721d

move sw.csv -> wst_surface.csv; fix bokeh

2da17d6

fix bokeh health calculations

2351485

clean up for PR

6e4bc88

h5_to_gdx.py: declare r set first (same as the old approach)

4554c37

rename val_county.csv -> county.csv (so it can be accessed the same a…

85b3ec5

…s other region levels even though it isn't passed to GAMS)

define pcat programmatically in writecapdat.py instead of copy-pastin…

3b2c75c

…g i.csv

patrickbrown4 requested review from kodiobika and wesleyjcole April 28, 2026 18:38

patrickbrown4 mentioned this pull request Apr 30, 2026

MGA updates & employment outputs #21

Open

23 tasks

Merge branch 'main' into pb/inputs

8651baf

write_input_to_h5(): add 'written_by' and 'units' attributes

ab6fb0c

github-actions Bot added data_changes model_changes input_processing docs postprocessing labels May 6, 2026

patrickbrown4 added 3 commits May 7, 2026 15:59

Merge branch 'main' into pb/inputs

6756c86

calc_historical_capex.py: fix missing lower-case

6d01278

Merge branch 'main' into pb/inputs

ddfba00

wesleyjcole reviewed May 8, 2026


patrickbrown4 and others added 4 commits May 8, 2026 16:40

Fix typos

932e249

Co-authored-by: Wesley Cole <49044852+wesleyjcole@users.noreply.github.com>

remove extra element from nat_gen_tech_frac.csv; add missing docstrin…

3bee175

…g and clarify funciton name in h5_to_gdx.py

Merge branch 'pb/inputs' of github.com:ReEDS-Model/ReEDS into pb/inputs

18a6f62

reeds.io.write_input_to_h5(): add missing docstring

0c3b8ca

Merge branch 'main' into pb/inputs

4d691b3

patrickbrown4 requested a review from wesleyjcole May 8, 2026 23:24

patrickbrown4 commented May 12, 2026


Merge branch 'main' into pb/inputs

6a25508

wesleyjcole approved these changes May 19, 2026


kodiobika approved these changes May 22, 2026


patrickbrown4 and others added 6 commits May 22, 2026 09:47

Merge branch 'main' into pb/inputs

7fde0e6

reeds.io: fix type annotation

1547606

Co-authored-by: kodiobika <35176195+kodiobika@users.noreply.github.com>

add hintage_unit_number=300 to scalars.csv

06cd3ef

use more explicit function names

236d9aa

write_to_inputs_h5: automatically rename unnamed series to '*'

3b338b8

add overwrite option to copy_files.get_regions_and_agglevel() to keep…

913d104

… mcs_sampler.py working

patrickbrown4 merged commit 007fa11 into main May 22, 2026
10 checks passed

patrickbrown4 deleted the pb/inputs branch May 22, 2026 18:48

patrickbrown4 mentioned this pull request May 22, 2026

Move data loaded into b_inputs.gms from .csv files to one inputs.h5 → inputs.gdx file #38

Closed

4 tasks
