Create an inputs.h5 -> inputs.gdx pipeline to help move processing out of b_inputs.gms#61

Merged
patrickbrown4 merged 43 commits into main from pb/inputs
May 22, 2026

@patrickbrown4
Contributor

@patrickbrown4 patrickbrown4 commented Apr 26, 2026

Summary

This PR addresses #38 by creating a new inputs.h5 container and an h5_to_gdx.py script that feeds it into GAMS. This structure helps facilitate the exploration of non-GAMS approaches for ReEDS by letting input-processing calculations be moved out of b_inputs.gms without relying on hundreds of new .csv files for data transfer.

Technical details

Implementation notes

  • The interface to the new data structure is provided by a pair of new functions, reeds.io.write_input_to_h5() and reeds.io.read_input()
    • For example, to write an input file and load it into GAMS, the old approach of:
      co2_cap.round(0).to_csv(os.path.join(inputs_case, 'co2_cap.csv'))
      in python followed in GAMS by
      parameter co2_cap(allt)      "--metric tons-- CO2 emissions cap used when Sw_AnnualCap is on"
      /
      $offlisting
      $ondelim
      $include inputs_case%ds%co2_cap.csv
      $offdelim
      $onlisting
      / ;
      
      is now
      reeds.io.write_input_to_h5(
          co2_cap, 'co2_cap', inputs_case, gamstype='parameter',
          comment='--metric tons-- CO2 emissions cap used when Sw_AnnualCap is on',
      )
      with the GAMS declaration/load happening automatically.
    • Similarly, when loading input parameters, val_r = pd.read_csv(os.path.join(case,'inputs_case','val_r.csv'), header=None) is now val_r = reeds.io.read_input(case, 'r')
  • A new script, h5_to_gdx.py, runs as the last input_processing script and converts inputs.h5 to inputs_0.gdx
    • It also writes b_declare_(sets|parameters).gms and b_load_(sets|parameters).gms files to facilitate reading the .gdx file in b_inputs.gms (I couldn't figure out how to get it to work with $declareAndLoad inputs_case%ds%inputs_0.gdx alone; it wasn't recognizing subsets for domain checking)
  • New guidelines:
    • New aliases should be added to inputs/sets/_aliases.csv
    • New sets should follow the guidelines in inputs/sets/README.md
    • Use reeds.io.write_input_to_h5() when adding new inputs instead of writing them to a .csv file and loading explicitly in b_inputs.gms
    • New input-defining code should happen in python (feel free to add new scripts to input_processing as necessary) instead of b_inputs.gms
  • I started moving some sets/parameters to the new structure as examples but didn't want to make this PR too huge. The rest can be distributed across smaller PRs.
    • I'll open a new issue tracking next steps for transitioning to this new structure and moving more processing out of b_inputs.gms/e_report.gms once this PR is merged
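
As a rough illustration of the declare/load files described above, the sketch below generates GAMS declaration and load statements from per-symbol metadata. It is not the code in h5_to_gdx.py: the `Symbol` record, the `declare_lines()` and `load_lines()` helpers, and the choice of `$loaddc` for domain-checked loading are all assumptions made for illustration. Only the co2_cap name, domain, and comment come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    """Hypothetical metadata record for one entry in inputs.h5."""
    name: str
    gamstype: str          # 'set' or 'parameter'
    domain: tuple = ()     # e.g. ('allt',); empty for an undomained symbol
    comment: str = ''


def declare_lines(symbols, gamstype):
    """Return one GAMS declaration per symbol of the requested type."""
    lines = []
    for s in symbols:
        if s.gamstype != gamstype:
            continue
        domain = f"({','.join(s.domain)})" if s.domain else ''
        text = f' "{s.comment}"' if s.comment else ''
        lines.append(f'{gamstype} {s.name}{domain}{text} ;')
    return lines


def load_lines(symbols, gamstype):
    """Return one domain-checked load per symbol (assumes $gdxin is already open)."""
    return [f'$loaddc {s.name}' for s in symbols if s.gamstype == gamstype]


symbols = [
    Symbol('allt', 'set'),
    Symbol('co2_cap', 'parameter', ('allt',),
           '--metric tons-- CO2 emissions cap used when Sw_AnnualCap is on'),
]
print('\n'.join(declare_lines(symbols, 'parameter')))
print('\n'.join(load_lines(symbols, 'parameter')))
```

Declaring every set before any parameter, and loading in the same order, is one way to make sure a symbol's domain already exists when it is declared, which is presumably why the sets and parameters files are kept separate.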

Additional changes

  • Fixed bokeh processing of health impacts
  • Renamed sw.csv -> wst_surface.csv to avoid confusion with switches (which are abbreviated in python as sw)
  • Renamed gen_mandate_tech_list.csv -> nat_gen_tech_frac.csv to match its name in GAMS

Issues resolved

#38

Validation, testing, and comparison report(s)

Zero change for the Pacific test case:
results-v20260426_mainM0_Pacific,v20260426_inputsM1_Pacific.pptx

The only changes to inputs.gdx (aside from some minor parameter renaming) are due to rounding: diff_inputs-v20260426_mainM0_Pacific-v20260426_inputsM1_Pacific.gdx.zip

Near-zero change (rounding differences only) for the USA_defaults case: results-mainK0_USA_defaults,inputsK0_USA_defaults.pptx

20260522

Double checked, and still only rounding-error differences in the Pacific test case. I also made sure the MonteCarlo_LHS case still works.
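
A check for "rounding-error differences only" could look roughly like the sketch below. It is not the tool used to produce the reports above: the `compare_parameters()` helper, the tolerances, and the sample values are all placeholders, and each parameter is assumed to have been read into a plain `{key: value}` mapping beforehand.

```python
import math


def compare_parameters(old, new, rel_tol=1e-6, abs_tol=1e-9):
    """Split differences between two {key: value} mappings into three groups."""
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    # Shared keys whose values differ by more than the (assumed) tolerance
    beyond_tol = {
        key: (old[key], new[key])
        for key in sorted(set(old) & set(new))
        if not math.isclose(old[key], new[key], rel_tol=rel_tol, abs_tol=abs_tol)
    }
    return only_old, only_new, beyond_tol


# Placeholder values, not taken from any ReEDS run
old = {'2030': 1000.0004, '2040': 500.0}
new = {'2030': 1000.0, '2040': 500.0}
only_old, only_new, beyond_tol = compare_parameters(old, new)
print(only_old, only_new, beyond_tol)
```

Three empty results mean the two versions agree to within the chosen tolerance; anything listed in `beyond_tol` would need a closer look.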

Checklist for author

Details to double-check

  • Charge code provided to reviewers
  • Included comparison reports for appropriate test cases
  • Documentation updated if necessary
  • If input data added/modified:
    • Units are specified
    • New large data files handled with .h5 instead of .csv
  • Code formatting standardized
  • Reusable functions used where possible instead of copy/pasted code

General information to guide review

  • Zero impact on results of default case
  • No large data file(s) added/modified
  • No substantive impact on runtime for full-US reference case
  • No substantive impact on folder size for full-US reference case
  • No change to process flow (runbatch.py, d_solve_iterate.py)
  • No change to code organization
  • No change to package requirements (environment.yml or Project.toml)

Did you use LLM tools (chatbot or copilot) in the preparation of this PR? If so, describe how

No

…s other region levels even though it isn't passed to GAMS)
Contributor

@wesleyjcole wesleyjcole left a comment

I like how tidy this makes the change.

I did see that the comments are not printed in the b_declare_{}.gms files. Would that be easy to add? I don't know if it's important, but there might be an instance where having comments there would be helpful.

Comment thread input_processing/writecapdat.py Outdated
Comment thread inputs/national_generation/nat_gen_tech_frac.csv Outdated
Comment thread input_processing/h5_to_gdx.py Outdated
Comment thread input_processing/h5_to_gdx.py Outdated
Comment thread input_processing/writecapdat.py Outdated
@@ -884,11 +905,20 @@ def main(reeds_path, inputs_case, agglevel, regions):
'cap_cspns': True,
'can_imports_capacity': True,
}
gamstype = {
'pcat': 'set',
Contributor

Just flagging that pcat is going away in #12.

Contributor Author

I thought that was the case, but it's still used in b_inputs.gms; here's a snippet from that PR branch:

ReEDS/b_inputs.gms

Lines 1179 to 1208 in 06c948a

set prescriptivelink0(pcat,ii) "initial set of prescribed categories and their technologies - used in assigning prescribed builds"
/
$offlisting
$ondelim
$include inputs_case%ds%prescriptivelink0.csv
$offdelim
$onlisting
/ ;
*include non-numeraire CSPs and then exclude numeraire CSPs in ii dimension of
*prescriptivelink0(pcat,ii) set when Sw_WaterMain is ON
prescriptivelink0("csp-ws",ii)$[(csp1(ii) or csp2(ii) or csp3(ii) or csp4(ii))$Sw_WaterMain] = yes ;
prescriptivelink0("csp-ws",ii)$[csp(ii)$i_numeraire(ii)$Sw_WaterMain] = no ;
set prescriptivelink(pcat,i) "final set of prescribed categories and their technologies - used in the model" ;
prescriptivelink(pcat,i)$prescriptivelink0(pcat,i) = yes ;
alias(pcat,ppcat) ;
* active prescriptivelink for all techs not included in the table above
* but restrict out csp techs in this calculation - since they
* are indexed by a separate pcat (csp-ws) and have special considerations
prescriptivelink(pcat,i)$[sameas(pcat,i)$(not sum{ppcat, prescriptivelink(ppcat,i) })$(not csp1(i))] = yes ;
*only geo_hydro techs are considered to meet geothermal prescriptions
prescriptivelink(pcat,i)$[geo_extra(i)] = no ;
*upgrades have no prescriptions
prescriptivelink(pcat,i)$[upgrade(i)] = no ;

Contributor

This has now been removed from #12, but I'm assuming #12 will come in after this PR, so we'll just need to address it when we merge the two together.

Comment thread inputs/sets/_aliases.csv
Contributor

Why the underscore for aliases and pcat? None of the others use that convention.

Contributor Author

Just to set them apart because they are not read directly into GAMS (explained in inputs/sets/README.md)

Comment thread inputs/sets/ctt.csv
@@ -1,5 +1,5 @@
o "once through",
Contributor

Is it feasible to keep comments in the set files? I see you moved these to the ReadMe--it just seems more useful here than there, but keeping it in the file might be a hassle.

Contributor Author

I'm not sure if gdxpds supports element-level comments, and it would add a lot of overhead to propagate them to the inputs.h5 file. To me it doesn't seem worth it for something that's only used in two sets. I also feel like README.md is a more suitable place for documentation than raw .csv files.

Comment thread inputs/sets/README.md
Comment on lines +44 to +48
- `_aliases.csv`: aliases (extra names for the same set) used in GAMS
- Aliases of primary sets should be added here
- Aliases of sets defined in `b_inputs.gms` (e.g., `h`→`hh`) should instead be defined in GAMS after the set definition
- `_pcat.csv`: prescribed capacity categories
- The `pcat` set in GAMS (defined in `writecapdat.py`) includes the members of the `i` set; this file includes only the *extra* elements on top of the `i` set
Contributor

I see here that these are special-case files. Is that why the underscore is used, as a flag that this won't exactly match the GAMS set?

Contributor Author

@patrickbrown4 patrickbrown4 May 8, 2026

Yeah that's right; I thought it would be confusing if all the csv files in inputs/sets match the sets used in the model except for these files.
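
The relationship described in the README excerpt above (the GAMS `pcat` set is the `i` set plus the extra elements in `_pcat.csv`) can be sketched as follows. This is not the code in writecapdat.py: the helper names, the assumption that each file is a headerless csv with the element in the first column, the ordering of the result, and the sample contents are all illustrative.

```python
import csv
import io


def read_first_column(csv_text):
    """Return the first field of each non-empty row of a headerless csv."""
    return [row[0] for row in csv.reader(io.StringIO(csv_text)) if row]


def build_pcat(i_members, extra_members):
    """Combine the i set with the extra categories, keeping order and dropping repeats."""
    return list(dict.fromkeys([*i_members, *extra_members]))


# Placeholder contents standing in for the i set file and _pcat.csv
i_csv = 'gas-cc\nupv\nwind-ons\n'
extra_csv = 'csp-ws\nupv\n'
pcat = build_pcat(read_first_column(i_csv), read_first_column(extra_csv))
print(pcat)  # ['gas-cc', 'upv', 'wind-ons', 'csp-ws']
```

Because `_pcat.csv` on its own is only part of the final set, it would not match the GAMS `pcat` set if read directly, which is the mismatch the underscore prefix is meant to flag.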

Comment thread reeds/io.py
@patrickbrown4
Contributor Author

> I did see that the comments are not printed in the b_declare_{}.gms files. Would that be easy to add? I don't know if it's important, but there might be an instance where having comments there would be helpful.

I wasn't thinking of b_declare_{sets or parameters}.gms as files that people would actually want or need to look at; for me they're just a workaround to avoid domain violations for subsets when loading from a .gdx file. Do you think it's enough to have the comments in the inputs.gdx/inputs.h5 files instead? (In my mind, the code itself should ideally not be user facing, while these kinds of uniformly formatted data containers would be.)

@patrickbrown4 patrickbrown4 requested a review from wesleyjcole May 8, 2026 23:24
Comment thread b_inputs.gms Outdated
Comment on lines +86 to +91
$include b_declare_sets.gms
$include b_declare_parameters.gms
$gdxin inputs_case%ds%inputs_0.gdx
$include b_load_sets.gms
$include b_load_parameters.gms
$gdxin
Contributor Author

This approach could be simplified by using gams.transfer to write regular (not relaxed) sets, and then loading the whole file with $declareAndLoad. But since it would require an update to environment.yml and only affects the code in two places (these lines and h5_to_gdx.py), my preference is to do it in a followup PR and move ahead with the current approach for now, so we can start adopting the new input formalism sooner rather than later.

Contributor

@kodiobika kodiobika left a comment

Looks great, thanks!

Comment thread reeds/io.py
Returns:
pd.DataFrame
"""
key = Path(name).stem
Contributor

Just curious - are there any cases where name would be different from key? Or is this just in case someone passes a relative file path instead of just a name?

Contributor Author

@patrickbrown4 patrickbrown4 May 22, 2026

Yeah, the idea was just to standardize the input so the user can provide either name or {case}/inputs_case/{name}.csv or {name}.csv. I can't remember if there's a specific place where it's used that second way in the code though. If you think it's unnecessary/confusing to have that flexibility, we could just go with the direct name input instead.

Contributor

Got it thanks, and no worries I think it's fine as-is
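
For reference, the normalization discussed in this thread behaves as in the short sketch below. The `normalize_key()` wrapper and the `mycase` path are hypothetical; only the `Path(name).stem` call comes from the diff.

```python
from pathlib import Path


def normalize_key(name):
    """Reduce a bare name, a file name, or a file path to the same key."""
    return Path(name).stem


for name in ['co2_cap', 'co2_cap.csv', 'mycase/inputs_case/co2_cap.csv']:
    print(name, '->', normalize_key(name))  # each prints '... -> co2_cap'
```

One caveat of this approach: `stem` drops whatever follows the last dot, so a bare name that itself contains a dot (for example `a.b`) would be shortened to `a`.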

Comment thread reeds/io.py Outdated
print(f'{Path(h5path).name}: Wrote {key} from {calling_file}')


def write_csv_to_h5(
Contributor

Could you either rename this to write_csv_to_inputs_h5 or something similar (since it only ever writes to inputs.h5), or add an argument to specify an .h5 filepath? Also "copy" rather than "write" would be a bit clearer to me as the first word of the function name, but I think either's fine

Contributor Author

Sounds good, I went with write_csv_to_inputs_h5() because it's not exactly a copy given the dtype/name changes, but I agree it's better to specify inputs_h5 instead of just h5

Comment thread reeds/io.py Outdated
Comment thread reeds/input_processing/WriteHintage.py Outdated
Comment thread reeds/input_processing/recf.py Outdated
Comment thread reeds/io.py Outdated
@patrickbrown4 patrickbrown4 merged commit 007fa11 into main May 22, 2026
10 checks passed
@patrickbrown4 patrickbrown4 deleted the pb/inputs branch May 22, 2026 18:48