Add Latin Hypercube Sampling for Monte Carlo Analysis by bsergi · Pull Request #57 · ReEDS-Model/ReEDS

bsergi · 2026-04-23T19:27:46Z

Summary

This PR adds the option to sample using a Latin Hypercube sampling (LHS) method when running a Monte Carlo analysis. It also fixes the sampling approach for load.

Technical details

LHS is a quasi-random sampling method that partitions each input distribution being sampled into N bins of equal probability and then draws one sample from each bin. The advantage of this method is that it converges to the "true" input distribution with fewer samples. The figure below illustrates this with a comparison of random and LHS methods for uniform and triangular costs using the 2050 nuclear ATB costs:

More details on the LHS approach can be found in Sheikholeslami, R., & Razavi, S. (2017).
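
A minimal sketch of the binning property described above, using scipy.stats.qmc.LatinHypercube (the sampler mentioned later in this thread). The helper names `lhs_unit_samples` and `bin_counts` are hypothetical and are not taken from mcs_sampler.py.

```python
import numpy as np
from scipy.stats import qmc


def lhs_unit_samples(n_samples, n_dims, seed=0):
    """Draw an n_samples x n_dims Latin Hypercube on the unit cube [0, 1)."""
    sampler = qmc.LatinHypercube(d=n_dims, seed=seed)
    return sampler.random(n=n_samples)


def bin_counts(samples):
    """Count samples per equal-probability bin, one row of counts per dimension."""
    n_samples, n_dims = samples.shape
    # Bin k covers [k/n, (k+1)/n); clip guards against floating-point edge cases.
    bins = np.clip(np.floor(samples * n_samples).astype(int), 0, n_samples - 1)
    return np.array(
        [np.bincount(bins[:, j], minlength=n_samples) for j in range(n_dims)]
    )


samples = lhs_unit_samples(n_samples=10, n_dims=2)
counts = bin_counts(samples)
print(counts)  # one row per dimension, one count per bin
```

Every bin should hold exactly one sample in each dimension, which is what distinguishes the design from independent uniform draws.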

Implementation notes

The LHS method is implemented in mcs_sampler.py during input processing. When activated, the LHS method samples an n x d matrix, where n is the number of samples (specified by MCS_runs) and d is the number of dimensions, i.e., the number of variables being sampled (specified by MCS_dist_groups).

This matrix is generated upfront and gives each sample's position on the cumulative distribution function. A set of lhs functions then uses these positions to derive the realized values from the respective distributions. The original LHS sample matrix is saved to inputs_case/mcs_latin_hypercube_samples. Note that this approach is distinct from the implementation of the pure random approach, which samples weights and then applies them to the relevant files and switches.
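
As a rough illustration of turning CDF positions into realized values, the sketch below applies inverse CDFs from scipy.stats to a few positions in [0, 1). The function names `lhs_uniform` and `lhs_triangular` and the numeric bounds are assumptions for illustration; the actual lhs functions in mcs_sampler.py may be structured differently.

```python
import numpy as np
from scipy import stats


def lhs_uniform(u, low, high):
    """Map CDF positions u to a uniform distribution on [low, high]."""
    return stats.uniform.ppf(u, loc=low, scale=high - low)


def lhs_triangular(u, low, mode, high):
    """Map CDF positions u to a triangular distribution with the given mode."""
    scale = high - low
    # scipy parameterizes the mode as a fraction c of the [low, high] range.
    return stats.triang.ppf(u, c=(mode - low) / scale, loc=low, scale=scale)


# Hypothetical CDF positions, e.g. one column of the LHS matrix.
u = np.array([0.05, 0.5, 0.95])
uniform_values = lhs_uniform(u, low=10.0, high=20.0)
triangular_values = lhs_triangular(u, low=10.0, mode=15.0, high=20.0)
print(uniform_values, triangular_values)
```

Because the inverse CDF is monotone, the one-sample-per-bin structure of the LHS matrix carries over to the realized values.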

Additional changes

Switches added/removed/changed

Adds MCS_lhs: 0 to use random sampling, 1 to use LHS (default: 1)
Modifies input_processing_only: adds an option 2 that stops input processing right after Monte Carlo sampling (useful for testing the input distributions before running).

Issues resolved

Partially addresses #41 by fixing load.

Known incompatibilities

The LHS method does not currently support regional sampling.
Sampling with VRE availability is still not functioning. The plan is to address this after the changes in Flexible start year option & tech classes addition #12.

Relevant sources or documentation

Validation, testing, and comparison report(s)

Monte Carlo sampling is off by default so there is no change to the default case (see compare below).

results-main,update.pptx

The next slide deck summarizes input distributions and results from a set of 54-region ReEDS runs using both the random and LHS approaches for different numbers of samples. In general, LHS converges faster to the expected input distributions than the random approach. Both approaches yield reasonably comparable results on aggregate metrics such as mean and 90% coverage for installed capacity, annual generation, and system costs.

20260430_latin_hypercube_sampling.pdf
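
To make the convergence claim concrete, here is a small sketch (not the analysis behind the slide deck) that compares the sample mean of random and LHS draws from a unit uniform distribution. The `summarize` helper is hypothetical, and reading "90% coverage" as the 5th to 95th percentile range is an assumption.

```python
import numpy as np
from scipy.stats import qmc


def summarize(values):
    """Return the mean and the 5th/95th percentiles of a 1-D sample."""
    return {
        "mean": float(np.mean(values)),
        "p05": float(np.percentile(values, 5)),
        "p95": float(np.percentile(values, 95)),
    }


n = 50
seed = 0
random_u = np.random.default_rng(seed).uniform(size=n)
lhs_u = qmc.LatinHypercube(d=1, seed=seed).random(n=n)[:, 0]

# The true mean of a unit uniform distribution is 0.5.
random_error = abs(summarize(random_u)["mean"] - 0.5)
lhs_error = abs(summarize(lhs_u)["mean"] - 0.5)
print(random_error, lhs_error)
```

With one draw per bin, the LHS sample mean of a unit uniform can deviate from 0.5 by at most 1/(2n), whereas the error of independent draws shrinks only in proportion to 1/sqrt(n).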

Checklist for author

Details to double-check

General information to guide review

Zero impact on results of default case
No large data file(s) added/modified
No substantive impact on runtime for full-US reference case
No substantive impact on folder size for full-US reference case
No change to process flow (runbatch.py, d_solve_iterate.py)
No change to code organization
No change to package requirements (environment.yml or Project.toml)

Did you use LLM tools (chatbot or copilot) in the preparation of this PR? If so, describe how

I used Claude to generate the function docstrings

Tag points of contact here if you would like additional review of the relevant parts of the model

Co-authored-by: Copilot <copilot@github.com>

patrickbrown4

Very cool! I think we should keep the old MC approach working (currently the MonteCarlo case in cases_test.csv is failing for me on this branch) but otherwise most of my comments are just stylistic. Happy to approve once the test is passing again.

patrickbrown4 · 2026-05-11T14:49:17Z

+    """Check whether a dirichlet distribution can be mapped to a supported LHS distribution.
+
+    Dirichlet distributions with identical parameters of length 2 or 3 are
+    equivalent to uniform or triangular distributions, respectively, which


Could you say more about how the usage with 3 parameters relates to the Dirichlet distribution? The "peak" of the 3-equal-parameter Dirichlet could range between a mix of low/high and pure mid, and since the high/mid and mid/low ratios change over time, they're not exactly equivalent. Does LHS use a pure triangular distribution or is it more like Dirichlet?

Discussed; adding some spaghetti plots of the inputs could help explain the behavior

Attaching a few spaghetti plots comparing dirichlet(1,1,1) and triangular. For the former you notice some different angles, but I actually didn't observe any samples that crossed over each other. I think that probably has to do with the low likelihood of sampling twice in two places that are close enough that they could cross, but I'm not certain.

Regardless, after some reflection I think it's probably best to just enforce using triangular, uniform, or discrete distributions with the Latin hypercube and not try to translate a dirichlet over, so I've updated that accordingly.

20260514_lhs_distribution_comparison.pptx
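
For anyone following the Dirichlet vs. triangular question, the sketch below shows one way the two can behave differently. It rests on two assumptions that are not confirmed by the code in this PR: that the Dirichlet approach blends low/mid/high trajectories with a single set of weights, and that the triangular approach evaluates the inverse CDF at the same position in every year. The trajectories and weights are made up.

```python
import numpy as np
from scipy import stats

# Hypothetical low/mid/high trajectories for two model years.
low = np.array([1.0, 1.0])
mid = np.array([1.5, 4.0])  # mid sits near low in year 1 and near high in year 2
high = np.array([5.0, 5.0])


def dirichlet_trajectory(weights):
    """Blend the three trajectories with one set of weights (assumed scheme)."""
    w_low, w_mid, w_high = weights
    return w_low * low + w_mid * mid + w_high * high


def triangular_trajectory(u):
    """Evaluate the triangular inverse CDF at the same position u in every year."""
    scale = high - low
    return stats.triang.ppf(u, c=(mid - low) / scale, loc=low, scale=scale)


def crosses(a, b):
    """True if trajectory a is above b in one year and below it in the other."""
    return (a[0] - b[0]) * (a[1] - b[1]) < 0


mostly_ends = dirichlet_trajectory((0.45, 0.10, 0.45))
mostly_mid = dirichlet_trajectory((0.05, 0.90, 0.05))
tri_lower = triangular_trajectory(0.3)
tri_upper = triangular_trajectory(0.7)

print(crosses(mostly_ends, mostly_mid), crosses(tri_lower, tri_upper))
```

Under these assumptions, weighted blends can cross when the mid/low and high/mid ratios change over time, while fixed-quantile triangular samples keep their ordering because the inverse CDF is monotone in u.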

patrickbrown4 · 2026-05-11T19:04:46Z

Could you add demand back to the MonteCarlo case in cases_test.csv so we can keep an eye on it? I think it would just be adding .load_country to its MCS_dist_groups setting.

And what do you think about adding a LHS test to cases_test.csv too? I could see it being used more often than some of the other capabilities we test for, and since the Monte Carlo capability is pretty sensitive to input file structure (which changes relatively often) it could be good to keep it in the test suite.

Good call on adding load back. I also like the idea of a separate LHS test so will add that in as well.

patrickbrown4 · 2026-05-11T20:10:33Z

+    # use sampling capability to set up random vector for modeing to generate alternatives (MGA)
+    if random_vector:
+        print("MGA random vector sampling will be implemented in the future")


I'd remove this for now since it's not implemented yet

patrickbrown4 · 2026-05-11T21:44:03Z

+    # if using random sampling, set random seed using seed + MCS run number 
+    # to allow reproducibility without having the same sample for each MCS-ReEDS call
+    else:
+        np.random.seed(seed + mcs_run_number)


Since there's still randomness in LHS, I think it might be best to still run np.random.seed(seed) for reproducibility when using LHS.

You could use the approach taken for the pras_seed switch, where if the switch is set to 0 we do not set the seed, and if it's a nonzero integer we set it to the integer; that way a user could still turn off the seed if desired. Personally I would set a nonzero default for reproducibility, but either works as long as it's user accessible.

Incidentally the explicit seed would also make it easier to extend a batch of random Monte Carlo runs; like if you do 300 and decide you want 200 more, you could just set the seed to 300 to get the "next" 200 samples
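
A minimal sketch of the seeding convention suggested here for the random approach, assuming a seed of 0 means "do not seed" (as described for pras_seed) and that each run offsets the seed by its run number. The helper `make_rng` is hypothetical, and it uses np.random.default_rng rather than the np.random.seed call in the diff.

```python
import numpy as np


def make_rng(seed, mcs_run_number):
    """Return a generator: unseeded if seed == 0, else seeded with seed + run number."""
    if seed == 0:
        return np.random.default_rng()
    return np.random.default_rng(seed + mcs_run_number)


# Run 300 of a batch started with seed=1 ...
continued = make_rng(seed=1, mcs_run_number=300).uniform(size=3)
# ... matches run 0 of a new batch started with seed=301.
extended = make_rng(seed=301, mcs_run_number=0).uniform(size=3)
print(continued, extended)
```

This is why an explicit seed makes it possible to extend a batch: shifting the seed by the number of completed runs picks up where the earlier batch left off.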

A couple of thoughts on this one, let me know your reactions:

The LHS sampling is done via scipy.stats.qmc.LatinHypercube. From what I can tell np.random isn't used by those methods and it isn't used elsewhere in the mcs_sampler.py script, so I don't think setting this would impact things.

The function I mentioned above does take a seed argument, and we do pass one to it (the default is zero). I just did a quick test of two different runs and got the same values for the LHS matrix, so I think we've got the reproducibility piece covered.

One thing to note is that changing the number of samples or dimensions changes the matrix. That means that two otherwise identical runs with N=50 and N=100, or two runs with the same N but where one sampled UPV costs and the other sampled UPV + gas prices, would have different values even for shared parameters. So, the seed only guarantees reproducibility for the same setup.

There isn't a straightforward way to extend the LHS matrix with more samples and still have it be a Latin Hypercube; the reason is that the sampling method bins the distribution and puts one sample in each bin, so adding new samples results in overlap.

Your suggestion about setting the explicit seed would work for the random approach, so I can add support for that. It's already in the python script, so we just need to add it to ReEDS; I put it in as a scalar, but let me know if you think a switch is more appropriate here. I also added some explanation on the usage in the docs/user_guide section.

Ok thanks, that all makes sense. Sorry to complicate things. I think the way you have it now, with the MCS_seed being used by both the random and LHS methods, is great.
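
The reproducibility behavior discussed in this thread can be checked with a short sketch like the one below. It calls scipy.stats.qmc.LatinHypercube directly; the wrapper `lhs_matrix` is hypothetical and does not mirror how mcs_sampler.py passes MCS_seed.

```python
import numpy as np
from scipy.stats import qmc


def lhs_matrix(n_samples, n_dims, seed):
    """Generate an n_samples x n_dims LHS matrix of CDF positions."""
    return qmc.LatinHypercube(d=n_dims, seed=seed).random(n=n_samples)


a = lhs_matrix(50, 2, seed=0)
b = lhs_matrix(50, 2, seed=0)
c = lhs_matrix(100, 2, seed=0)

# Same seed and same setup: identical matrices.
same_setup_matches = np.array_equal(a, b)
# Same seed but a different number of samples: the shared rows differ.
shared_rows_match = np.allclose(a, c[:50])
print(same_setup_matches, shared_rows_match)
```

The second comparison reflects the point above: the bins depend on N, so the first 50 rows of a 100-sample design are not a 50-sample Latin Hypercube.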

patrickbrown4

Going ahead and approving since the rest of the comments are mostly stylistic (though I do think it'd be good to add a LHS scenario to cases_test.csv and to be able to fix the LHS seed)

kodiobika

Very cool! LGTM

kodiobika · 2026-05-14T21:36:49Z

+        ## check that distribution specified is valid
+        if distribution not in MCSConstants.VALID_DISTRIBUTIONS:
+            raise ValueError(
+                f"The distribution {distribution} is not supported."
+                f"Please choose one of the following: {MCSConstants.VALID_DISTRIBUTIONS}")


It's probably better to check this upfront (before the for loop). In general I'd say it's ideal to check as many things upfront as possible to avoid going through the loop and then finding issues at the end, but I know some of the checks become more complicated outside of the for loop so no need to change any/everything

Suggested change

-        ## check that distribution specified is valid
-        if distribution not in MCSConstants.VALID_DISTRIBUTIONS:
-            raise ValueError(
-                f"The distribution {distribution} is not supported."
-                f"Please choose one of the following: {MCSConstants.VALID_DISTRIBUTIONS}")
+        ## check that all the distributions specified are valid
+        invalid_distributions = [
+            dist for dist in df_input_dist['dist'].unique()
+            if dist not in MCSConstants.VALID_DISTRIBUTIONS
+        ]
+        if len(invalid_distributions) > 0:
+            raise ValueError(
+                f"The following distributions are not supported: {invalid_distributions}."
+                f"Please choose one of the following: {MCSConstants.VALID_DISTRIBUTIONS}")

I moved a few of the checks up out of the loop. Since the distributions I added complicate some of the rule checking, I moved some of the rules into a new file (mcs_distribution_rules.yaml). There's still a loop at the end checking some things that I wasn't sure about changing, but those could probably be vectorized at some point too.
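
For reference, a self-contained sketch of the upfront check pattern discussed above. The contents of `VALID_DISTRIBUTIONS` and the example frames are assumptions (the real values live in MCSConstants), `check_distributions` is a hypothetical name, and the message adds a space between its two sentences.

```python
import pandas as pd

# Assumed set of supported names, for illustration only.
VALID_DISTRIBUTIONS = ("dirichlet", "discrete", "triangular", "uniform")


def check_distributions(df_input_dist, valid=VALID_DISTRIBUTIONS):
    """Raise once, before any per-row loop, listing every unsupported distribution."""
    invalid = sorted(set(df_input_dist["dist"].unique()) - set(valid))
    if invalid:
        raise ValueError(
            f"The following distributions are not supported: {invalid}. "
            f"Please choose one of the following: {list(valid)}"
        )


df_ok = pd.DataFrame({"dist": ["uniform", "triangular", "uniform"]})
check_distributions(df_ok)  # passes silently

df_bad = pd.DataFrame({"dist": ["uniform", "lognormal", "beta"]})
try:
    check_distributions(df_bad)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

Checking the unique values once reports every problem in a single error instead of stopping at the first bad row inside the loop.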

kodiobika · 2026-05-14T23:11:24Z

+                    #self._apply_weights_recf(dist_files, sample_idx)
+                    pass


Should this case raise an error?

Right now we flag that the sampling with siting is disabled upstream, so I'm going to remove the pass and revert the comment.

bsergi

Made some changes to address the comments. I still need to update with main but will aim to do that later this week. In the meantime let me know if you have additional comments, and thanks for reviewing!

bsergi · 2026-05-18T16:13:35Z

+        ## check that distribution specified is valid
+        if distribution not in MCSConstants.VALID_DISTRIBUTIONS:
+            raise ValueError(
+                f"The distribution {distribution} is not supported."
+                f"Please choose one of the following: {MCSConstants.VALID_DISTRIBUTIONS}")


I moved a few of the checks up out of the loop. Since the distributions I added complicate some of the rule checking, I moved some of the rules into a new file (mcs_distribution_rules.yaml). There's still a loop at the end checking some things that I wasn't sure about changing, but those could probably be vectorized at some point too.

bsergi · 2026-05-18T16:16:53Z

+    """Check whether a dirichlet distribution can be mapped to a supported LHS distribution.
+
+    Dirichlet distributions with identical parameters of length 2 or 3 are
+    equivalent to uniform or triangular distributions, respectively, which


Attaching a few spaghetti plots comparing dirichlet(1,1,1) and triangular. For the former you notice some different angles, but I actually didn't observe any samples that crossed over each other. I think that probably has to do with the low likelihood of sampling twice in two places that are close enough that they could cross, but I'm not certain.

Regardless, after some reflection I think it's probably best to just enforce using triangular, uniform, or discrete distributions with the Latin hypercube and not try to translate a dirichlet over, so I've updated that accordingly.

20260514_lhs_distribution_comparison.pptx

bsergi · 2026-05-18T18:33:23Z

Ah ok that's interesting. I took a pass at fixing for Monte Carlo runs in this commit but I've never used this feature before so might be good for a second set of eyes there.

bsergi · 2026-05-18T19:14:24Z

+        if file_name == 'load.h5':
+            columns_other_states = [col for col in dist_files[0].keys() if col not in columns_in_hierarchy]
+            if len(columns_other_states) > 0:
+                generic_weight_matrix = self.r_weights[next(iter(self.r_weights))]


bsergi · 2026-05-18T19:38:06Z

        if single_r_weight:
            # Get the first region key
-            first_region = next(iter(self.r_weights)) 
+            first_region = next(iter(self.r_weights))


This one is easy enough to change so adjusted. I wasn't familiar with the next(iter()) syntax but I will say it is growing on me!
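
For readers who have not met the idiom, a short sketch of what next(iter(...)) does on a dictionary. The `r_weights` contents here are made up and only stand in for the real attribute.

```python
# Hypothetical region weights keyed by region name.
r_weights = {"p1": [0.2, 0.8], "p2": [0.5, 0.5]}

# iter() walks the keys in insertion order; next() takes the first one
# without building a full list of keys.
first_region = next(iter(r_weights))
first_weights = r_weights[first_region]

# A default avoids StopIteration when the dictionary is empty.
empty_default = next(iter({}), None)
print(first_region, first_weights, empty_default)
```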

patrickbrown4 · 2026-05-21T17:43:32Z

        # otherwise, drop any case marked ignore
        if single:
-            if case not in single.split(','):
+            if not sum([s in case for s in single.split(',')]):


Hm, this might give weird behavior since some case names are subsets of other case names. I think my suggestion would just be to ignore the issue I raised about using --single for Monte Carlo, since it's kind of ill-defined for multi-case MC runs anyway, and is not directly related to the rest of the PR.

sounds good, will revert

This reverts commit eaad50c.
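
The concern about case names being subsets of other case names can be reproduced with a small sketch. The case names below are hypothetical, and `keep_exact` / `keep_substring` are illustrative stand-ins for the two versions of the condition in the diff.

```python
def keep_exact(case, single):
    """Original behavior: keep a case only if its name is listed exactly."""
    return case in single.split(",")


def keep_substring(case, single):
    """Changed behavior: keep a case if any listed name appears inside it."""
    return any(s in case for s in single.split(","))


cases = ["Ref", "Ref_MC", "MonteCarlo"]  # hypothetical case names
single = "Ref"

exact = [c for c in cases if keep_exact(c, single)]
substring = [c for c in cases if keep_substring(c, single)]
print(exact, substring)
```

With substring matching, asking for Ref also keeps Ref_MC, which is the kind of unintended selection the revert avoids.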

bsergi and others added 5 commits April 15, 2026 14:20

branch transfer latin hypercube sampling

5c7d66a

branch transfer latin hypercube sampling

0b67bf1

Merge branch 'bs/lhs' of github.com:ReEDS-Model/ReEDS into bs/lhs

e9ffe72

Merge remote-tracking branch 'origin/main' into bs/lhs

3a66667

Merge remote-tracking branch 'origin/main' into bs/lhs

76fe452

bsergi marked this pull request as draft April 23, 2026 19:40

bsergi self-assigned this Apr 23, 2026

bsergi and others added 10 commits May 4, 2026 12:36

Merge remote-tracking branch 'origin/main' into bs/lhs

df498a8

fix remaining instances of samples_sw

9b7301f

Co-authored-by: Copilot <copilot@github.com>

update distribution checks

ed7d9f2

add multiplier sampling for non-switch files in lhs sampling

b33a6b4

error check for random sampling method using triangular or uniform

15ef1d0

update documenation for latin hypercube

9c64dcf

add function docstrings

1ad3f82

Co-authored-by: Copilot <copilot@github.com>

framework for future MGA random vector implementation

d807129

Merge branch 'bs/lhs' of github.com:ReEDS-Model/ReEDS into bs/lhs

21ea646

Merge remote-tracking branch 'origin/main' into bs/lhs

6fdac99

bsergi marked this pull request as ready for review May 5, 2026 15:47

github-actions Bot added data_changes model_changes labels May 5, 2026

bsergi requested a review from patrickbrown4 May 5, 2026 15:48

github-actions Bot added input_processing docs switches labels May 5, 2026

bsergi requested a review from kodiobika May 5, 2026 15:48

Merge remote-tracking branch 'origin/main' into bs/lhs

bdb3182

patrickbrown4 reviewed May 11, 2026

Merge remote-tracking branch 'origin/main' into bs/lhs

47fd6db

patrickbrown4 approved these changes May 12, 2026

formatting cleanup

1b50ce9

kodiobika approved these changes May 14, 2026

bsergi added 6 commits May 15, 2026 12:10

more cleanup

e8b7868

fix to single case option with Monte Carlo runs

eaad50c

remove MGA RV call from runbatch.py

dfef048

Add load back to Monte Carlo test, include test using LHS method

f8719cd

some adjustments on using next(iter())

b592f9d

updates to distribution validation

9d36eb1

bsergi commented May 18, 2026

bsergi added 5 commits May 18, 2026 17:04

formatting

ccfe257

Merge remote-tracking branch 'origin/main' into bs/lhs

118fa55

Remove MGA RV stuff for now

9c4c7dd

add seed to scalars, add comments on usage

b75dd7c

Merge remote-tracking branch 'origin/main' into bs/lhs

a7f56b5

patrickbrown4 reviewed May 21, 2026

Revert "fix to single case option with Monte Carlo runs"

f39d68b

This reverts commit eaad50c.

bsergi merged commit 719c3d2 into main May 22, 2026
10 checks passed

bsergi deleted the bs/lhs branch May 22, 2026 15:28
