Skip to content

Turn off the ironing out loop by default#1170

Merged
tsmbland merged 19 commits intomainfrom
ironing_out_off
Mar 16, 2026
Merged

Turn off the ironing out loop by default#1170
tsmbland merged 19 commits intomainfrom
ironing_out_off

Conversation

@tsmbland
Copy link
Copy Markdown
Collaborator

@tsmbland tsmbland commented Mar 4, 2026

Description

Turns off the ironing out loop by default (max_ironing_out_iterations = 1), as requested by @ahawkes, but maintains a regression test for a patched version of the simple model with the loop turned on (iterations = 10, the previous default). To do this I had to modify the patching API to allow patching the example model's model.toml (this will also be useful for #1105)

Turns out for the "simple" example this doesn't have any impact, but for others it does

Fixes #1165

Type of change

  • Bug fix (non-breaking change to fix an issue)
  • New feature (non-breaking change to add functionality)
  • Refactoring (non-breaking, non-functional change to improve maintainability)
  • Optimization (non-breaking change to speed up the code)
  • Breaking change (whatever its nature)
  • Documentation (improve or add documentation)

Key checklist

  • All tests pass: $ cargo test
  • The documentation builds and looks OK: $ cargo doc
  • Update release notes for the latest release if this PR adds a new feature or fixes a bug
    present in the previous release

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 4, 2026

Codecov Report

❌ Patch coverage is 95.10490% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.02%. Comparing base (0eedcfe) to head (a0c9e2d).
⚠️ Report is 21 commits behind head on main.

Files with missing lines Patch % Lines
src/simulation/prices.rs 62.50% 3 Missing ⚠️
src/patch.rs 97.22% 2 Missing ⚠️
src/cli/example.rs 85.71% 0 Missing and 1 partial ⚠️
src/example/patches.rs 98.18% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1170      +/-   ##
==========================================
+ Coverage   87.32%   89.02%   +1.69%     
==========================================
  Files          57       57              
  Lines        7726     7838     +112     
  Branches     7726     7838     +112     
==========================================
+ Hits         6747     6978     +231     
+ Misses        673      564     -109     
+ Partials      306      296      -10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tsmbland tsmbland marked this pull request as ready for review March 4, 2026 14:13
Copilot AI review requested due to automatic review settings March 4, 2026 14:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes the default behavior of the model “ironing out” loop by reducing max_ironing_out_iterations from 10 to 1, and updates expected regression outputs accordingly. It also extends the patched-example mechanism so a patched example can optionally include a model.toml patch (intended to preserve coverage of the loop-enabled configuration via a patched “simple” model).

Changes:

  • Set the default max_ironing_out_iterations to 1 (and update the input schema default accordingly).
  • Extend the patched-example registry to include an optional TOML patch, adding a simple_ironing_out patched example that sets iterations back to 10.
  • Refresh a large set of regression “golden” CSV outputs to match the new default behavior; add new golden outputs for simple_ironing_out.

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/data/two_regions/commodity_prices.csv Updated expected commodity price outputs under the new default iteration count.
tests/data/two_regions/assets.csv Updated expected asset outputs under the new default iteration count.
tests/data/two_outputs/commodity_prices.csv Updated expected commodity price outputs under the new default iteration count.
tests/data/two_outputs/assets.csv Updated expected asset outputs under the new default iteration count.
tests/data/simple_npv/assets.csv Updated expected asset outputs for the NPV patched scenario under the new default.
tests/data/simple_ironing_out/commodity_prices.csv New golden output for the patched “simple” model with ironing-out enabled (iterations=10).
tests/data/simple_ironing_out/commodity_flows.csv New golden output for the patched “simple” model with ironing-out enabled (iterations=10).
tests/data/simple_ironing_out/assets.csv New golden output for the patched “simple” model with ironing-out enabled (iterations=10).
tests/data/simple/debug_solver.csv Updated debug solver expectations reflecting fewer iterations by default.
tests/data/simple/debug_dispatch_assets.csv Updated debug dispatch expectations reflecting fewer iterations by default.
tests/data/simple/debug_commodity_balance_duals.csv Updated debug dual expectations reflecting fewer iterations by default.
tests/data/simple/debug_appraisal_results.csv Updated appraisal debug expectations reflecting fewer iterations by default.
tests/data/muse1_default/commodity_prices.csv Updated expected commodity price outputs under the new default iteration count.
tests/data/muse1_default/commodity_flows.csv Updated expected commodity flow outputs under the new default iteration count.
tests/data/muse1_default/assets.csv Updated expected asset outputs under the new default iteration count.
tests/data/circularity/commodity_prices.csv Updated expected commodity price outputs under the new default iteration count.
tests/data/circularity/assets.csv Updated expected asset outputs under the new default iteration count.
src/model/parameters.rs Changes the default max_ironing_out_iterations from 10 to 1.
src/example/patches.rs Extends patch registry to include optional TOML patch; adds simple_ironing_out patch entry.
src/cli/example.rs Updates CLI patched-example extraction to apply file patches + optional TOML patch.
schemas/input/model.yaml Updates schema default for max_ironing_out_iterations to 1.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread src/cli/example.rs
fn extract_example(name: &str, patch: bool, dest: &Path) -> Result<()> {
if patch {
let patches = get_patches(name)?;
let (file_patches, toml_patch) = get_patches(name)?;
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_patches(name)? returns a reference to a (Vec<FilePatch>, Option<String>), but this code destructures it as if it were an owned tuple. This won’t compile (type mismatch &(…) vs (…)). Either bind the returned reference and access its fields, or change get_patches to return an owned tuple (e.g., cloned) so destructuring works.

Suggested change
let (file_patches, toml_patch) = get_patches(name)?;
let (file_patches, toml_patch) = get_patches(name)?.clone();

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it does compile...

Comment thread src/example/patches.rs
@tsmbland
Copy link
Copy Markdown
Collaborator Author

tsmbland commented Mar 4, 2026

@alexdewar Now getting slightly different results on macos which is concerning. I seem to remember something like this has happened before...?

@tsmbland tsmbland requested review from Aurashk and alexdewar March 4, 2026 14:25
@alexdewar
Copy link
Copy Markdown
Member

@alexdewar Now getting slightly different results on macos which is concerning. I seem to remember something like this has happened before...?

These differences are pretty large too:

thread 'two_regions' (11401) panicked at tests/regression.rs:78:5:
The following errors occurred:
  * commodity_flows.csv: line 751:
    + "2045,8,gas,all-year.night,-0.86000992"
    - "2045,8,gas,all-year.night,-1.39307392"
  * commodity_flows.csv: line 752:
    + "2045,8,electricity,all-year.night,0.514976"
    - "2045,8,electricity,all-year.night,0.834176"
  * commodity_flows.csv: line 753:
    + "2045,8,CO2f,all-year.night,47.20784992"
    - "2045,8,CO2f,all-year.night,76.46891392"
  * commodity_flows.csv: line 955:
    + "2045,25,gas,all-year.night,-0.5330640000000001"
    - "2045,25,gas,all-year.night,-0.0"
  * commodity_flows.csv: line 956:
    + "2045,25,electricity,all-year.night,0.3192000000000001"
    - "2045,25,electricity,all-year.night,0.0"
  * commodity_flows.csv: line 957:
    + "2045,25,CO2f,all-year.night,29.261064000000008"
    - "2045,25,CO2f,all-year.night,0.0"

Not sure what's going on here. If you want to compare the macOS results with e.g. the Windows ones, we upload them as test artifacts you can see on the summary page: https://github.com/EnergySystemsModellingLab/MUSE2/actions/runs/22673536177?pr=1170

@tsmbland
Copy link
Copy Markdown
Collaborator Author

tsmbland commented Mar 5, 2026

The difference concerns electricity supply in the night timeslice in 2045 by two assets (8 and 25):

  • on mac: 0.32 electricity supply by asset 25, 0.51 by asset 8 (= 0.83 total)
  • on windows/ubuntu: 0 electricity supply by asset 25, 0.83 by asset 8 (= 0.83 total)

These are both gasCCGT assets in R2, and although the commission year is different, the parameters for these assets are identical. Therefore, as far as the objective is concerned, the two solutions are equally good. It's probably the case for a lot of optimisations that there's no single optimal solution, rather a space of equally good solutions, but 99.9% of the time highs seems to pick the same solution every time, regardless of OS (I think this always ends up being a "corner" solution, i.e. where one or more parameters in the space is at its limit. The two solutions above are different corner solutions where supply by asset 25 is at its upper and lower limit respectively).

I don't know the factors that cause it to choose one optimum over another, but it seems that tiny floating-point differences between platforms can have an impact (perhaps tilting the energy landscape slightly so that one solution becomes ever-so-slightly favourable over another, or affecting pivot decisions in the optimisation path so it lands on a different vertex).

The best solution I can think of is to modify the objective so that we can guarantee a single optimum solution. One idea would be L2 regularisation, adding a small epsilon-scaled term to the objective to minimise the sum of squares of the variables. Generally this favours smaller values over large "extreme" values, which for our purposes means spreading activity over multiple assets rather than allocating all activity to a single asset - probably not a bad thing to aim for in any case. (In this particular example, the mac solution would be the single unique optimum).

In practice I'm not sure how easy this would be. highs-sys has Highs_passHessian which I believe makes this possible, but this isn't exposed in the highs crate we're using. I believe it's also incompatible with integer variables, which we are using in some situations, but I think we could circumvent this with a two-step approach. This is all based on a conversation with AI so take with a pinch of salt.

Apart from that I don't really have any other ideas...

@tsmbland
Copy link
Copy Markdown
Collaborator Author

tsmbland commented Mar 5, 2026

Annoying that this PR is getting blocked by something completely unrelated! For the time being, do you think it would be ok to turn off this particular regression test until we can get this fixed? What else can we do...?

@tsmbland
Copy link
Copy Markdown
Collaborator Author

tsmbland commented Mar 5, 2026

Or revert this particular example model back to using the ironing out loop and update it later?

@alexdewar
Copy link
Copy Markdown
Member

The difference concerns electricity supply in the night timeslice in 2045 by two assets (8 and 25):

  • on mac: 0.32 electricity supply by asset 25, 0.51 by asset 8 (= 0.83 total)
  • on windows/ubuntu: 0 electricity supply by asset 25, 0.83 by asset 8 (= 0.83 total)

These are both gasCCGT assets in R2, and although the commission year is different, the parameters for these assets are identical. Therefore, as far as the objective is concerned, the two solutions are equally good. It's probably the case for a lot of optimisations that there's no single optimal solution, rather a space of equally good solutions, but 99.9% of the time highs seems to pick the same solution every time, regardless of OS (I think this always ends up being a "corner" solution, i.e. where one or more parameters in the space is at its limit. The two solutions above are different corner solutions where supply by asset 25 is at its upper and lower limit respectively).

I don't know the factors that cause it to choose one optimum over another, but it seems that tiny floating-point differences between platforms can have an impact (perhaps tilting the energy landscape slightly so that one solution becomes ever-so-slightly favourable over another, or affecting pivot decisions in the optimisation path so it lands on a different vertex).

This sounds plausible.

The best solution I can think of is to modify the objective so that we can guarantee a single optimum solution. One idea would be L2 regularisation, adding a small epsilon-scaled term to the objective to minimise the sum of squares of the variables. Generally this favours smaller values over large "extreme" values, which for our purposes means spreading activity over multiple assets rather than allocating all activity to a single asset - probably not a bad thing to aim for in any case. (In this particular example, the mac solution would be the single unique optimum).

In practice I'm not sure how easy this would be. highs-sys has Highs_passHessian which I believe makes this possible, but this isn't exposed in the highs crate we're using. I believe it's also incompatible with integer variables, which we are using in some situations, but I think we could circumvent this with a two-step approach. This is all based on a conversation with AI so take with a pinch of salt.

Apart from that I don't really have any other ideas...

This is an interesting idea, but probably quite a lot of work and, as you say, out of scope for this PR anyway.

As an aside, you can actually call the functions in the lower-level highs-sys crate by getting the pointer from your Model struct and passing that in (it'll have to be in an unsafe block), but if we know we want this functionality it would be cleaner to open a PR for the highs crate adding it (I did that to get the objective value thing added).

Annoying that this PR is getting blocked by something completely unrelated! For the time being, do you think it would be ok to turn off this particular regression test until we can get this fixed? What else can we do...?

How about we just skip this test on non-x86 platforms for now?

This should work:

diff --git a/tests/regression.rs b/tests/regression.rs
index 41c933b9..4307bc3e 100644
--- a/tests/regression.rs
+++ b/tests/regression.rs
@@ -26,6 +26,9 @@ define_regression_test!(circularity);
 // Patched examples
 define_regression_test_with_patches!(simple_divisible);
 define_regression_test_with_patches!(simple_npv);
+
+// We get different results on ARM Macs, for reasons that aren't clear
+#[cfg(target_arch = "x86_64")]
 define_regression_test_with_patches!(simple_ironing_out);
 
 // ------  END: regression tests  ------

It would be better if we opened an issue and linked to it in the comment. It would be nice if the examples gave similar results on all the platforms we care about, but it probably shouldn't be a high priority to fix this. But let's keep track of it anyway!

@tsmbland
Copy link
Copy Markdown
Collaborator Author

tsmbland commented Mar 5, 2026

I'm going to hold this until #1173. That will be another major change to all of the example models, and there's every chance that that will magically fix the issue here. (Or, it could just as likely introduce more issues just like this...)

In any case, this does reveal an underlying issue, so I've opened #1174

Copilot AI review requested due to automatic review settings March 12, 2026 10:32
@tsmbland tsmbland changed the base branch from main to add-regression-tests-for-pricing-strategies March 12, 2026 10:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread schemas/input/model.yaml
@@ -31,7 +31,7 @@ properties:
max_ironing_out_iterations:
type: integer
description: The maximum number of iterations to run the "ironing out" step of agent investment for
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The max_ironing_out_iterations schema description sentence is truncated (ends with "for"). Since this PR touches the default here, it’d be good to complete the description so generated docs/help text are clear.

Suggested change
description: The maximum number of iterations to run the "ironing out" step of agent investment for
description: The maximum number of iterations to run the "ironing out" step of agent investment before stopping the process

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually not truncated, but maybe Copilot's wording is clearer...

@tsmbland
Copy link
Copy Markdown
Collaborator Author

@Aurashk @alexdewar Tried rebasing this on top of @Aurashk's PR and it seems to have worked! Sort of...

Tests now passing on ubuntu. One of the tests @Aurashk added in #1185 is failing on Windows, but that's a separate problem that needs to be fixed in #1185. Tests still failing on macos, but that's to do with this this PR as explained above.

So together I think we can fix this!

We'd be sweeping the core problems under the rug a bit, but in the interests of getting things merged I think we have to for now

@Aurashk
Copy link
Copy Markdown
Collaborator

Aurashk commented Mar 12, 2026

  • file_patch_with_replacement_missing_base_fil

Thanks @tsmbland, I think my latest commit should fix the windows issue in that test.

@tsmbland
Copy link
Copy Markdown
Collaborator Author

  • file_patch_with_replacement_missing_base_fil

Thanks @tsmbland, I think my latest commit should fix the windows issue in that test.

Et voila

@tsmbland
Copy link
Copy Markdown
Collaborator Author

@alexdewar In the interest of getting this and @Aurashk's PR merged I've done what you've suggested and disabled the problematic test on macos, and have opened up #1197 to keep an eye on this

Copy link
Copy Markdown
Member

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great that you've managed to fix the failing tests.

Comment thread src/example/patches.rs Outdated
/// A map of file patches, keyed by name
type PatchMap = BTreeMap<&'static str, Vec<FilePatch>>;
/// Map of patches keyed by name, with the file patches and an optional TOML patch
type PatchMap = BTreeMap<&'static str, (Vec<FilePatch>, Option<String>)>;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Option<String> could actually be an Option<&'static str> (where &'static str means string literal) then you wouldn't need to have .to_string()s.

Part of me is wondering whether this would be clearer if you defined a struct rather than using a tuple, but I'm not sure. Up to you.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One day I hope to understand strings in rust!

Comment thread tests/regression.rs
define_regression_test!(circularity);

// For this model we get different results on ARM Macs, for reasons that aren't clear
#[cfg(target_arch = "x86_64")]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be even better if we could mark this as "expected to fail", like you can with pytest, but seemingly there's no built-in way to do that with cargo test. I guess we can just check whether it's miraculously started passing at some point.

@tsmbland
Copy link
Copy Markdown
Collaborator Author

LGTM! Great that you've managed to fix the failing tests.

More like sweep under the rug 😂

@alexdewar
Copy link
Copy Markdown
Member

LGTM! Great that you've managed to fix the failing tests.

More like sweep under the rug 😂

🙈

@tsmbland tsmbland changed the base branch from add-regression-tests-for-pricing-strategies to main March 16, 2026 16:27
Copilot AI review requested due to automatic review settings March 16, 2026 16:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 38 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread tests/regression.rs
Comment on lines +25 to +27
// For this model we get different results on ARM Macs, for reasons that aren't clear
#[cfg(target_arch = "x86_64")]
define_regression_test!(two_regions);
Comment thread src/patch.rs
@@ -167,13 +218,21 @@ impl FilePatch {

/// Apply this patch to a base model and return the modified CSV as a string.
Comment thread schemas/input/model.yaml
@@ -31,7 +31,7 @@ properties:
max_ironing_out_iterations:
type: integer
description: The maximum number of iterations to run the "ironing out" step of agent investment for
@tsmbland tsmbland merged commit 5389d44 into main Mar 16, 2026
6 checks passed
@tsmbland tsmbland deleted the ironing_out_off branch March 16, 2026 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Turn off ironing out loop by default

4 participants