Skip to content

Add benchmark validation and timeout guards#22

Merged
bernalde merged 3 commits into
pr/benchmarksfrom
feature/benchmark-validation-timeouts
May 10, 2026
Merged

Add benchmark validation and timeout guards#22
bernalde merged 3 commits into
pr/benchmarksfrom
feature/benchmark-validation-timeouts

Conversation

@bernalde
Copy link
Copy Markdown
Member

@bernalde bernalde commented May 10, 2026

Summary

  • add post-solve benchmark metrics for final dryness shortfall, product-temperature violation, and optional Tsh/Pch ramp-rate violations
  • mark benchmark records failed when those post-solve metrics violate configured constraints
  • add benchmark CLI solver guards for IPOPT max_cpu_time and max_wall_time, propagate them through Pyomo adapters, and record configured timeout metadata
  • document the benchmark CLI and JSONL schema changes
  • filter known expected edge-case test warnings narrowly so the CI-equivalent test run is warning-clean

Closes #15
Closes #16

Tests run

  • python -m compileall benchmarks lyopronto/pyomo_models tests/test_benchmarks.py - passed
  • PYTHONPATH=. pytest tests/test_benchmarks.py -q - 12 passed
  • PYTHONPATH=. pytest tests/ -m "not notebook and not pyomo" -q - 254 passed in 417.56s, no warning summary
  • Warning triage run: PYTHONPATH=. pytest tests/ -m "not notebook and not pyomo" -q -o addopts='-n auto --tb=short --strict-markers --maxfail=1 --dist loadgroup' -W default - 254 passed, confirmed the warnings were expected legacy edge-case warnings

Notes

  • pytest tests/test_benchmarks.py -q was also tried first, but this branch's pytest configuration does not put the repository root on PYTHONPATH; collection failed with ModuleNotFoundError: No module named 'benchmarks'. The tests were rerun with PYTHONPATH=..
  • Follow-up issue for the broader legacy warning semantics: Audit legacy edge-case warning semantics #23.

Copy link
Copy Markdown
Member Author

@bernalde bernalde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found blocking issues in the benchmark validation behavior. GitHub does not allow this account to request changes on its own PR, but I would treat this as a request-changes review. The local targeted and CI-equivalent non-notebook/non-Pyomo tests pass, but I would not merge this until the blocking issues above are addressed.

Comment thread benchmarks/grid_cli.py
Comment thread benchmarks/validate.py Outdated
Comment thread pytest.ini
@bernalde
Copy link
Copy Markdown
Member Author

Pushed commit 53e18d2 (Address benchmark validation review feedback (#15, #16)).

Main changes made:

  • Split benchmark metric kwargs so SciPy records keep product-temperature/dryness validation only, while Pyomo records receive the Pyomo-only ramp validation limits.
  • Raised post-solve ramp validation tolerance to 1e-6 to align with solver feasibility tolerance and avoid false failures from accepted IPOPT numerical noise.
  • Added focused benchmark tests for SciPy ramp-limit isolation and ramp noise within tolerance.

Tests run and results:

  • PYTHONPATH=. pytest tests/test_benchmarks.py -q -> 14 passed.
  • python -m compileall benchmarks lyopronto/pyomo_models tests/test_benchmarks.py -> passed.
  • PYTHONPATH=. pytest tests/ -n auto -v -m "not notebook and not pyomo" --ignore=tests/test_pyomo_models -> 251 passed in 375.92s.

Comments intentionally not addressed:

  • The nonblocking pytest.ini warning-filter concern remains intentionally unchanged in this PR. It is broader than benchmark validation/timeouts and is already tracked by Audit legacy edge-case warning semantics #23, including the follow-up to replace broad filters with local warning assertions where practical.

Remaining risks or follow-up items:

@bernalde bernalde merged commit 75fd50d into pr/benchmarks May 10, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant