feat(benchmark): add benchmarking pipeline and solver integrations#4
Merged
Conversation
…hmark - Add XFTA Linux binary (xftar) to apps/solvers/xfta/ - Add OpenPSA → S2ML+SBE converter (openpsa_to_s2ml_converter.py) with NOT/XOR incompatible gate detection and top-event extraction - Add S2ML+SBE syntax/semantic verifier (s2ml_verifier.py) with cycle detection and top-event validation - Add SCRAM vs XFTA probability verifier (openpsa_s2ml_TEP_verifier.py) - Rewrite run_benchmark.sh as a 6-step pipeline: OpenPSA validation, S2ML+SBE conversion, S2ML+SBE validation, SCRAM benchmark, XFTA benchmark, and code-to-code probability comparison - Update Dockerfile to copy and install xftar binary - Switch dataset from MHTGR to ARALIA fault-tree dataset in docker-compose - Add fixtures/results/ to .gitignore
…matrix Integrate FTREX execution into the benchmarking stack and expand benchmark coverage across SCRAM and XFTA algorithm variants. Benchmark orchestration updates: - Rework run_benchmark.sh into a matrix runner with dedicated output/script directories. - Add SCRAM runs for BDD, ZBDD REA, and ZBDD MCUB. - Add XFTA runs for BDD, BDT REA/MCUB/PUB, and ZBDD REA/MCUB/PUB. - Add FTREX runs for BDD and ZBDD modes using generated .ftp models. - Add paired comparison experiments (Exp1/Exp2/Exp3) with separate CSV outputs. Container/runtime updates: - Extend benchmarking Dockerfile with Wine and FTREX setup/activation. - Add Dockerfile.ftrex as a focused FTREX runtime image. Verification and converter updates: - Enhance openpsa_s2ml_TEP_verifier.py to compare both probability and MCS counts. - Use streaming XML parsing for SCRAM reports to avoid loading large product sets. - Add mcs cutoff handling and richer CSV/status reporting fields. - Add converters for FTAP<->OpenPSA and FTAP<->S2ML flows. Docs and hygiene: - Expand XFTA MANUAL with detailed sections on BDT/ZBDD cutsets and quantification methods. - Add benchmarking README documenting experiments, outputs, and run instructions. - Update .gitignore to exclude benchmark model/result inputs and FTREX-sensitive assets.
- Added special query style workflow for ZBDD algorithm.
Add end-to-end visualization support for praxis fault-tree and event-tree workflows, including CLI controls, DOT generation, and Graphviz rendering. - add new analysis module apps/solvers/praxis/src/analysis/visualize.rs with: - Graphviz availability detection - DOT generation for fault trees (gates, basic/house events, unresolved refs) - DOT generation for event trees with branch traversal and styled edges - render/save helpers for SVG and PDF output via dot - wire visualization module into analysis exports (analysis.rs) - extend CLI args with visualization flags: - --visualize - --visualize-out-dir <DIR> - --visualize-sequence <SEQ_ID> - --visualize-stdout - integrate visualization execution into CLI flows: - fault-tree pre-event-tree path now emits DOT/stdout and optional SVG files - event-tree analytic and monte-carlo paths now generate ET diagrams with Graphviz fallback warnings - update technical overview to document fixtures directory in repo structure - update local Claude permissions to allow visualization test command invocations
Replace the path-accumulator based stats traversal with a memoized bottom-up merge keyed by ZbddRef.
What changed:
- stats_by_order now initializes a node cache and delegates to stats_by_order_rec.
- Removed stats_rec(order, p_acc, stats) accumulator-style traversal.
- Added stats_by_order_rec(f, cache) -> HashMap<order, (count, min_p, max_p)>.
- Terminal/base cases are now explicit map returns:
- EMPTY => {}.
- BASE => {0: (1, 1.0, 1.0)}.
- Non-terminal merge logic now combines low/high subtree maps and shifts high orders by +1 while scaling min/max probabilities by p(var).
- Results are memoized per node and cloned from cache on reuse.
Why:
- Avoids recomputing subtree statistics when the same ZBDD node is reached from multiple paths.
- Brings stats_by_order in line with other memoized routines in ZbddEngine.
- Simplifies reasoning by expressing order statistics as compositional node results instead of mutable path state.
Validation:
- Ran cargo test --lib zbdd_engine::tests::test_rare_event_or_gate (pass).
Add end-to-end benchmarking support for the new solver matrix. - extend the Docker image with the toolchains and runtime dependencies needed for PRAXIS, ZEBRA, and SAPHSOLVE - add XML-to-JSInp conversion, ZEBRA log parsing, and JSCut parsing helpers - add SCRAM comparison verifiers for PRAXIS, ZEBRA, and SAPHSOLVE outputs - generate an interactive Plotly HTML report summarizing timing, probabilities, and cut-set counts - update the benchmark runner to execute the expanded experiment set and write the new comparison artifacts - refresh the benchmark README and ignore rules to match the new workflow and generated outputs
Prevent local validation from leaving tracked cache files behind. - ignore Python bytecode and __pycache__ directories globally - remove the accidentally committed bytecode generated during Python syntax checks This keeps the benchmark changes focused on the actual source updates and avoids reintroducing compiled artifacts in future runs.
Extend the benchmarking pipeline to include FTREX alongside the existing solver comparisons. - add FTREX log parsers for exact, P_SUM, P_MCUB, and cut-set counts - add a SCRAM vs FTREX verifier that emits comparison CSVs for probability and MCS data - wire FTREX runs into the benchmarking script and add the new experiment output paths - update the interactive benchmark report so FTREX results appear in timing, probability, and MCS summaries - install the additional Docker dependencies needed to run the FTREX workflow in the benchmark image - update .gitignore with the new fixture and Python cache exclusions
- Replace the old one-script converter/verifier layout with a shared model layer and dedicated readers/writers for OpenPSA XML, FTAP, S2ML, and JSINP. - Add a unified convert entry point plus new format-specific parsers/verifiers for SCRAM, XFTA, PRAXIS, FTREX, ZEBRA, and SAPHSOLVE outputs. - Rework the benchmark HTML report to support multiple datasets and solver tabs, and update the orchestration script to use the new conversion flow and longer solver timeouts. - Extend solver model typing to support gates, richer probability representations, and gate references; add benchmarking docs, notes, fixture outputs, and ignore generated report artifacts.
Add a manual GitHub Actions benchmark workflow that pulls the prebuilt c2c-benchmark image from the private registry, mounts the Aralia fixtures and benchmark scripts at runtime, and uploads the generated results as artifacts. Expand the benchmark report generator to group timing bars by solver family, display model-level gate/basic-event statistics, and load timing metadata from the new dataset layout. Document the benchmark workflow plan, registry assumptions, and rollout steps in docs/benchmark-workflow-plan.md.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Changes