Add prepare function for condition-level object preparation #78

mronkko · 2025-12-13T18:19:36Z

Summary

Implements #74 by adding a prepare parameter to runSimulation() that executes once per simulation condition to prepare/modify fixed_objects before replications run.

Implementation

The prepare function:

Accepts condition and fixed_objects as arguments
Returns the modified fixed_objects
Is called inside the Analysis() function, before the replications run (mirroring summarise, which is called after them)
Executes once per design row (condition)
Modified fixed_objects is passed to all replications for that condition
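To make the call order concrete, here is a minimal base-R sketch of the control flow described in the list above. It is not SimDesign's Analysis() code: run_conditions() and the toy prepare/generate/analyse functions are hypothetical. The sketch only illustrates that prepare() runs once per design row, that every replication of that row reuses its output, and that a NULL prepare leaves fixed_objects untouched.

```r
# Hypothetical sketch: NOT SimDesign's internal code, only the call order
# described above (prepare once per condition, shared by every replication)
run_conditions <- function(design, replications, prepare, generate, analyse,
                           fixed_objects = list()) {
  lapply(seq_len(nrow(design)), function(i) {
    condition <- design[i, , drop = FALSE]
    # prepare() runs exactly once for this design row (skipped when NULL) ...
    fo <- if (is.null(prepare)) fixed_objects else prepare(condition, fixed_objects)
    # ... and its result is reused by every replication of the row
    vapply(seq_len(replications), function(r) {
      dat <- generate(condition, fo)
      analyse(condition, dat, fo)
    }, numeric(1))
  })
}

design <- data.frame(N = c(10, 20))
prepare_calls <- 0
prepare <- function(condition, fixed_objects) {
  prepare_calls <<- prepare_calls + 1  # count how often prepare() is invoked
  fixed_objects$X <- matrix(1, nrow = condition$N, ncol = 2)
  fixed_objects
}
generate <- function(condition, fixed_objects) rowSums(fixed_objects$X)
analyse <- function(condition, dat, fixed_objects) sum(dat)

res <- run_conditions(design, replications = 5, prepare, generate, analyse)
```

With two design rows and five replications each, prepare() is called twice in total rather than ten times.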

Use Case

Pre-compute expensive condition-specific objects (design matrices, correlation matrices, lookup tables) once per condition instead of per replication. This avoids both:

Memory issues from pre-computing objects for all conditions upfront
Performance issues from recomputing expensive objects for every replication

Example Usage

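library(SimDesign)

Design <- createDesign(N = c(100, 1000, 10000))

# Hypothetical placeholder for a user-defined costly computation
compute_expensive_lookup <- function(N) cumsum(seq_len(N))

prepare <- function(condition, fixed_objects) {
    # Pre-compute expensive condition-specific objects
    fixed_objects$design_matrix <- matrix(rnorm(condition$N * 10), ncol=10)
    fixed_objects$lookup_table <- compute_expensive_lookup(condition$N)
    return(fixed_objects)
}

generate <- function(condition, fixed_objects) {
    # Use prepared objects from fixed_objects
    X <- fixed_objects$design_matrix
    y <- X %*% rnorm(10) + rnorm(nrow(X))
    data.frame(y=y, X)  # columns y, X1, ..., X10
}

analyse <- function(condition, dat, fixed_objects) {
    # Estimates for the first two slopes
    coef(lm(y ~ ., data=dat))[c("X1", "X2")]
}

summarise <- function(condition, results, fixed_objects) {
    colMeans(results)
}

runSimulation(Design, replications=1000,
              prepare=prepare,
              generate=generate,
              analyse=analyse,
              summarise=summarise)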

Changes

Added prepare parameter to runSimulation() function signature
Added comprehensive documentation for the prepare parameter
Added validation for prepare function signature (must include condition, fixed_objects)
Added prepare parameter to Analysis() function
Implemented prepare call in Analysis() (once per condition, before replication loop)
Added parallel cluster export for prepare when provided
Added prepare globals to check.globals functionality
Full backward compatibility (defaults to NULL)

Testing

✅ Prepare function correctly modifies fixed_objects per condition
✅ Modified objects available in all user functions
✅ Backward compatibility maintained (existing code works unchanged)
✅ Proper error handling when prepare fails

Configure pbapply to display text-based progress bars when running in non-interactive mode (e.g., batch jobs, Rscript). Previously, progress tracking was disabled in non-interactive sessions, making it impossible to monitor simulation progress in SLURM log files.

Changes:
- R/analysis.R: Add pboptions configuration to force type="txt" in non-interactive mode while preserving timer bars in interactive sessions
- R/runSimulation.R: Update progress parameter documentation to describe the new behavior

Interactive users see no change. Non-interactive users (SLURM, batch jobs) now see text progress bars when monitoring logs via tail -f.

Fixes philchalmers#75
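A minimal sketch of the option switch this commit describes, assuming pbapply's pboptions() is the only setting involved. configure_progress() is a hypothetical wrapper for illustration, not the code added to R/analysis.R. pboptions() invisibly returns the previous settings, so a caller could restore them afterwards.

```r
# Hypothetical wrapper: force text progress bars only when R is not
# interactive (Rscript, SLURM batch jobs); interactive sessions are untouched
configure_progress <- function() {
  if (!requireNamespace("pbapply", quietly = TRUE)) return(invisible(NULL))
  if (interactive()) {
    invisible(pbapply::pboptions())               # keep the current (timer) bar
  } else {
    invisible(pbapply::pboptions(type = "txt"))   # plain text bar, readable via tail -f
  }
}

configure_progress()
```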

Implements philchalmers#74 by adding a prepare parameter to runSimulation() that modifies fixed_objects once per condition before replications run. The prepare function accepts condition and fixed_objects as arguments and returns the modified fixed_objects, which is then passed to all replications for that condition.

Use case: Pre-compute expensive condition-specific objects (design matrices, lookup tables) once per condition instead of per replication, avoiding both memory issues (from pre-computing all conditions) and performance issues (from recomputing per replication).

Implementation:
- Added prepare parameter with validation
- Calls prepare(condition, fixed_objects) in main loop per condition
- Returns modified fixed_objects for use in replications
- Exports prepare to parallel clusters when provided
- Includes prepare globals in check.globals
- Full backward compatibility (prepare defaults to NULL)

Example:

prepare <- function(condition, fixed_objects) {
    fixed_objects$design_matrix <- matrix(rnorm(condition$N * 10), ncol=10)
    return(fixed_objects)
}

philchalmers · 2025-12-14T01:24:44Z

I generally like this structure now, thanks. The use case is fine, but maybe not the best way to think about how to use this in the documentation.

The way that I see this being useful is if something like fixed_objects$expensive_stuff <- readRDS('prepare/expensive_stuff') were used within the prepare() definition. The main reason is that information inside of prepare() is not returned by runSimulation(), as the objects are expected to eat a good amount of RAM, so you wouldn't want these stored. Moreover, you'd certainly want to know what the information in prepare() actually looks like, given that it is a key component of the experiment (hence, we should note that any use of random number generation will be lost with this approach, and therefore saving RDS objects beforehand would be a more reasonable strategy).
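A sketch of the two-step workflow suggested above, under stated assumptions: the objects are built once beforehand and saved with saveRDS(), one file per condition, and prepare() only reads the file matching the current condition. The directory, the prep_path() helper, and the expensive_stuff-N<N>.rds naming scheme are hypothetical, and the code stays in base R so it runs without SimDesign.

```r
# Hypothetical layout: one .rds file per condition, keyed by N
prep_dir <- file.path(tempdir(), "prepare")
dir.create(prep_dir, showWarnings = FALSE)
prep_path <- function(N) {
  file.path(prep_dir, sprintf("expensive_stuff-N%d.rds", as.integer(N)))
}

# Step 1 (run once, beforehand): build and save one object per condition
design <- data.frame(N = c(100, 1000))
set.seed(2025)
for (N in design$N) {
  saveRDS(matrix(rnorm(N * 10), ncol = 10), prep_path(N))
}

# Step 2: prepare() only loads the stored object for the current condition
prepare <- function(condition, fixed_objects) {
  fixed_objects$expensive_stuff <- readRDS(prep_path(condition$N))
  fixed_objects
}

fo <- prepare(design[2, , drop = FALSE], list())
```

Because the .rds files are what every run reads, they double as the record of what the prepared objects actually were, so no random number generation happens inside prepare() itself.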

As an aside, I particularly like this readRDS() idea in situations where binary files are precompiled locally and distributed on the cluster, as that should be considered a set-once-and-forget-it part of the codebase.

This commit adds comprehensive random number generator (RNG) state management for the prepare() function, ensuring reproducibility and debugging support consistent with generate/analyse/summarise functions.

Key Changes:

1. Seed Capture (R/analysis.R:15-52)
- Automatically capture .Random.seed state before prepare() executes
- Initialize RNG if .Random.seed doesn't exist yet
- Store prepare error seed when prepare() fails for debugging

2. Seed Storage (R/analysis.R:26-37, 251-261)
- Save prepare seeds to disk when save_seeds=TRUE
- File path format: design-row-{ID}/prepare-seed
- Store prepare_Random.seed in attributes when store_Random.seeds=TRUE
- Always store prepare_error_seed for debugging (independent of flag)

3. New Parameter: load_seed_prepare (R/runSimulation.R:1033)
- Dedicated parameter for debugging prepare function
- Accepts character path, integer vector, or tibble/data.frame
- Supports both absolute and relative file paths
- Automatically detects path type and handles appropriately
- Documented at R/runSimulation.R:345-352

4. Seed Extraction (R/SimExtract.R:120-123, 199-209)
- SimExtract(res, 'prepare_seeds') - extract all prepare seeds
- SimExtract(res, 'prepare_error_seed') - extract error seeds

5. Attribute Preservation (R/runSimulation.R:1635-1636)
- Manually restore prepare seed attributes when Result_list is rebuilt as data.frame to prevent attribute loss

Example Usage:

# Run simulation with prepare that uses RNG
res <- runSimulation(Design, replications=10,
                     prepare=prepare,  # Uses rnorm(), runif(), etc.
                     control=list(save_seeds=TRUE, store_Random.seeds=TRUE))
# Extract prepare seeds for reproducibility
prepare_seeds <- SimExtract(res, 'prepare_seeds')
# Debug prepare errors by loading the error seed
res2 <- runSimulation(Design[2,], replications=1,
                      load_seed_prepare='design-row-2/prepare-seed')

Design Decisions:
- prepare_Random.seed only stored when store_Random.seeds=TRUE for consistency with stored_Random.seeds behavior
- prepare_error_seed always stored for debugging, like error_seeds and warning_seeds
- Separate attributes (prepare_Random.seed, prepare_error_seed) instead of nested list for consistency with existing codebase patterns
- File path detection allows both absolute and relative paths

Related: Complements PR philchalmers#78 (prepare function feature)
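The core idea behind the prepare seeds can be sketched in a few lines of base R: snapshot .Random.seed immediately before prepare() runs, then restore that snapshot to replay exactly the same draws. call_with_seed() is a hypothetical helper for illustration only; the PR's actual handling in R/analysis.R may differ in detail.

```r
# Hypothetical helper: record the RNG state right before f() runs, or restore
# a previously recorded state first, so the same draws can be replayed
call_with_seed <- function(f, ..., seed = NULL) {
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    runif(1)  # initialise the RNG so .Random.seed exists
  }
  if (!is.null(seed)) assign(".Random.seed", seed, envir = globalenv())
  state <- get(".Random.seed", envir = globalenv())
  list(value = f(...), seed = state)
}

prepare <- function(condition, fixed_objects) {
  fixed_objects$X <- rnorm(condition$N)  # RNG use inside prepare()
  fixed_objects
}

first  <- call_with_seed(prepare, data.frame(N = 5), list())
replay <- call_with_seed(prepare, data.frame(N = 5), list(), seed = first$seed)
```

Replaying with first$seed reproduces the prepared object exactly, which is the property a stored prepare seed is meant to provide when debugging.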

mronkko · 2025-12-14T08:38:12Z

I implemented seed storing for prepare in the pull request.

Pre-generating and loading the prepared objects is a solution, but it is not always an ideal approach:

The pre-generation can be costly and thus better run on a cluster instead of a local computer. The files can also be large, making storage and transfer cumbersome.
For reproducibility by others (i.e., how easy the code is to run and understand), it might be better to have one simulation file that does all preparation in one function instead of two functions for calculating and loading precalculated results.

This change allows both use cases, 1) prepare as a loader and 2) prepare as a data generator shared with all replications.

philchalmers · 2025-12-14T15:53:45Z

I implemented seed storing for prepare in the pull request.

Pre-generating and loading the prepared objects is a solution, but it is not always an ideal approach:

The pre-generation can be costly and thus better run on a cluster instead of a local computer. The files can also be large, making storage and transfer cumbersome.

True, but this should be considered the exception rather than the rule. I was referring to highlighting this in the documentation, as the object generation within prepare() is unnecessary for the wide majority of simulations. Moreover, the prepare() step is run on a single core on the cluster, while all the prepare() functions across the design could easily be run in parallel locally or, for instance, on a SLURM landing node, and stored as individual and tractable objects. If the objects themselves are large, I don't see why uploads to the cluster are going to be an issue, unless for some reason bandwidth is the issue. Of course, if the objects are so large that they can only be stored temporarily on the distributed arrays, then you're forced to use this approach, in which case tracking what the actual generated objects were at a later time will be a time-consuming nightmare...
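One way to precompute all prepare() objects in parallel locally and store them as individual, tractable files is sketched below with the base parallel package. build_row(), the row-NNN.rds naming, and the per-row set.seed(1000 + i) scheme are assumptions made for illustration; seeding by row index is one simple choice that keeps each file reproducible regardless of which worker builds it. mclapply() relies on forking, so the sketch falls back to a single core on Windows.

```r
library(parallel)

design <- data.frame(N = c(50, 100, 200))
out_dir <- file.path(tempdir(), "prepared")
dir.create(out_dir, showWarnings = FALSE)

# Hypothetical builder for one design row; seeding by row index keeps each
# file reproducible no matter which worker (or machine) produces it
build_row <- function(i, design, out_dir) {
  set.seed(1000 + i)
  obj <- matrix(rnorm(design$N[i] * 10), ncol = 10)
  path <- file.path(out_dir, sprintf("row-%03d.rds", i))
  saveRDS(obj, path)
  path
}

# mclapply() forks worker processes, which Windows does not support
cores <- if (.Platform$OS.type == "windows") 1L else 2L
paths <- unlist(mclapply(seq_len(nrow(design)), build_row,
                         design = design, out_dir = out_dir, mc.cores = cores))
```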

For reproducibility by others (i.e., how easy the code is to run and understand), it might be better to have one simulation file that does all preparation in one function instead of two functions for calculating and loading precalculated results.

The two steps can be performed using the usual source() approach early in the object preparation stage, on or off the landing node, while prepare() does the latter. I don't see the need for more splitting than is already available.

This change allows both use cases, 1) prepare as a loader and 2) prepare as a data generator shared with all replications.

Great, I think this is coming together. Could you update the NEWS.md file to reflect the two pulls, and switch your ctb status to aut in DESCRIPTION? A few tests should probably be added to the tests/ directory as well just to make sure this works consistently in future releases.

philchalmers · 2025-12-14T16:09:18Z

R/analysis.R

+            .GlobalEnv$.Random.seed <- load_seed_prepare
+
+        # Ensure .Random.seed exists (initialize RNG if needed)
+        else if(!exists(".Random.seed", envir = .GlobalEnv))


This should be moved to .onAttach(), as it affects the other .Random.seed instances too.
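A sketch of what moving the initialization into a package hook might look like, assuming it is enough to make one RNG draw when .Random.seed does not exist yet (any draw creates .Random.seed in the global environment). This is an illustration of the suggestion, not necessarily the code that was pushed.

```r
# Sketch only: make sure .Random.seed exists once the package is attached,
# so later code can read or save the RNG state without checking first
.onAttach <- function(libname, pkgname) {
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    runif(1)  # any draw from the RNG creates .Random.seed
  }
  invisible(NULL)
}
```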

I pushed the change that moves RNG initialization to .onAttach()

@param

- Rewrite @param prepare to prioritize loading RDS files over dynamic generation
- Add RNG reproducibility warning when generating within prepare()
- Note that prepare objects are not stored by runSimulation()
- Add complete working example demonstrating recommended two-step workflow
- Document prepare seed storage in save_seeds and store_Random.seeds sections

Changes address feedback from PR philchalmers#78 to position prepare() primarily as an object loader for cluster workflows, with dynamic generation as a secondary use case requiring explicit RNG state management.


mronkko · 2025-12-15T08:45:09Z

I added tests, updated documentation, added new release to NEWS and fixed my details in DESCRIPTION.

mronkko added 2 commits December 13, 2025 08:36

philchalmers reviewed Dec 14, 2025

mronkko added 7 commits December 15, 2025 09:33

ff4868f: Moved RNG initialization to .onAttach()

98eec83: Added tests for the prepare function.

e84755d: Added logging changes and prepare function in a new release.

5f4b945: Revert "Added logging changes and prepare function in a new release."
This reverts commit e84755d.

f8c7277: Incremented release number and added description of the prepare function and improved logging on cluster.

3dfdccd: Fixed Mikko Rönkkö's last name to have umlauts and changed ctb -> aut role.

philchalmers approved these changes Dec 15, 2025

philchalmers merged commit 962eb77 into philchalmers:main Dec 15, 2025
2 of 3 checks passed
