-
Notifications
You must be signed in to change notification settings - Fork 22
Add prepare function for condition-level object preparation #78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add prepare function for condition-level object preparation #78
Conversation
Configure pbapply to display text-based progress bars when running in non-interactive mode (e.g., batch jobs, Rscript). Previously, progress tracking was disabled in non-interactive sessions, making it impossible to monitor simulation progress in SLURM log files. Changes: - R/analysis.R: Add pboptions configuration to force type="txt" in non-interactive mode while preserving timer bars in interactive sessions - R/runSimulation.R: Update progress parameter documentation to describe the new behavior Interactive users see no change. Non-interactive users (SLURM, batch jobs) now see text progress bars when monitoring logs via tail -f. Fixes philchalmers#75
Implements philchalmers#74 by adding a prepare parameter to runSimulation() that modifies fixed_objects once per condition before replications run. The prepare function accepts condition and fixed_objects as arguments and returns the modified fixed_objects, which is then passed to all replications for that condition. Use case: Pre-compute expensive condition-specific objects (design matrices, lookup tables) once per condition instead of per replication, avoiding both memory issues (from pre-computing all conditions) and performance issues (from recomputing per replication). Implementation: - Added prepare parameter with validation - Calls prepare(condition, fixed_objects) in main loop per condition - Returns modified fixed_objects for use in replications - Exports prepare to parallel clusters when provided - Includes prepare globals in check.globals - Full backward compatibility (prepare defaults to NULL) Example: prepare <- function(condition, fixed_objects) { fixed_objects$design_matrix <- matrix(rnorm(condition$N * 10), ncol=10) return(fixed_objects) }
|
I generally like this structure now, thanks. The use case is fine, but maybe not the best way to think about how to use this in the documentation. The way that I see this being useful is if within the As and aside, I particularly like this readRDS() idea in situations where binary files are precompiled locally and distributed on the cluster, as that should be considered a set once and forget it part of the codebase. |
This commit adds comprehensive random number generator (RNG) state
management for the prepare() function, ensuring reproducibility and
debugging support consistent with generate/analyse/summarise functions.
Key Changes:
1. Seed Capture (R/analysis.R:15-52)
- Automatically capture .Random.seed state before prepare() executes
- Initialize RNG if .Random.seed doesn't exist yet
- Store prepare error seed when prepare() fails for debugging
2. Seed Storage (R/analysis.R:26-37, 251-261)
- Save prepare seeds to disk when save_seeds=TRUE
- File path format: design-row-{ID}/prepare-seed
- Store prepare_Random.seed in attributes when store_Random.seeds=TRUE
- Always store prepare_error_seed for debugging (independent of flag)
3. New Parameter: load_seed_prepare (R/runSimulation.R:1033)
- Dedicated parameter for debugging prepare function
- Accepts character path, integer vector, or tibble/data.frame
- Supports both absolute and relative file paths
- Automatically detects path type and handles appropriately
- Documented at R/runSimulation.R:345-352
4. Seed Extraction (R/SimExtract.R:120-123, 199-209)
- SimExtract(res, 'prepare_seeds') - extract all prepare seeds
- SimExtract(res, 'prepare_error_seed') - extract error seeds
5. Attribute Preservation (R/runSimulation.R:1635-1636)
- Manually restore prepare seed attributes when Result_list is
rebuilt as data.frame to prevent attribute loss
Example Usage:
# Run simulation with prepare that uses RNG
res <- runSimulation(Design, replications=10,
prepare=prepare, # Uses rnorm(), runif(), etc.
control=list(save_seeds=TRUE,
store_Random.seeds=TRUE))
# Extract prepare seeds for reproducibility
prepare_seeds <- SimExtract(res, 'prepare_seeds')
# Debug prepare errors by loading the error seed
res2 <- runSimulation(Design[2,], replications=1,
load_seed_prepare='design-row-2/prepare-seed')
Design Decisions:
- prepare_Random.seed only stored when store_Random.seeds=TRUE for
consistency with stored_Random.seeds behavior
- prepare_error_seed always stored for debugging, like error_seeds
and warning_seeds
- Separate attributes (prepare_Random.seed, prepare_error_seed)
instead of nested list for consistency with existing codebase patterns
- File path detection allows both absolute and relative paths
Related: Complements PR philchalmers#78 (prepare function feature)
|
I implemented seed storing for prepare in the pull request. Pre-generating and loading the prepared objects is a solution, but it is not always an ideal approach:
This change allows both use cases, 1) prepare as a loader and 2) prepare as a data generator shared with all replications. |
True, but this should be considered the exception rather than the rule. I was referring to highlighting this in the documentation as the object generation within
The two step can be performed using the usual
Great, I think this is coming together. Could you update the NEWS.md file to reflect the two pulls, and switch your |
R/analysis.R
Outdated
| .GlobalEnv$.Random.seed <- load_seed_prepare | ||
|
|
||
| # Ensure .Random.seed exists (initialize RNG if needed) | ||
| else if(!exists(".Random.seed", envir = .GlobalEnv)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved to .on.Attach() as it affects the other .Random.seed instances too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed the change that moves RNG initialization to .onAttach()
- Rewrite @param prepare to prioritize loading RDS files over dynamic generation - Add RNG reproducibility warning when generating within prepare() - Note that prepare objects are not stored by runSimulation() - Add complete working example demonstrating recommended two-step workflow - Document prepare seed storage in save_seeds and store_Random.seeds sections Changes address feedback from PR philchalmers#78 to position prepare() primarily as an object loader for cluster workflows, with dynamic generation as a secondary use case requiring explicit RNG state management.
This reverts commit e84755d.
…ion and improved logging on cluster.
|
I added tests, updated documentation, added new release to NEWS and fixed my details in DESCRIPTION. |
Summary
Implements #74 by adding a
prepareparameter torunSimulation()that executes once per simulation condition to prepare/modifyfixed_objectsbefore replications run.Implementation
The
preparefunction:conditionandfixed_objectsas argumentsfixed_objectsAnalysis()function (before replications, alongside wheresummariseis called after replications)fixed_objectsis passed to all replications for that conditionUse Case
Pre-compute expensive condition-specific objects (design matrices, correlation matrices, lookup tables) once per condition instead of per replication. This avoids both:
Example Usage
Changes
prepareparameter torunSimulation()function signatureprepareparameter toAnalysis()functionAnalysis()(once per condition, before replication loop)Testing
✅ Prepare function correctly modifies fixed_objects per condition
✅ Modified objects available in all user functions
✅ Backward compatibility maintained (existing code works unchanged)
✅ Proper error handling when prepare fails