This repository provides a complete pipeline for numerical homogenization of 3D Discrete Fracture Matrix (DFM) models, training convolutional neural network (CNN)-based surrogates, and integrating these surrogates into a Multilevel Monte Carlo (MLMC) framework for efficient uncertainty quantification.
This repository consists of five components:

- **Dataset Generation**
  - Performs numerical homogenization
  - Rasterizes bulk and fracture data
  - Creates a Zarr-formatted dataset
- **Surrogate Training**
  - Preprocesses the dataset
  - Trains a 3D CNN to predict an equivalent hydraulic conductivity tensor from rasterized inputs
- **Surrogate Postprocessing**
  - Applies trained models for prediction on a given dataset
  - Provides visualization tools and evaluation scripts
- **MLMC Run with Surrogates**
  - Integrates trained CNN surrogates into Multilevel Monte Carlo (MLMC) simulations
  - Enables efficient upscaling of hydraulic conductivity across MLMC levels
- **MLMC Postprocessing**
  - Uses the `mlmc` library for analysis of simulation results
  - Provides tools for estimating the mean and variance of derived quantities, along with diagnostic plots and statistical summaries
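The mean/variance estimation performed in the MLMC components can be sketched in a few lines. This is a simplified scalar version for illustration only; the repository uses the `mlmc` library rather than this code. Level 0 holds plain samples, and each higher level holds correction samples `Y_l = P_l - P_{l-1}`; the MLMC mean is the sum of per-level sample means, and the variance of that estimator is the sum of per-level sample variances divided by the sample counts.

```python
def mlmc_estimate(level_samples):
    """Combine per-level samples into MLMC estimates.

    level_samples[0] holds plain level-0 samples; level_samples[l] (l >= 1)
    holds correction samples Y_l = P_l - P_{l-1}.  Returns the MLMC mean
    and the variance of the mean estimator.
    """
    mean = 0.0
    var_of_mean = 0.0
    for samples in level_samples:
        n = len(samples)
        level_mean = sum(samples) / n
        # unbiased sample variance of this level's samples
        level_var = sum((s - level_mean) ** 2 for s in samples) / (n - 1)
        mean += level_mean
        var_of_mean += level_var / n
    return mean, var_of_mean
```

For example, `mlmc_estimate([[1.0, 1.0, 1.0], [0.5, 0.5]])` combines a level-0 mean of 1.0 with a level-1 correction of 0.5.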
Each part can be run independently, using the provided data.
- Developed and tested using Python 3.8 and Python 3.10
- Each pipeline component has its own dependency file:
| Component | Requirements File |
|---|---|
| Dataset Generation | requirements_data_generation.txt |
| Surrogate Training | requirements_training.txt |
| Surrogate Postprocessing | requirements_postprocess.txt |
| MLMC run with Surrogates | requirements_data_generation.txt |
| MLMC postprocessing | requirements_mlmc_postprocess.txt |
We recommend creating separate virtual environments for each part, depending on your compute environment.
```
cd MLMC-DFM
export PYTHONPATH=.
```

Prerequisite: Ensure that both Flow123d and GMSH are installed and accessible from the command line.
To generate datasets as we did for our experiments (numerical homogenization, rasterization, Zarr formatting), run:

```
python gen_train_samples.py work_dir scratch_dir
```

- `work_dir`: Working directory (e.g. `test/01_cond_field`) - must contain a simulation config similar to `test/01_cond_field/sim_config_3D_homogenization_samples.yaml`
- `scratch_dir`: Fast scratch directory (set to `""` if not applicable or available)

The paths to the Flow123d and GMSH executables are configured inside the `set_environment_variables()` method in `gen_train_samples.py`.
To train the surrogate, run:

```
python metamodel/cnn3D/models/train_cnn_optuna_3d.py configuration data_dir results_dir -c
```

- `configuration`: Training configuration file (e.g. `configs/cnn_3D/final_test_config.yaml`)
- `data_dir`: Path to the dataset in Zarr format (e.g. `data/samples_data_to_test.zarr`, a small dataset of ~1,000 samples)
- `results_dir`: Where results and logs will be saved
- `-c`: Use GPU (CUDA or AMD ROCm) if available
For full-scale training on 60,000 samples (22GB+), see:
- Config file: `configs/cnn_3D/configs_lumi/final_config.yaml`
- Slurm submission script for HPC/GPU training (e.g., on LUMI): `slurm_submit_gpu_sing_small.sh`
The training dataset is large and can be provided upon reasonable request.
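The surrogate's target quantity, an equivalent hydraulic conductivity tensor, is symmetric, so it is fully determined by six independent components. As a rough illustration, such a tensor can be assembled from six values; the component ordering below (xx, yy, zz, xy, xz, yz) is a common Voigt-style convention and an assumption here, not necessarily the layout used by this repository's CNN output:

```python
def tensor_from_components(components):
    """Assemble a symmetric 3x3 conductivity tensor from six values.

    Assumed ordering: (kxx, kyy, kzz, kxy, kxz, kyz) -- illustrative only.
    """
    kxx, kyy, kzz, kxy, kxz, kyz = components
    return [[kxx, kxy, kxz],
            [kxy, kyy, kyz],
            [kxz, kyz, kzz]]
```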
Trained surrogates can be used to make predictions on new datasets or analyze their performance:
```
python metamodel/cnn3D/postprocess/optuna_results.py results_dir data_dir
```

- `results_dir`: Directory containing the trained model (e.g. `optuna_runs/3D_cnn/lumi/cond_frac_1_3/trained_surrogate`)
- `data_dir`: Path to the evaluation dataset in Zarr format (e.g. `data/samples_data_to_test.zarr`)
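When analyzing surrogate performance, a simple scalar measure of agreement between a predicted and a reference conductivity tensor is the relative Frobenius error. A minimal sketch follows; this metric is illustrative and not necessarily the one reported by `optuna_results.py`:

```python
import math

def rel_frobenius_error(pred, ref):
    """Relative error ||pred - ref||_F / ||ref||_F between two 3x3 tensors."""
    diff = math.sqrt(sum((p - r) ** 2
                         for prow, rrow in zip(pred, ref)
                         for p, r in zip(prow, rrow)))
    norm = math.sqrt(sum(r ** 2 for rrow in ref for r in rrow))
    return diff / norm
```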
We provide compressed trained surrogates for fracture-to-matrix hydraulic conductivity ratios Kₓ/Kₘ ∈ {10³, 10⁵, 10⁷}:
- `optuna_runs/3D_cnn/lumi/cond_frac_1_3/trained_surrogate.zip` for Kₓ/Kₘ = 10³
- `optuna_runs/3D_cnn/lumi/cond_frac_1_5/trained_surrogate.zip` for Kₓ/Kₘ = 10⁵
- `optuna_runs/3D_cnn/lumi/cond_frac_1_7/trained_surrogate.zip` for Kₓ/Kₘ = 10⁷
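The archives can be unpacked with any zip tool; a small Python helper (hypothetical, not part of the repository) that extracts an archive next to itself would be:

```python
import zipfile
from pathlib import Path

def extract_surrogate(zip_path):
    """Unpack a trained-surrogate archive into the directory containing it."""
    archive = Path(zip_path)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)  # contents land next to the .zip
    return archive.parent
```

For example, `extract_surrogate("optuna_runs/3D_cnn/lumi/cond_frac_1_3/trained_surrogate.zip")` leaves the model files under `optuna_runs/3D_cnn/lumi/cond_frac_1_3/`.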
Note: Due to limited consecutive training time on our devices, the model was trained in multiple sessions by resuming from saved checkpoints to reach the desired number of epochs. As a result, the training metrics (e.g., loss curves) may not represent the complete history over all epochs for the presented trained surrogates.
Prerequisites:
Ensure that both Flow123d and GMSH are installed and accessible from the command line.
Running the MLMC (Multilevel Monte Carlo) simulation with a surrogate model is very similar to generating datasets.
The main difference lies in the configuration file — you must set the following parameters:
- `generate_hom_samples: false`
- Provide the surrogate model path using `nn_path` or `nn_path_cond_frac`.

For an example configuration file, see `test/01_cond_field/sim_config_3D_MC_samples.yaml`.
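A hypothetical excerpt showing just the two parameters named above (all other fields should follow the example file; the exact value format of `nn_path` is an assumption here):

```yaml
# Hypothetical excerpt -- see test/01_cond_field/sim_config_3D_MC_samples.yaml
generate_hom_samples: false
nn_path: optuna_runs/3D_cnn/lumi/cond_frac_1_3/trained_surrogate
```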
The path to the simulation configuration file is currently set in the `setup_config()` method inside `mlmc_dfm_3d.py`.
Use the following command to start the MLMC simulation:
```
python mlmc_dfm_3d.py run work_dir scratch_dir
```

- `work_dir`: Path to the working directory (e.g. `test/01_cond_field`). This directory must contain a valid simulation configuration file, such as `test/01_cond_field/sim_config_3D_MC_samples.yaml`.
- `scratch_dir`: Path to a fast scratch directory for temporary files. Use an empty string `""` if not applicable.

The paths to the Flow123d and GMSH executables are configured inside the `set_environment_variables()` method in `mlmc_dfm_3d.py`.
Once the MLMC simulation has completed, you can run the postprocessing step to analyze and visualize the results.
```
python postprocess_mlmc_dfm_3d.py work_dir
```

- `work_dir`: Path to the working directory (e.g. `test/01_cond_field`). This directory must contain the output file `mlmc_l.hdf5`, where `l` corresponds to the number of levels defined by the parameter `self.n_levels` set in `mlmc_dfm_3d.py`.