Closed

125 commits
2ad6db2
Import torch packages to handle distributed computing in models
yoshikisd Feb 23, 2025
c393f47
Duplicated fancy ptycho example for multi gpu demo with DistributedDa…
yoshikisd Feb 23, 2025
fc40b77
Reworked fancy ptycho example for multi GPU testing with DDP
yoshikisd Feb 23, 2025
e8ed010
Added a name-main block to the fancy ptycho multi gpu example
yoshikisd Feb 23, 2025
94dcc3f
Fixed a double-period typo
yoshikisd Feb 23, 2025
ad3d816
Added basic multi GPU support for AD_optimize and Adam_optimize in CD…
yoshikisd Feb 23, 2025
efe2ac1
Fixes for fancy_ptycho_multi_gpu_ddp.py
yoshikisd Feb 23, 2025
1eb8704
Created fancy_ptycho_multi_gpu_ddp_speed_test.py to compare reconstru…
yoshikisd Feb 24, 2025
7c76626
Moved dataset import outside of function
yoshikisd Feb 27, 2025
9831b19
Disabled NCCL peer-2-peer communication in fancy_ptycho_multi_gpu_ddp.py
yoshikisd Feb 27, 2025
057233c
Disabled NCCL P2P for the multi gpu speed test
yoshikisd Feb 27, 2025
d6a053b
Merge branch 'cdtools-developers:master' into feature/multi-gpu
yoshikisd Feb 27, 2025
902d272
Created new module cdtools.tools.distributed for multi-GPU applications
yoshikisd Mar 1, 2025
aedc035
Spawning handled by cdtools.tools.distributed module in multi-gpu exa…
yoshikisd Mar 1, 2025
4521f35
Created a wrapper for managing process groups in reconstruction scripts
yoshikisd Mar 2, 2025
00ee707
Removed process group managing functions from fancy_ptycho_multi_gpu_…
yoshikisd Mar 2, 2025
db01ac8
Changed type hint for multi_gpu_reconstruct from array to ndarray
yoshikisd Mar 2, 2025
5be1770
Fixed dumb typo for ndarray
yoshikisd Mar 2, 2025
ad3c1e3
Device loading of the model and dataset is now handled by process_man…
yoshikisd Mar 2, 2025
c9165ea
Refactor: process_manager renamed to reconstructor_wrapper. reconstru…
yoshikisd Mar 2, 2025
c965001
DDP is now handled by distributed_wrapper. reconstructor_wrapper was …
yoshikisd Mar 3, 2025
dafddf7
Added description to cdtools.tools.distributed.distributed
yoshikisd Mar 3, 2025
5d6739a
Changed type annotation for multi_gpu_reconstruct
yoshikisd Mar 3, 2025
a5c3095
CDIModel methods can just use model._ when using cdtools.tools.distri…
yoshikisd Mar 8, 2025
553f70d
Removed DistributedDataParallel from CDIModel
yoshikisd Mar 8, 2025
c74aaf4
Models calculated on multiple GPUs automatically perform plotting met…
yoshikisd Mar 8, 2025
7278a37
model.Adam_optimize no longer need rank or world_size parameters for …
yoshikisd Mar 8, 2025
72b4d6a
Edit file name path for fancy_ptycho_multi_gpu_ddp.py
yoshikisd Mar 8, 2025
0ac1121
Added type hints to distributed.py
yoshikisd Mar 8, 2025
57dca98
cdtools.tools.distributed.distributed methods can take Connection obj…
yoshikisd Mar 8, 2025
e6e8130
Updated the multi-gpu speed test
yoshikisd Mar 8, 2025
7514ca6
Updated function description and removed os dependency for the multi-…
yoshikisd Mar 8, 2025
6e4145f
Fixed dataset path for multi gpu speed test
yoshikisd Mar 8, 2025
436696f
Moved data_loader.sampler.set_epoch() to the inside of run_epoch stat…
yoshikisd Mar 8, 2025
4783915
Removed scheduler from the multi gpu speed test script
yoshikisd Mar 10, 2025
77eb21a
Cleaned up the multi gpu speed test
yoshikisd Mar 10, 2025
3e1bb17
Fixed the path in the multi GPU speed test (again)
yoshikisd Mar 10, 2025
5b11dc5
Cleaned up fancy_ptycho_multi_gpu_ddp.py
yoshikisd Mar 11, 2025
00e0413
Removed barrier statement from CDIModel
yoshikisd Mar 11, 2025
dd07c4b
Fix for unintended GPU usage; added ability to define which GPUs to u…
yoshikisd Mar 12, 2025
464ed5a
Fixed discrepancy between user-specified batch_size and effective bat…
yoshikisd Mar 13, 2025
8f1ee06
Fixed list type hint error for Python 3.8
yoshikisd Mar 13, 2025
a82e9fc
Merge branch 'cdtools-developers:master' into feature/multi-gpu
yoshikisd Apr 13, 2025
c29d28c
Switched DDP with all_reduce implementation for distributive computing
yoshikisd Apr 28, 2025
10e4d74
Changed name of multi-gpu speed test. Added gold balls example to the…
yoshikisd Apr 28, 2025
7763520
Created Reconstructor class to enable separation of optimization loop…
yoshikisd May 9, 2025
c610e87
Added comment to adam.py to highlight similarities to CDIModel.AD_opt…
yoshikisd May 9, 2025
3650f3b
Created a script to compare performance of the old and new methods fo…
yoshikisd May 10, 2025
30d17b3
MultiGPU fancy_ptycho example now works with the Reconstructor class
yoshikisd May 10, 2025
30c1f77
Distributed speed test now works with the Reconstructor class
yoshikisd May 10, 2025
854c907
Cleaned up the descriptions and removed unneccessary attributes for t…
yoshikisd May 12, 2025
64f986f
Scheduler is now defined in __init__
yoshikisd May 12, 2025
1b7d094
Moved scheduler back to optimize. Moved setup_dataloader to the Recon…
yoshikisd May 12, 2025
140c65d
Added type annotations to Reconstructor.optimize
yoshikisd May 12, 2025
f690237
Merge branch 'cdtools-developers:master' into feature/multi-gpu
yoshikisd May 28, 2025
81f9f5f
Renamed the Reconstructor class to the Optimizer class
yoshikisd Jun 10, 2025
60737e6
Refactor Optimizer to separate run_epoch from optimize. Also created …
yoshikisd Jun 10, 2025
cfa86b1
Refactor Optimizer to separate run_epoch from optimize. Also created …
yoshikisd Jun 10, 2025
81e5cd9
Removed DDP dependency from distributed.py. Tidied up the docs.
yoshikisd Jun 10, 2025
dfb9d14
CDIModel.Adam_optimize refactored to only use cdtools.optimizer.Adam
yoshikisd Jun 10, 2025
2b8d1ca
Merge branch 'cdtools-developers:master' into feature/multi-gpu
yoshikisd Jun 10, 2025
ac3373c
Revert "Renamed the Reconstructor class to the Optimizer class"
yoshikisd Jun 10, 2025
7fb2b96
Updated __init__.py in optimizer
yoshikisd Jun 10, 2025
2958aff
Renamed the optimizer module to reconstructors
yoshikisd Jun 10, 2025
9341355
Separated LBFGS from CDIModel into a Reconstructor subclass
yoshikisd Jun 12, 2025
4232906
Separated SGD from CDIModel into a Reconstructor subclass
yoshikisd Jun 12, 2025
ce67800
Removed CDIModel.AD_optimize
yoshikisd Jun 12, 2025
29198bd
Merge branch 'cdtools-developers:master' into feature/multi-gpu
yoshikisd Jun 13, 2025
a2967d4
Cleaned up model and reconstructor docs
yoshikisd Jun 13, 2025
4ec8399
Removed model and dataset dependencies from distributed
yoshikisd Jun 14, 2025
bc2669e
Fixed CDIModel rank and world_size assignment for single GPU use
yoshikisd Jun 14, 2025
ca28900
Created working implementation of distributing single-GPU scripts to …
yoshikisd Jun 14, 2025
3b17f10
Fixed bug in CDatasets which uses cuda:0 when t.cuda.set_device is ut…
yoshikisd Jun 15, 2025
588b150
Revert "Fixed bug in CDatasets which uses cuda:0 when t.cuda.set_devi…
yoshikisd Jun 15, 2025
50a8b01
fancy_ptycho example can now be run on several GPUs with no modificat…
yoshikisd Jun 15, 2025
f80989d
rank==0 check implemented for several CDIModel file and figure saving…
yoshikisd Jun 16, 2025
3f4fe57
Removed rank==0 check for CDIModel.report
yoshikisd Jun 16, 2025
742d1dd
rank==0 check implemented for CDataset inspect
yoshikisd Jun 17, 2025
efeaf2e
Created custom console script cdt-torchrun to launch single-GPU scrip…
yoshikisd Jun 17, 2025
83aaa80
Removed several multi-gpu example scripts using depracated methods
yoshikisd Jun 17, 2025
f2d618b
Distributive methods support GPU ID selection
yoshikisd Jun 20, 2025
664b0b2
Single-node multi-worker enforced in cdt-torchrun
yoshikisd Jun 20, 2025
400e5ef
refactored cdt-torchrun to use runpy
yoshikisd Jun 27, 2025
526afdb
tidied up wrap_single_gpu_script
yoshikisd Jun 27, 2025
f7e4c2c
Fix to synchronize RNG seed for multi-gpu
yoshikisd Jun 30, 2025
dd72143
Modified speed test to work with cdt-torchrun
yoshikisd Jul 4, 2025
e5cfd25
Corrected script name in distributed_speed_test.py
yoshikisd Jul 4, 2025
e5689c3
Multi-gpu-related model attributes are no longer stored as reconstruc…
yoshikisd Jul 4, 2025
037151d
Depracated spawn-based distributed methods
yoshikisd Jul 5, 2025
7a16bbc
Depracated torchrunner from distributed
yoshikisd Jul 5, 2025
c5ac40b
Merge branch 'cdtools-developers:master' into feature/multi-gpu
yoshikisd Jul 7, 2025
47bbc14
Merge remote-tracking branch 'refs/remotes/origin/feature/multi-gpu' …
yoshikisd Jul 7, 2025
442f869
Merge branch 'cdtools-developers:master' into feature/multi-gpu
yoshikisd Jul 7, 2025
8906eb3
Force pytest to only run stuff in the tests directory
yoshikisd Jul 7, 2025
0699a8b
Reconstructors are imported in CDIModel only when self.reconstructor …
yoshikisd Jul 7, 2025
245e28a
Created test for the Adam reconstructor
yoshikisd Jul 7, 2025
9f1b89e
Removed au particle test from test_fancy_ptycho (already in test_reco…
yoshikisd Jul 7, 2025
9b0731b
Removed depracated methods from the __all__ in distributed.py
yoshikisd Jul 7, 2025
564f499
Altered the seed synchronization step and reorganized bits of the scr…
yoshikisd Jul 11, 2025
2dbb307
Record the time when each CDIModel.loss_history value is stored
yoshikisd Jul 11, 2025
54691fb
Created speed test decorator, changed speed test environ variables, a…
yoshikisd Jul 11, 2025
7c59d66
Rearranged distributed.py
yoshikisd Jul 11, 2025
ce9541a
Updated the documentation in distributed.py
yoshikisd Jul 11, 2025
5a523cf
Got rid of unused imports in CDIModel
yoshikisd Jul 11, 2025
6bd7187
Added CDIModel import to distributed
yoshikisd Jul 11, 2025
13cfee5
Optional plotting and result saving/deleting added to speed test. Als…
yoshikisd Jul 16, 2025
496e8a3
Created pytest to assess multi-gpu reconstruction quality
yoshikisd Jul 16, 2025
08b8619
Merge branch 'master' into feature/multi-gpu
yoshikisd Jul 16, 2025
3cc184e
Fixed bug that lets the slow and multigpu tests run without setting t…
yoshikisd Jul 16, 2025
919c236
Linted and updated documentation on distributed.py
yoshikisd Jul 17, 2025
5ab3ab6
Got rid of some print statements from distributed.py
yoshikisd Jul 17, 2025
5de4e37
Linted Reconstructors
yoshikisd Jul 17, 2025
be6abcd
Linted single_to_multi_gpu.py
yoshikisd Jul 17, 2025
d4ec9ee
Linted test_reconstructors.py
yoshikisd Jul 17, 2025
27f2d54
Linted multi gpu tests and test scripts
yoshikisd Jul 17, 2025
976e94c
Linted and cleaned up the distributed speed test examples
yoshikisd Jul 17, 2025
7e850fe
Added LBFGS and RPI pytest
yoshikisd Jul 17, 2025
5a6c66f
Cleaned up and refactored parts of test_Adam_gold_balls
yoshikisd Jul 17, 2025
01ebaf2
Added pytest for the SGD Reconstructor
yoshikisd Jul 17, 2025
a0052fd
Updated Rank 0 multi gpu flagging for ptycho_2d_dataset and CDIModel
yoshikisd Jul 18, 2025
30eeb43
Added plotting and saving test and got rid of plt show statements
yoshikisd Jul 18, 2025
a2f5956
ReduceLROnPlateau works with multi-GPU
yoshikisd Jul 18, 2025
219bce7
Changed single to double quote in test_plotting_and_saving
yoshikisd Jul 18, 2025
95988d3
Make the print statement in test_plotting_and_saving a single line
yoshikisd Jul 18, 2025
22db614
Merge branch 'master' into feature/multi-gpu
yoshikisd Nov 4, 2025
46 changes: 46 additions & 0 deletions examples/distributed_speed_test.py
@@ -0,0 +1,46 @@
from cdtools.tools.distributed import run_speed_test

# Define the number of GPUs to use for the test. A single-GPU run must
# always be included, since it provides the baseline for the speed-up
# measurements.
#
# Here, we will run trials with 1 and 2 GPUs.
world_sizes = [1, 2]

# We will run 3 trials per GPU to collect statistics on loss-versus-epoch/time
# data as well as runtime speedup.
runs = 3

# We will run the speed test on a reconstruction script that has been
# modified for speed testing (see fancy_ptycho_speed_test.py)
script_path = 'fancy_ptycho_speed_test.py'

# When we run the modified script with the speed test, a pickle dump file
# will be generated after each trial. The file contains the loss-versus-time
# data measured for that trial at the given GPU count.
output_dir = 'example_loss_data'

# Define the file name prefix. The file will have the following name:
# `<file_prefix>_nGPUs_<world_size>_TRIAL_<run number>.pkl`
file_prefix = 'speed_test'

# We can plot several curves showing what the loss-versus-epoch curves look
# like for each GPU count. The plot will also show the runtime speed-up
# relative to the single-GPU runtime.
show_plot = True

# We can also have `run_speed_test` delete the pickle dump files once each
# trial's data has been read and stored.
delete_output_file = True

# Run the test. The speed test returns several lists containing the means
# and standard deviations of the final recorded losses and runtime
# speed-ups, calculated over the trial runs. Each entry index maps to the
# GPU count specified by `world_sizes`.
final_loss_mean, final_loss_std, speed_up_mean, speed_up_std = \
    run_speed_test(world_sizes=world_sizes,
                   runs=runs,
                   script_path=script_path,
                   output_dir=output_dir,
                   file_prefix=file_prefix,
                   show_plot=show_plot,
                   delete_output_files=delete_output_file)
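Since each returned list is indexed by GPU count, a little post-processing turns the statistics into a readable table. A minimal sketch (the numeric values are placeholders, and `summarize` is a hypothetical helper, not part of cdtools):

```python
# Hypothetical post-processing of run_speed_test's return values. Each
# list entry corresponds to the GPU count at the same index in world_sizes;
# the numbers here are placeholders, not measured results.
world_sizes = [1, 2]
final_loss_mean = [0.0123, 0.0125]
final_loss_std = [0.0004, 0.0006]
speed_up_mean = [1.00, 1.62]
speed_up_std = [0.00, 0.05]


def summarize(world_sizes, loss_mu, loss_sd, speed_mu, speed_sd):
    """Format one summary line per GPU count."""
    return [
        f'{n} GPU(s): final loss {lm:.4f} +/- {ls:.4f}, '
        f'speed-up {sm:.2f}x +/- {ss:.2f}'
        for n, lm, ls, sm, ss in zip(world_sizes, loss_mu, loss_sd,
                                     speed_mu, speed_sd)
    ]


for line in summarize(world_sizes, final_loss_mean, final_loss_std,
                      speed_up_mean, speed_up_std):
    print(line)
```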
53 changes: 53 additions & 0 deletions examples/fancy_ptycho_speed_test.py
@@ -0,0 +1,53 @@
import cdtools


# To modify fancy_ptycho.py for a multi-GPU speed test, we need to enclose the
# entire reconstruction script in a function. The function then needs to be
# decorated with cdtools.tools.distributed.report_speed_test. The decorator
# allows data to be saved and read by the multi-GPU speed test function
# which we will use to run this script.
@cdtools.tools.distributed.report_speed_test
def main():
    filename = 'example_data/lab_ptycho_data.cxi'
    dataset = cdtools.datasets.Ptycho2DDataset.from_cxi(filename)

    model = cdtools.models.FancyPtycho.from_dataset(
        dataset,
        n_modes=3,
        oversampling=2,
        probe_support_radius=120,
        propagation_distance=5e-3,
        units='mm',
        obj_view_crop=-50
    )

    device = 'cuda'
    model.to(device=device)
    dataset.get_as(device=device)

    # Remove or comment out any existing plotting statements
    for loss in model.Adam_optimize(50, dataset, lr=0.02, batch_size=40):
        # Optional: ensure that only a single GPU prints a report by
        # adding an if statement. Without this, the print statement will
        # be called by all participating GPUs, resulting in multiple copies
        # of the printed model report.
        if model.rank == 0:
            print(model.report())

    for loss in model.Adam_optimize(25, dataset, lr=0.005, batch_size=40):
        if model.rank == 0:
            print(model.report())

    for loss in model.Adam_optimize(25, dataset, lr=0.001, batch_size=40):
        if model.rank == 0:
            print(model.report())

    model.tidy_probes()

    # We need to return the model so the data can be saved by the decorator.
    return model


# We also need to include this if-name-main block at the end
if __name__ == '__main__':
    main()
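The `model.rank == 0` guard works because a torchrun-style launcher exports `RANK` and `WORLD_SIZE` into each worker's environment, and the model picks them up at construction. The gating logic can be sketched standalone (the `is_rank_zero` helper is hypothetical, not a cdtools function):

```python
import os


def is_rank_zero():
    """True when running single-process, or on rank 0 of a multi-GPU launch.

    Launchers in the torchrun family export RANK and WORLD_SIZE into each
    worker's environment. A plain `python script.py` run sets neither, so
    the default of 0 makes every side effect (reports, plots, saves) fire
    exactly as in single-GPU use.
    """
    return int(os.environ.get('RANK', 0)) == 0


if is_rank_zero():
    print('this process handles reports, plots, and saves')
```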
5 changes: 4 additions & 1 deletion pyproject.toml
@@ -1,3 +1,6 @@
[tool.ruff]
# Decrease the maximum line length to 79 characters.
line-length = 79
line-length = 79

[tool.pytest.ini_options]
testpaths = 'tests'
5 changes: 5 additions & 0 deletions setup.py
@@ -47,5 +47,10 @@
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent",
    ],
    entry_points={
        'console_scripts': {
            'cdt-torchrun = cdtools.tools.distributed.distributed:run_single_to_multi_gpu'
        }
    }
)
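The `entry_points` table registers `cdt-torchrun` as a console script using the standard `name = module:function` spec format. A small sketch of how such a spec string decomposes (the `parse_entry_point` helper is hypothetical, for illustration only):

```python
def parse_entry_point(spec):
    """Split a console_scripts spec of the form 'name = module:attr'."""
    # partition('=') separates the script name from its target;
    # partition(':') separates the module path from the attribute.
    name, _, target = (part.strip() for part in spec.partition('='))
    module, _, attr = target.partition(':')
    return name, module, attr


spec = ('cdt-torchrun = '
        'cdtools.tools.distributed.distributed:run_single_to_multi_gpu')
print(parse_entry_point(spec))
```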

13 changes: 13 additions & 0 deletions src/cdtools/datasets/base.py
Expand Up @@ -19,6 +19,7 @@
import pathlib
from cdtools.tools import data as cdtdata
from torch.utils import data as torchdata
import os

__all__ = ['CDataset']

Expand Down Expand Up @@ -92,6 +93,18 @@ def __init__(

        self.get_as(device='cpu')

        # These attributes indicate to the CDataset methods whether or not
        # multi-GPU calculations are being performed. These flags are mostly
        # used to prevent duplicate plots when CDataset.inspect is called.
        rank = os.environ.get('RANK')
        world_size = os.environ.get('WORLD_SIZE')
        # Rank of the subprocess running the GPU (default: rank 0)
        self.rank = int(rank) if rank is not None else 0
        # Total number of GPUs being used.
        self.world_size = int(world_size) if world_size is not None else 1
        self.multi_gpu_used = self.world_size > 1

    def to(self, *args, **kwargs):
        """Sends the relevant data to the given device and dtype
6 changes: 5 additions & 1 deletion src/cdtools/datasets/ptycho_2d_dataset.py
@@ -198,6 +198,8 @@ def to_cxi(self, cxi_file):
        cxi_file : str, pathlib.Path, or h5py.File
            The .cxi file to write to
        """
        # FOR MULTI-GPU: Only run this method if it's called by the rank 0 GPU
        if self.multi_gpu_used and self.rank != 0:
            return

        # If a bare string is passed
        if isinstance(cxi_file, (str, pathlib.Path)):
@@ -230,7 +232,9 @@ def inspect(
        can display a base-10 log plot of the detector readout at each
        position.
        """
        # FOR MULTI-GPU: Only run this method if it's called by the rank 0 GPU
        if self.multi_gpu_used and self.rank != 0:
            return

        def get_images(idx):
            inputs, output = self[idx]
73 changes: 56 additions & 17 deletions src/cdtools/models/base.py
@@ -29,15 +29,11 @@
"""

import torch as t
from torch.utils import data as torchdata
from matplotlib import pyplot as plt
from matplotlib.widgets import Slider
from matplotlib import ticker
import numpy as np
import threading
import queue
import time
from scipy import io
from contextlib import contextmanager
from cdtools.tools.data import nested_dict_to_h5, h5_to_nested_dict, nested_dict_to_numpy, nested_dict_to_torch
from cdtools.reconstructors import AdamReconstructor, LBFGSReconstructor, SGDReconstructor
@@ -65,6 +61,25 @@ def __init__(self):
        self.training_history = ''
        self.epoch = 0

        # These attributes indicate to the CDIModel methods whether or not
        # multi-GPU calculations are being performed. These flags help
        # trigger multi-GPU-specific function calls (e.g., all_reduce) and
        # prevent redundant plots/reports/saves during multi-GPU use.
        rank = os.environ.get('RANK')
        world_size = os.environ.get('WORLD_SIZE')

        # Rank of the subprocess running the GPU (default: rank 0)
        self.rank = int(rank) if rank is not None else 0
        # Total number of GPUs being used.
        self.world_size = int(world_size) if world_size is not None else 1
        self.multi_gpu_used = self.world_size > 1

        # Keep track of the time each loss history point was taken relative
        # to the initialization of this model.
        self.INITIAL_TIME = time.time()
        self.loss_times = []
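The `INITIAL_TIME`/`loss_times` bookkeeping added above records, for each logged loss, the wall-clock time elapsed since model construction. The pattern in isolation (a standalone sketch, not the library class):

```python
import time


class LossClock:
    """Record the elapsed wall-clock time at which each loss is logged."""

    def __init__(self):
        # Reference point, mirroring the INITIAL_TIME attribute above
        self.initial_time = time.time()
        self.loss_history = []
        self.loss_times = []

    def log(self, loss):
        self.loss_history.append(loss)
        self.loss_times.append(time.time() - self.initial_time)


clock = LossClock()
clock.log(0.5)
clock.log(0.25)
print(len(clock.loss_times))  # one timestamp per logged loss
```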


    def from_dataset(self, dataset):
        raise NotImplementedError()

@@ -197,7 +212,9 @@ def save_to_h5(self, filename, *args):
        *args
            Accepts any additional args that model.save_results needs, for this model
        """
        return nested_dict_to_h5(filename, self.save_results(*args))
        # FOR MULTI-GPU: Only run this method if it's called by the rank 0 GPU
        if not (self.multi_gpu_used and self.rank != 0):
            return nested_dict_to_h5(filename, self.save_results(*args))


    @contextmanager
@@ -219,12 +236,17 @@ def save_on_exit(self, filename, *args, exception_filename=None):
        """
        try:
            yield
            self.save_to_h5(filename, *args)
        except:
            if exception_filename is None:
                exception_filename = filename
            self.save_to_h5(exception_filename, *args)
            raise

            # Only let the Rank 0 GPU handle saving in multi-GPU
            if not (self.multi_gpu_used and self.rank != 0):
                self.save_to_h5(filename, *args)

        except Exception as e:
            if not (self.multi_gpu_used and self.rank != 0):
                if exception_filename is None:
                    exception_filename = filename
                self.save_to_h5(exception_filename, *args)
            raise e
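The `save_on_exit` context manager saves on both clean exit and on an exception; its control flow can be sketched independently of the model class (the `save` callable and file names are hypothetical stand-ins for `save_to_h5`):

```python
from contextlib import contextmanager


@contextmanager
def save_on_exit(save, filename, exception_filename=None):
    """Run the body; save on clean exit, or save to the fallback
    file and re-raise if the body throws."""
    try:
        yield
        save(filename)
    except Exception:
        save(exception_filename if exception_filename is not None
             else filename)
        raise


saved = []
with save_on_exit(saved.append, 'results.h5', exception_filename='crash.h5'):
    pass
print(saved)  # ['results.h5']
```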

    @contextmanager
    def save_on_exception(self, filename, *args):
@@ -242,13 +264,15 @@ def save_on_exception(self, filename, *args):
        *args
            Accepts any additional args that model.save_results needs, for this model
        """
        # FOR MULTI-GPU: Only run this method if it's called by the rank 0 GPU
        try:
            yield
        except:
            self.save_to_h5(filename, *args)
            print('Intermediate results saved under name:')
            print(filename, flush=True)
            raise
        except Exception as e:
            if not (self.multi_gpu_used and self.rank != 0):
                self.save_to_h5(filename, *args)
                print('Intermediate results saved under name:')
                print(filename, flush=True)
            raise e


    def use_checkpoints(self, job_id, checkpoint_file_stem):
@@ -270,6 +294,10 @@ def skip_computation(self):
        return False

    def save_checkpoint(self, *args, checkpoint_file=None):
        # FOR MULTI-GPU: Only run this method if it's called by the rank 0 GPU
        if self.multi_gpu_used and self.rank != 0:
            return

        checkpoint = self.save_results(*args)
        if (hasattr(self, 'current_optimizer')
                and self.current_optimizer is not None):
@@ -578,6 +606,10 @@ def inspect(self, dataset=None, update=True):
            Whether to update existing plots or plot new ones

        """
        # FOR MULTI-GPU: Only run this method if it's called by the rank 0 GPU
        if self.multi_gpu_used and self.rank != 0:
            return

        # We find or create all the figures
        first_update = False
        if update and hasattr(self, 'figs') and self.figs:
@@ -660,7 +692,10 @@ def save_figures(self, prefix='', extension='.pdf'):
        extension : str
            Default is '.pdf', the file extension to save with.
        """
        # FOR MULTI-GPU: Only run this method if it's called by the rank 0 GPU
        if self.multi_gpu_used and self.rank != 0:
            return

        if hasattr(self, 'figs') and self.figs:
            figs = self.figs
        else:
@@ -688,6 +723,10 @@ def compare(self, dataset, logarithmic=False):
            Whether to plot the diffraction on a logarithmic scale
        """
        # FOR MULTI-GPU: Only run this method if it's called by the rank 0 GPU
        if self.multi_gpu_used and self.rank != 0:
            return

        fig, axes = plt.subplots(1, 3, figsize=(12, 5.3))
        fig.tight_layout(rect=[0.02, 0.09, 0.98, 0.96])
        axslider = plt.axes([0.15, 0.06, 0.75, 0.03])