Summary
Add support for distributing benchmark workloads across multiple GPUs, splitting batches and aggregating results to demonstrate multi-GPU scaling.
Motivation
Many workstations and servers have multiple GPUs installed. The current implementation targets a single accelerator at a time. Multi-GPU fan-out would demonstrate near-linear scaling for embarrassingly parallel workloads and provide a realistic picture of what production GPU-accelerated systems look like. This is especially relevant for domains like financial simulation and AI inference where multi-GPU setups are common.
Technical Notes
- ILGPU's `Context` can enumerate multiple devices; use this for discovery
- Each GPU will need its own `Accelerator` instance and memory buffers
- Synchronisation between GPUs happens on the host side (no direct GPU-to-GPU needed for this use case)
- Consider using `Task.WhenAll` or similar for parallel dispatch across GPUs
- Batch splitting should account for uneven division (last GPU gets remainder)
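
The splitting and dispatch notes above could be sketched roughly as follows. This is a minimal illustration rather than the project's implementation: `MultiGpuSketch`, `SplitBatch`, `RunAcrossDevicesAsync`, and the `runOnDevice` delegate are hypothetical names, and the delegate stands in for whatever per-GPU kernel launch the benchmark would perform on its own `Accelerator`. No ILGPU calls are made, so the sketch runs on any machine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class MultiGpuSketch
{
    // Splits [0, total) into deviceCount contiguous ranges of total / deviceCount
    // items each; the last range also absorbs the remainder.
    public static List<(int Offset, int Count)> SplitBatch(int total, int deviceCount)
    {
        if (total < 0) throw new ArgumentOutOfRangeException(nameof(total));
        if (deviceCount <= 0) throw new ArgumentOutOfRangeException(nameof(deviceCount));

        int baseSize = total / deviceCount;
        var ranges = new List<(int Offset, int Count)>(deviceCount);
        for (int i = 0; i < deviceCount; i++)
        {
            int offset = i * baseSize;
            int count = (i == deviceCount - 1) ? total - offset : baseSize;
            ranges.Add((offset, count));
        }
        return ranges;
    }

    // Runs one task per device and sums the partial results on the host.
    // runOnDevice(deviceIndex, offset, count) is a hypothetical stand-in for
    // launching a kernel on that device and reading back its partial result.
    public static async Task<double> RunAcrossDevicesAsync(
        int total, int deviceCount, Func<int, int, int, double> runOnDevice)
    {
        var ranges = SplitBatch(total, deviceCount);
        var tasks = ranges.Select((r, device) =>
            Task.Run(() => runOnDevice(device, r.Offset, r.Count)));

        // Host-side synchronisation point: wait for every device to finish.
        double[] partials = await Task.WhenAll(tasks);
        return partials.Sum();
    }

    public static void Main()
    {
        // 10 items over 3 devices: ranges (0,3), (3,3), (6,4).
        foreach (var (offset, count) in SplitBatch(10, 3))
            Console.WriteLine($"offset={offset}, count={count}");

        // Stand-in workload: each "device" sums the indices in its range.
        double total = RunAcrossDevicesAsync(10, 3, (device, offset, count) =>
        {
            double sum = 0;
            for (int i = offset; i < offset + count; i++) sum += i;
            return sum;
        }).GetAwaiter().GetResult();

        Console.WriteLine(total); // 0 + 1 + ... + 9 = 45
    }
}
```

Giving the whole remainder to the last device keeps the arithmetic simple, at the cost of that device handling up to `deviceCount - 1` extra items. For large batches the imbalance is negligible.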
Acceptance Criteria
- `--multi-gpu` CLI flag that distributes work across all available GPUs of the same type