Synchronize staging copy before handing buffer to MPI (CUDA) by mjwilkins18 · Pull Request #6 · cornelisnetworks/cail

mjwilkins18 · 2026-06-16T02:32:36Z

cail_gpu_memcpy stages the send buffer with a device-to-device cudaMemcpy
and then hands the staged buffer to GPU-aware MPI, which reads it from a
separate NIC / GDRCopy engine. A device-to-device cudaMemcpy has no host-side
completion guarantee on return, so the external read can race the copy.

Add a cudaStreamSynchronize(0) release fence after the staging copy so the
staged data is visible before the buffer is exposed to PMPI. The fence is scoped
to the default stream (where the non-Async cudaMemcpy is enqueued) rather than
the whole device, so unrelated device work is not serialized. Mirrors the
equivalent fence on the ROCm path.
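
For reviewers without the diff open, the pattern described above can be sketched roughly as follows. This is an illustrative sketch, not the actual cail_gpu_memcpy: the helper name stage_send_buffer, its signature, and the SKETCH_USE_CUDA macro are hypothetical. With SKETCH_USE_CUDA undefined, host-only stand-ins replace the CUDA runtime calls so the control flow can be compiled and exercised without a GPU; those stand-ins say nothing about real device ordering.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#ifdef SKETCH_USE_CUDA
#include <cuda_runtime.h>
#else
/* Host-only stand-ins (assumption for illustration): just enough of the
 * CUDA runtime surface to build this sketch without the toolkit. */
typedef int cudaError_t;
typedef void *cudaStream_t;
enum { cudaSuccess = 0 };
enum cudaMemcpyKind { cudaMemcpyDeviceToDevice = 3 };

static cudaError_t cudaMemcpy(void *dst, const void *src, size_t n,
                              enum cudaMemcpyKind kind)
{
    (void)kind;
    memcpy(dst, src, n);
    return cudaSuccess;
}

static cudaError_t cudaStreamSynchronize(cudaStream_t stream)
{
    (void)stream;
    return cudaSuccess;
}

static const char *cudaGetErrorString(cudaError_t err)
{
    return err == cudaSuccess ? "no error" : "stub error";
}
#endif

/* Hypothetical helper: copy the send buffer into a staging buffer and block
 * the host until the copy has completed on the default stream.
 * Returns 0 on success, -1 on failure. */
int stage_send_buffer(void *staging, const void *src, size_t nbytes)
{
    /* Device-to-device copy: may return before the copy has finished. */
    cudaError_t err = cudaMemcpy(staging, src, nbytes,
                                 cudaMemcpyDeviceToDevice);
    if (err != cudaSuccess) {
        fprintf(stderr, "staging copy failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    /* Fence: wait for the default stream only, not the whole device. */
    err = cudaStreamSynchronize(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "staging fence failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    /* Only now is the staged buffer safe to hand to the MPI call. */
    return 0;
}
```

A caller would invoke stage_send_buffer(staging, sendbuf, nbytes) and pass staging to the PMPI routine only when it returns 0. Using cudaDeviceSynchronize() instead would also close the race, but it waits on every stream on the device.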

Draft: not yet validated on CUDA hardware.


mjwilkins18 marked this pull request as ready for review June 18, 2026 13:40

mjwilkins18 force-pushed the mjwilkins18/cuda-staging-fence branch from 364ffba to 4bb6962 Compare June 18, 2026 13:46

mjwilkins18 wants to merge 1 commit into main from mjwilkins18/cuda-staging-fence
