[TeleViz] Add televiz basic shaders + DeviceImage CUDA-Vulkan interop#451
Conversation
- viz/shaders/: glslang-compiled SPIR-V embedded as constexpr arrays;
ships textured_quad.{vert,frag} for the upcoming QuadLayer.
- viz/core/device_image: VkImage backed by external memory, imported
into CUDA as cudaArray_t. Symmetric pair to HostImage.
- viz/core/vk_context: pin CUDA device to Vulkan physical device by
UUID at init() so interop types can assume same-GPU operation.
- CI: add CUDA Toolkit + glslang-tools to build-ubuntu and sanitizer.
- Tests: viz_shaders_tests, viz_core_tests DeviceImage round-trip;
all unit tests pass under ASAN+UBSAN.
Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis pull request introduces CUDA-Vulkan GPU memory interop capabilities to the visualization module. The changes add CUDA Toolkit as a build requirement, implement a new Sequence Diagram(s)sequenceDiagram
participant App as Application
participant VkCtx as VkContext::init()
participant VkDev as Vulkan Physical Device
participant CUDA as CUDA Driver
participant DevImg as DeviceImage::create()
participant VkImg as Vulkan Image
participant VkMem as Vulkan Device Memory
participant CudaArr as CUDA Array
App->>VkCtx: initialize graphics context
VkCtx->>VkDev: vkGetPhysicalDeviceProperties2 (UUID)
VkDev-->>VkCtx: physical device UUID
VkCtx->>CUDA: enumerate CUDA devices
CUDA-->>VkCtx: device UUIDs
VkCtx->>CUDA: match UUID & cudaSetDevice()
CUDA-->>VkCtx: active device set
App->>DevImg: create(vk_context, resolution, format)
DevImg->>VkImg: vkCreateImage (exportable memory)
VkImg-->>DevImg: VkImage handle
DevImg->>VkMem: vkAllocateMemory (exportable device-local)
VkMem-->>DevImg: VkDeviceMemory handle
DevImg->>VkMem: vkGetMemoryFdProperties (external FD)
VkMem-->>DevImg: memory file descriptor
DevImg->>CUDA: cudaImportExternalMemory (FD)
CUDA-->>DevImg: external memory handle
DevImg->>CudaArr: cudaExternalMemoryGetMappedMipmappedArray
CudaArr-->>DevImg: CUDA mipmapped array
DevImg->>VkImg: vkCreateImageView
VkImg-->>DevImg: VkImageView handle
DevImg-->>App: DeviceImage ready (CUDA & Vulkan handles)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/viz/core/cpp/device_image.cpp`:
- Around line 336-376: The command buffer allocated via vkAllocateCommandBuffers
in the transition path can leak if any subsequent check_vk or Vulkan call
throws; wrap the sequence between allocation and vkFreeCommandBuffers in a
try/catch (or use RAII) so that vkFreeCommandBuffers(device, command_pool_, 1,
&cmd) is always called: after allocating cmd, run the vkBeginCommandBuffer,
vkCmdPipelineBarrier, vkEndCommandBuffer, vkQueueSubmit and vkQueueWaitIdle
inside a try block and in the catch (or a finally-equivalent) call
vkFreeCommandBuffers and rethrow the exception; reference the functions/checks
vkAllocateCommandBuffers, check_vk, vkBeginCommandBuffer, vkEndCommandBuffer,
vkQueueSubmit, vkQueueWaitIdle and vkFreeCommandBuffers to locate and protect
the allocated resource.
- Around line 117-172: DeviceImage::destroy currently tears down CUDA and Vulkan
resources without waiting for GPU work, risking use-after-free; fix by inserting
explicit synchronization before any resource frees: if any CUDA resources
(cuda_mipmapped_array_ or cuda_external_memory_) may be active call
cudaDeviceSynchronize() (best-effort, ignore non-fatal return) before
cudaFreeMipmappedArray/cudaDestroyExternalMemory, and after verifying ctx_ and
obtaining device call vkDeviceWaitIdle(device) (or the appropriate queue-fence
wait via ctx_) before destroying command_pool_, image_view_, image_, and freeing
memory_; keep the existing early-null checks but perform the CUDA sync prior to
CUDA frees and the Vulkan device wait prior to Vulkan teardown (use symbols
cuda_mipmapped_array_, cuda_external_memory_, cudaDeviceSynchronize,
ctx_->device(), and vkDeviceWaitIdle).
In `@src/viz/core/cpp/vk_context.cpp`:
- Line 194: The call to cudaSetDevice() inside match_cuda_device_to_vulkan()
during init() only sets the device for the init thread; subsequent CUDA calls
from other threads (e.g., DeviceImage::import_to_cuda(),
cudaImportExternalMemory()) will run on the wrong device. Fix by either
documenting that VkContext and all viz_core types are single-threaded and must
be used only from the init thread, or (preferred) add a defensive thread-local
device guard: store the matched CUDA device id in VkContext during
match_cuda_device_to_vulkan(), then ensure every CUDA entry point in viz_core
(for example DeviceImage::import_to_cuda(), any wrappers that call
cudaImportExternalMemory(), and other CUDA-facing methods) calls
cudaSetDevice(vk_context->matched_cuda_device_id) at start (or uses a
thread-local RAII guard that calls cudaSetDevice on construction) to guarantee
correct device affinity across threads.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c1e02f0b-aa6e-4202-bf75-44e0af115541
📒 Files selected for processing (19)
.github/workflows/build-ubuntu.ymldeps/README.mdsrc/viz/AGENTS.mdsrc/viz/CMakeLists.txtsrc/viz/core/cpp/CMakeLists.txtsrc/viz/core/cpp/device_image.cppsrc/viz/core/cpp/inc/viz/core/device_image.hppsrc/viz/core/cpp/inc/viz/core/vk_context.hppsrc/viz/core/cpp/vk_context.cppsrc/viz/core_tests/cpp/CMakeLists.txtsrc/viz/core_tests/cpp/test_device_image.cppsrc/viz/shaders/CMakeLists.txtsrc/viz/shaders/cpp/CMakeLists.txtsrc/viz/shaders/cpp/compile_shader.cmakesrc/viz/shaders/cpp/textured_quad.fragsrc/viz/shaders/cpp/textured_quad.vertsrc/viz/shaders_tests/CMakeLists.txtsrc/viz/shaders_tests/cpp/CMakeLists.txtsrc/viz/shaders_tests/cpp/test_shader_blobs.cpp
…a device - run_one_shot_layout_transition: wrap submit/wait in an RAII guard so the command buffer is freed on every exit path (otherwise a queue submit failure leaks one cmd per retry). - DeviceImage::destroy: cudaDeviceSynchronize before CUDA frees and vkDeviceWaitIdle before Vulkan teardown, so async work submitted by the caller has retired before the resources go away. - VkContext stores the matched CUDA device id and exposes it via cuda_device_id(); DeviceImage::import_to_cuda + ::destroy now call cudaSetDevice on the current thread before any CUDA API. cudaSetDevice is per-host-thread, so this protects users who create a DeviceImage on a worker thread. All 37 unit + 28 GPU tests pass; unit tests also pass under ASAN+UBSAN. Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Jimver/cuda-toolkit@v0.2.19 hardcodes /x86_64/ in the NVIDIA repo URL, which fails on the ubuntu-22.04-arm matrix entry (cuda-nvcc-12-4 / cuda-cudart-12-4 packages don't exist for that arch on that path). Replace with a small composite action (.github/actions/setup-cuda) that picks /x86_64/ or /sbsa/ based on dpkg --print-architecture and installs cuda-nvcc-* + cuda-cudart-dev-* via apt. Used in both build-ubuntu and test-viz-sanitizers. Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
viz_core links CUDAToolkit::cudart, but the experimental Windows CI runner doesn't have CUDA installed. With BUILD_VIZ=ON we hit "Could not find nvcc" at find_package(CUDAToolkit) time. No Windows-XR consumer for viz today, so flip it OFF for now and add CUDA install to build-windows.yml when we have a real reason to ship viz on Windows. Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Jimver/cuda-toolkit works correctly on Windows — the /x86_64/ hardcoding bug only bites on Linux ARM. Runs NVIDIA's silent network installer and sets CUDA_PATH so find_package(CUDAToolkit) succeeds. Re-enables BUILD_VIZ=ON for the experimental Windows job. Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
GPU runners have the NVIDIA driver (libcuda.so.1) but not the CUDA Toolkit (libcudart.so.12). After M3a's CUDA dependency landed, viz test binaries fail to load on the GPU runners with "cannot open shared object file: libcudart.so.12". Bundle libcudart.so.12 from the build host's CUDA install into the viz-tests-* artifact, then point LD_LIBRARY_PATH at the artifact dir when running tests. Same pattern as auditwheel for the Python wheel. Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Reverses the bundle-libcudart-in-artifact hack from 2203957. Run the same setup-cuda composite action on the test-viz-gpu job: NVIDIA's apt postinst registers /etc/ld.so.conf.d/cuda-12-4.conf so libcudart lands on the standard ld.so search path with no LD_LIBRARY_PATH or artifact gymnastics needed. Symmetric with build-ubuntu, and we'd need cudart on the runner anyway for richer GPU tests in M3b+. Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
device_image.cpp included <unistd.h> for ::close on the fd returned by vkGetMemoryFdKHR. MSVC has no <unistd.h>; build-windows fails with "Cannot open include file: 'unistd.h'". Wrap close in a tiny shim: <io.h>+_close on _WIN32, <unistd.h>+close elsewhere. The whole fd path is unreachable at runtime on Windows (vkGetMemoryFdKHR returns nullptr on that platform → import_to_cuda throws before memory_fd_ is set), but we still need a clean compile. Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Switch CUDA::cudart -> CUDA::cudart_static. Makes the wheel and viz_*_tests artifacts self-contained: - Wheel: _viz.so has no dynamic libcudart.so.12 dep so auditwheel bundles nothing CUDA-related (release artifact stays clean). - Test artifacts: run on GPU runners that have only the NVIDIA driver (libcuda.so.1). The self-hosted runner's sudo policy disallows apt installs from a job step, so we can't install the toolkit there. - Drops the back-and-forth between bundling libcudart in the artifact vs. installing CUDA on the GPU runner — neither is needed now. Build host still needs the CUDA Toolkit for libcudart_static.a; setup-cuda already covers that on build-ubuntu / test-viz-sanitizers. Tradeoff: ~3 MB binary growth per consumer. Safe today because viz_core is the only CUDA-using component in the codebase. Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Now that viz_core links libcudart_static (4eb4db1), the GPU runner no longer needs the CUDA Toolkit. The previous setup-cuda step was failing on the self-hosted runner anyway (sudo policy), and is now unnecessary. Test binaries depend only on the NVIDIA driver (libcuda.so.1) which is already present on the runner. Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Summary by CodeRabbit
Release Notes
New Features
Tests
Chores