[TeleViz] Add televiz basic shaders + DeviceImage CUDA-Vulkan interop #451

Merged
farbod-nv merged 10 commits into main from fm/televiz_m3a
May 4, 2026

Conversation

@farbod-nv farbod-nv commented May 1, 2026

  • viz/shaders/: glslang-compiled SPIR-V embedded as constexpr arrays; ships textured_quad.{vert,frag} for the upcoming QuadLayer.
  • viz/core/device_image: VkImage backed by external memory, imported into CUDA as cudaArray_t. Symmetric pair to HostImage.
  • viz/core/vk_context: pin CUDA device to Vulkan physical device by UUID at init() so interop types can assume same-GPU operation.
  • CI: add CUDA Toolkit + glslang-tools to build-ubuntu and sanitizer.
  • Tests: viz_shaders_tests, viz_core_tests DeviceImage round-trip; all unit tests pass under ASAN+UBSAN.
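
As a rough illustration of the "SPIR-V embedded as constexpr arrays" idea and of what a shader blob validation test might check, here is a minimal sketch. The names `kExampleSpirv` and `looks_like_spirv` are hypothetical and do not come from this PR; the array holds only a placeholder five-word header, whereas a generated blob would contain the full module emitted by glslang.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a generated header. Only the five-word SPIR-V
// module header is present; the words after the magic are placeholders.
constexpr std::array<std::uint32_t, 5> kExampleSpirv = {
    0x07230203u, // SPIR-V magic number
    0x00010000u, // version 1.0
    0x00000000u, // generator id (placeholder)
    0x00000001u, // id bound (placeholder)
    0x00000000u, // reserved schema word
};

// Cheap structural check: non-null pointer, at least a full header,
// and the expected magic number in the first word.
inline bool looks_like_spirv(const std::uint32_t* words, std::size_t count)
{
    constexpr std::uint32_t kSpirvMagic = 0x07230203u;
    constexpr std::size_t kHeaderWords = 5;
    return words != nullptr && count >= kHeaderWords && words[0] == kSpirvMagic;
}
```

A check like this only catches gross packaging mistakes (empty array, wrong byte order, truncated file); it does not replace running the module through a real validator.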

Summary by CodeRabbit

Release Notes

  • New Features

    • Added CUDA support for GPU image operations and memory interoperability with Vulkan
    • Introduced GLSL shader compilation system with automated SPIR-V generation
  • Tests

    • Added GPU-based image handling tests
    • Added shader blob validation tests
  • Chores

    • Updated CI to install CUDA Toolkit and shader compilation tools
    • Added build-time dependencies for shader compilation

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot commented May 1, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 768f483b-e1fe-42c9-91df-deb7b2115fcc

Walkthrough

This pull request introduces CUDA-Vulkan GPU memory interop capabilities to the visualization module. The changes add CUDA Toolkit as a build requirement, implement a new DeviceImage class that creates Vulkan images with CUDA-accessible backing memory via external memory handles, introduce shader compilation infrastructure using glslangValidator to embed SPIR-V bytecode, and extend VkContext to match the active CUDA device UUID with the selected Vulkan physical device. Supporting changes include CI workflow updates, comprehensive documentation, and new test suites validating both the interop implementation and shader compilation.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant VkCtx as VkContext::init()
    participant VkDev as Vulkan Physical Device
    participant CUDA as CUDA Driver
    participant DevImg as DeviceImage::create()
    participant VkImg as Vulkan Image
    participant VkMem as Vulkan Device Memory
    participant CudaArr as CUDA Array

    App->>VkCtx: initialize graphics context
    VkCtx->>VkDev: vkGetPhysicalDeviceProperties2 (UUID)
    VkDev-->>VkCtx: physical device UUID
    VkCtx->>CUDA: enumerate CUDA devices
    CUDA-->>VkCtx: device UUIDs
    VkCtx->>CUDA: match UUID & cudaSetDevice()
    CUDA-->>VkCtx: active device set

    App->>DevImg: create(vk_context, resolution, format)
    DevImg->>VkImg: vkCreateImage (exportable memory)
    VkImg-->>DevImg: VkImage handle
    DevImg->>VkMem: vkAllocateMemory (exportable device-local)
    VkMem-->>DevImg: VkDeviceMemory handle
    DevImg->>VkMem: vkGetMemoryFdKHR (external FD)
    VkMem-->>DevImg: memory file descriptor
    DevImg->>CUDA: cudaImportExternalMemory (FD)
    CUDA-->>DevImg: external memory handle
    DevImg->>CudaArr: cudaExternalMemoryGetMappedMipmappedArray
    CudaArr-->>DevImg: CUDA mipmapped array
    DevImg->>VkImg: vkCreateImageView
    VkImg-->>DevImg: VkImageView handle
    DevImg-->>App: DeviceImage ready (CUDA & Vulkan handles)
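
The UUID matching step in the first half of the diagram can be sketched without either SDK. This is an assumption-laden illustration, not the PR's code: `DeviceUuid` and `find_matching_cuda_device` are hypothetical names, and the UUIDs are passed in as plain byte arrays. In a real context they would presumably come from `VkPhysicalDeviceIDProperties::deviceUUID` on the Vulkan side and `cudaDeviceProp::uuid` on the CUDA side, both of which are 16 bytes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// 16 bytes, the size of both VK_UUID_SIZE and cudaUUID_t::bytes.
using DeviceUuid = std::array<std::uint8_t, 16>;

// Returns the index of the CUDA device whose UUID equals the Vulkan one,
// or std::nullopt when no CUDA device matches.
inline std::optional<int> find_matching_cuda_device(const DeviceUuid& vulkan_uuid,
                                                    const std::vector<DeviceUuid>& cuda_uuids)
{
    for (std::size_t i = 0; i < cuda_uuids.size(); ++i)
    {
        if (cuda_uuids[i] == vulkan_uuid)
        {
            return static_cast<int>(i);
        }
    }
    return std::nullopt;
}
```

The returned index is what a caller would pass to `cudaSetDevice()`; the empty case is where an initializer would most likely fail loudly, since interop across two different GPUs is not what this design assumes.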

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 15.15% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title accurately summarizes the two primary additions: basic shaders (via GLSLang compilation and embedding) and DeviceImage CUDA-Vulkan interop. It directly reflects the main changes across all file modifications. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/viz/core/cpp/device_image.cpp`:
- Around line 336-376: The command buffer allocated via vkAllocateCommandBuffers
in the transition path can leak if any subsequent check_vk or Vulkan call
throws; wrap the sequence between allocation and vkFreeCommandBuffers in a
try/catch (or use RAII) so that vkFreeCommandBuffers(device, command_pool_, 1,
&cmd) is always called: after allocating cmd, run the vkBeginCommandBuffer,
vkCmdPipelineBarrier, vkEndCommandBuffer, vkQueueSubmit and vkQueueWaitIdle
inside a try block and in the catch (or a finally-equivalent) call
vkFreeCommandBuffers and rethrow the exception; reference the functions/checks
vkAllocateCommandBuffers, check_vk, vkBeginCommandBuffer, vkEndCommandBuffer,
vkQueueSubmit, vkQueueWaitIdle and vkFreeCommandBuffers to locate and protect
the allocated resource.
- Around line 117-172: DeviceImage::destroy currently tears down CUDA and Vulkan
resources without waiting for GPU work, risking use-after-free; fix by inserting
explicit synchronization before any resource frees: if any CUDA resources
(cuda_mipmapped_array_ or cuda_external_memory_) may be active call
cudaDeviceSynchronize() (best-effort, ignore non-fatal return) before
cudaFreeMipmappedArray/cudaDestroyExternalMemory, and after verifying ctx_ and
obtaining device call vkDeviceWaitIdle(device) (or the appropriate queue-fence
wait via ctx_) before destroying command_pool_, image_view_, image_, and freeing
memory_; keep the existing early-null checks but perform the CUDA sync prior to
CUDA frees and the Vulkan device wait prior to Vulkan teardown (use symbols
cuda_mipmapped_array_, cuda_external_memory_, cudaDeviceSynchronize,
ctx_->device(), and vkDeviceWaitIdle).

In `@src/viz/core/cpp/vk_context.cpp`:
- Line 194: The call to cudaSetDevice() inside match_cuda_device_to_vulkan()
during init() only sets the device for the init thread; subsequent CUDA calls
from other threads (e.g., DeviceImage::import_to_cuda(),
cudaImportExternalMemory()) will run on the wrong device. Fix by either
documenting that VkContext and all viz_core types are single-threaded and must
be used only from the init thread, or (preferred) add a defensive thread-local
device guard: store the matched CUDA device id in VkContext during
match_cuda_device_to_vulkan(), then ensure every CUDA entry point in viz_core
(for example DeviceImage::import_to_cuda(), any wrappers that call
cudaImportExternalMemory(), and other CUDA-facing methods) calls
cudaSetDevice(vk_context->matched_cuda_device_id) at start (or uses a
thread-local RAII guard that calls cudaSetDevice on construction) to guarantee
correct device affinity across threads.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c1e02f0b-aa6e-4202-bf75-44e0af115541

📥 Commits

Reviewing files that changed from the base of the PR and between a939dd1 and c991a64.

📒 Files selected for processing (19)
  • .github/workflows/build-ubuntu.yml
  • deps/README.md
  • src/viz/AGENTS.md
  • src/viz/CMakeLists.txt
  • src/viz/core/cpp/CMakeLists.txt
  • src/viz/core/cpp/device_image.cpp
  • src/viz/core/cpp/inc/viz/core/device_image.hpp
  • src/viz/core/cpp/inc/viz/core/vk_context.hpp
  • src/viz/core/cpp/vk_context.cpp
  • src/viz/core_tests/cpp/CMakeLists.txt
  • src/viz/core_tests/cpp/test_device_image.cpp
  • src/viz/shaders/CMakeLists.txt
  • src/viz/shaders/cpp/CMakeLists.txt
  • src/viz/shaders/cpp/compile_shader.cmake
  • src/viz/shaders/cpp/textured_quad.frag
  • src/viz/shaders/cpp/textured_quad.vert
  • src/viz/shaders_tests/CMakeLists.txt
  • src/viz/shaders_tests/cpp/CMakeLists.txt
  • src/viz/shaders_tests/cpp/test_shader_blobs.cpp

farbod-nv and others added 2 commits May 1, 2026 11:32
…a device

- run_one_shot_layout_transition: wrap submit/wait in an RAII guard so
  the command buffer is freed on every exit path (otherwise a queue
  submit failure leaks one cmd per retry).
- DeviceImage::destroy: cudaDeviceSynchronize before CUDA frees and
  vkDeviceWaitIdle before Vulkan teardown, so async work submitted by
  the caller has retired before the resources go away.
- VkContext stores the matched CUDA device id and exposes it via
  cuda_device_id(); DeviceImage::import_to_cuda + ::destroy now call
  cudaSetDevice on the current thread before any CUDA API. cudaSetDevice
  is per-host-thread, so this protects users who create a DeviceImage
  on a worker thread.

All 37 unit + 28 GPU tests pass; unit tests also pass under ASAN+UBSAN.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
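
Because the current CUDA device is tracked per host thread, one way to picture the device pinning described in this commit is a small guard around each CUDA-facing entry point. The sketch below is an illustration under stated assumptions, not the PR's implementation: `fake_cuda::set_device` and `fake_cuda::get_device` are stand-ins for `cudaSetDevice` and `cudaGetDevice`, `ScopedDevice` and `import_on_matched_device` are hypothetical names, and restoring the previous device on exit is a design choice of this sketch (the commit only says `cudaSetDevice` is called before any CUDA API).

```cpp
// Stand-ins for cudaSetDevice / cudaGetDevice. thread_local models the fact
// that the CUDA runtime keeps the current device per host thread.
namespace fake_cuda
{
thread_local int current_device = 0;
inline void set_device(int id) { current_device = id; }
inline int get_device() { return current_device; }
} // namespace fake_cuda

// Selects a device for the lifetime of the guard, then restores the old one.
class ScopedDevice
{
public:
    explicit ScopedDevice(int id) : previous_(fake_cuda::get_device())
    {
        fake_cuda::set_device(id);
    }
    ~ScopedDevice() { fake_cuda::set_device(previous_); }
    ScopedDevice(const ScopedDevice&) = delete;
    ScopedDevice& operator=(const ScopedDevice&) = delete;

private:
    int previous_;
};

// Hypothetical CUDA-facing entry point: pin the matched device first, then
// do the work. Returns the device that was active while the work ran.
inline int import_on_matched_device(int matched_device_id)
{
    ScopedDevice guard(matched_device_id);
    return fake_cuda::get_device(); // real code would call the CUDA API here
}
```

A worker thread that never called `set_device` starts on device 0 in this model, which is the situation the commit is guarding against.
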
Jimver/cuda-toolkit@v0.2.19 hardcodes /x86_64/ in the NVIDIA repo URL,
which fails on the ubuntu-22.04-arm matrix entry (cuda-nvcc-12-4 /
cuda-cudart-12-4 packages don't exist for that arch on that path).

Replace with a small composite action (.github/actions/setup-cuda)
that picks /x86_64/ or /sbsa/ based on dpkg --print-architecture and
installs cuda-nvcc-* + cuda-cudart-dev-* via apt. Used in both
build-ubuntu and test-viz-sanitizers.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@farbod-nv farbod-nv changed the title Add televiz basic shaders + DeviceImage CUDA-Vulkan interop [TeleViz] Add televiz basic shaders + DeviceImage CUDA-Vulkan interop May 1, 2026
farbod-nv and others added 7 commits May 1, 2026 13:31
viz_core links CUDAToolkit::cudart, but the experimental Windows CI
runner doesn't have CUDA installed. With BUILD_VIZ=ON we hit
"Could not find nvcc" at find_package(CUDAToolkit) time. No Windows-XR
consumer for viz today, so flip it OFF for now and add CUDA install
to build-windows.yml when we have a real reason to ship viz on Windows.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Jimver/cuda-toolkit works correctly on Windows — the /x86_64/
hardcoding bug only bites on Linux ARM. Runs NVIDIA's silent
network installer and sets CUDA_PATH so find_package(CUDAToolkit)
succeeds. Re-enables BUILD_VIZ=ON for the experimental Windows job.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
GPU runners have the NVIDIA driver (libcuda.so.1) but not the CUDA
Toolkit (libcudart.so.12). After M3a's CUDA dependency landed, viz
test binaries fail to load on the GPU runners with "cannot open shared
object file: libcudart.so.12".

Bundle libcudart.so.12 from the build host's CUDA install into the
viz-tests-* artifact, then point LD_LIBRARY_PATH at the artifact dir
when running tests. Same pattern as auditwheel for the Python wheel.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Reverses the bundle-libcudart-in-artifact hack from 2203957. Run the
same setup-cuda composite action on the test-viz-gpu job: NVIDIA's
apt postinst registers /etc/ld.so.conf.d/cuda-12-4.conf so libcudart
lands on the standard ld.so search path with no LD_LIBRARY_PATH or
artifact gymnastics needed. Symmetric with build-ubuntu, and we'd
need cudart on the runner anyway for richer GPU tests in M3b+.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
device_image.cpp included <unistd.h> for ::close on the fd returned
by vkGetMemoryFdKHR. MSVC has no <unistd.h>; build-windows fails with
"Cannot open include file: 'unistd.h'".

Wrap close in a tiny shim: <io.h>+_close on _WIN32, <unistd.h>+close
elsewhere. The whole fd path is unreachable at runtime on Windows
(the vkGetMemoryFdKHR entry point resolves to nullptr on that platform → import_to_cuda
throws before memory_fd_ is set), but we still need a clean compile.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
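
A shim of the kind this commit describes might look like the following. The function name `close_fd` is hypothetical; the commit only states that `<io.h>` + `_close` is used on `_WIN32` and `<unistd.h>` + `close` elsewhere.

```cpp
#ifdef _WIN32
#    include <io.h>
#else
#    include <unistd.h>
#endif

// Portable wrapper around the platform's file-descriptor close call.
// Returns 0 on success and -1 on failure, as both close and _close do.
inline int close_fd(int fd)
{
#ifdef _WIN32
    return _close(fd);
#else
    return ::close(fd);
#endif
}
```

Keeping the `#ifdef` inside one small function means the rest of the translation unit can call `close_fd` unconditionally, even on a platform where that path is never reached at runtime.
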
Switch CUDA::cudart -> CUDA::cudart_static. Makes the wheel and
viz_*_tests artifacts self-contained:
- Wheel: _viz.so has no dynamic libcudart.so.12 dep so auditwheel
  bundles nothing CUDA-related (release artifact stays clean).
- Test artifacts: run on GPU runners that have only the NVIDIA driver
  (libcuda.so.1). The self-hosted runner's sudo policy disallows apt
  installs from a job step, so we can't install the toolkit there.
- Drops the back-and-forth between bundling libcudart in the artifact
  vs. installing CUDA on the GPU runner — neither is needed now.

Build host still needs the CUDA Toolkit for libcudart_static.a;
setup-cuda already covers that on build-ubuntu / test-viz-sanitizers.

Tradeoff: ~3 MB binary growth per consumer. Safe today because
viz_core is the only CUDA-using component in the codebase.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Now that viz_core links libcudart_static (4eb4db1), the GPU runner
no longer needs the CUDA Toolkit. The previous setup-cuda step was
failing on the self-hosted runner anyway (sudo policy), and is now
unnecessary. Test binaries depend only on the NVIDIA driver
(libcuda.so.1) which is already present on the runner.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@farbod-nv farbod-nv merged commit 4da8221 into main May 4, 2026
86 of 110 checks passed
@farbod-nv farbod-nv deleted the fm/televiz_m3a branch May 4, 2026 18:46