[Bug/Support] NVIDIA GB10 (Grace Blackwell) Reporting Issues (PCIe, Fan, Mem) #426

@chromedot

Generated with my good buddy Antigravity + Gemini 3 Pro High

Description
I am running nvtop on the new NVIDIA GB10 (Grace Blackwell) platform (Dell Pro Max Workstation GB10, equivalent to the NVIDIA DGX Spark). It correctly detects the GPU name and utilization, but several metrics are reported incorrectly or as N/A, likely because of the SoC nature of the architecture (NVLink-C2C instead of traditional PCIe).

System Information
Hardware: NVIDIA GB10 (Grace Blackwell Superchip)
Architecture: ARM64 (aarch64)
OS: Ubuntu 24.04.3 LTS
NVIDIA Driver: 580.95.05
CUDA Version: 13.0
Kernel: Linux 6.8.0-49-generic (aarch64)

Observed Issues
PCIe Misreporting:
Reported as: PCIe GEN 1@ 1x
Context: The GB10 is an SoC (System on Chip) in which the GPU is connected to the Grace CPU via NVLink-C2C, not through a standard PCIe slot. nvtop appears to be falling back to a default/placeholder value.
Memory Usage (Header):
Reported as: MEM[ N/A] in the bar gauge.
Context: nvidia-smi also reports "Not Supported" for memory usage on this platform (Unified Memory architecture). However, the process list at the bottom of nvtop does show memory usage correctly (e.g., 63931MiB for my python process). It would be great if the header gauge could aggregate the per-process memory or use the same source as the process list.
Fan Speed:
Reported as: FAN N/A%
Context: The fan is likely managed by the chassis controller or the SoC thermal subsystem and is not exposed via the standard NVML fan APIs used for discrete cards.
Memory Clock:
Reported as: MEM N/A MHz
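
One way a monitor could avoid showing a placeholder such as GEN 1@ 1x is to keep "not available" distinct from a real reading all the way to the display code. The following is a minimal Python sketch of that idea only (nvtop itself is written in C). The names `format_pcie` and `format_fan` are hypothetical and are not nvtop functions, and whether the GB10 driver actually reports the link as unavailable is an assumption.

```python
from typing import Optional


def format_pcie(gen: Optional[int], width: Optional[int]) -> str:
    """Render the PCIe header field; None means the query was unsupported."""
    if gen is None or width is None:
        # Show N/A instead of substituting a default generation/width.
        return "PCIe N/A"
    return f"PCIe GEN {gen}@{width:2d}x"


def format_fan(percent: Optional[int]) -> str:
    """Render the fan field; None means no fan reading is exposed."""
    if percent is None:
        return "FAN N/A%"
    return f"FAN {percent}%"


if __name__ == "__main__":
    print(format_pcie(None, None))  # unsupported link info
    print(format_pcie(4, 16))       # a hypothetical discrete card
    print(format_fan(None))
```

With this shape, a failed or unsupported query can never be mistaken for a genuine "GEN 1, x1" link.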
Working Metrics
GPU Utilization: Correctly shows load (e.g., 96%).
Power Usage: Correctly shows wattage (e.g., 58 W).
Process List: Correctly identifies processes and their per-process memory usage.

Additional Context
The GB10 uses a unified memory architecture: a single LPDDR5X pool shared between the CPU and GPU. The standard NVML queries for "FB Memory" might need to be adjusted, or replaced with queries for the unified memory pool, on this architecture.
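
The fallback suggested above could look roughly like the sketch below: when the device-level memory query is unsupported, sum the per-process usage for the "used" value and take the pool size from the system's `MemTotal`. This is a Python illustration of the logic, not nvtop code. The function names are hypothetical, and using `/proc/meminfo` as the size of the unified pool is an assumption. Note also that a per-process sum can miss allocations not attributed to any process.

```python
from typing import Iterable, Optional, Tuple


def parse_meminfo_total_bytes(meminfo_text: str) -> Optional[int]:
    """Extract MemTotal (reported in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    return None


def header_memory(
    device_used: Optional[int],
    device_total: Optional[int],
    process_used: Iterable[int],
    system_total: Optional[int],
) -> Tuple[Optional[int], Optional[int]]:
    """Return (used, total) in bytes for the header gauge.

    Prefers the device-level values; falls back to the sum of per-process
    usage and the system memory total when the device query is unsupported.
    """
    if device_used is not None and device_total is not None:
        return device_used, device_total
    return sum(process_used), system_total


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        total = parse_meminfo_total_bytes(f.read())
    # 63931 MiB is the per-process figure quoted in this report.
    print(header_memory(None, None, [63931 * 1024 * 1024], total))
```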

Output of nvidia-smi for reference:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|=========================================+========================+======================|
|   0  NVIDIA GB10                    On  |   0000000F:01:00.0 Off |                  N/A |
| N/A   32C    P8              3W /  N/A  | Not Supported          |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
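
For anyone wanting to check which fields are unsupported on their own machine, the same information can be pulled in machine-readable form with `nvidia-smi --query-gpu=... --format=csv`. The Python sketch below parses one such CSV line and maps unsupported markers to `None`. The exact marker strings the GB10 driver prints in CSV mode are an assumption (several common spellings are accepted), and the sample line in the usage example is illustrative, not captured output.

```python
import subprocess
from typing import Dict, List, Optional

FIELDS = ["name", "memory.used", "memory.total", "fan.speed"]
# Marker spellings treated as "unsupported" (assumed; varies by driver).
UNSUPPORTED = {"[N/A]", "N/A", "[Not Supported]", "Not Supported"}


def parse_query_line(line: str, fields: List[str] = FIELDS) -> Dict[str, Optional[str]]:
    """Parse one CSV line from nvidia-smi --query-gpu into a field dict."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return {f: (None if v in UNSUPPORTED else v) for f, v in zip(fields, values)}


def query_gpus() -> List[Dict[str, Optional[str]]]:
    """Run nvidia-smi and parse one dict per GPU (requires the NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_query_line(l) for l in out.splitlines() if l.strip()]


if __name__ == "__main__":
    # Illustrative input only, not real output from a GB10 system.
    print(parse_query_line("NVIDIA GB10, [N/A], [N/A], [N/A]"))
```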
I have attached a screenshot of nvtop running under load to demonstrate the issue.

Screenshot

[Image: nvtop running under load]
