GPU Profile Metrics Exporter

A lightweight Prometheus exporter that collects NVIDIA GPU performance metrics using Nsight Systems (nsys) and exposes them via an HTTP endpoint.
The tool can either:

Run live GPU profiling with nsys profile at a defined interval or
Parse existing Nsight Systems .sqlite reports.

This allows you to continuously monitor GPU utilization, throughput, and other hardware metrics in Prometheus and visualize them in Grafana.

Features

Automated GPU metric collection using nsys
Supports both static .sqlite files and live profiling commands
Exposes metrics in Prometheus format
Configurable collection interval and metric selection
Easy to deploy with Docker

Requirements

NVIDIA GPU
Nsight Systems CLI (nsys)
The Docker image installs this automatically.
Python 3.8+
Prometheus client library

Run with Docker

1. Build the image

docker build -t gpu-metrics-exporter .

2. Run container

This will continuously run an nsys profile for your command and export metrics to Prometheus.

docker run --gpus all -p 9401:9401 \
  -e COMMAND="echo" \
  -e OUTPUT="report" \
  -e METRIC="SMs Active,GR Active" \
  -e INTERVAL=60 \
  -e PORT=9401 \
  gpu-metrics-exporter

Metrics will be available at

http://localhost:9401/metrics

Environment Variables

Variable	Description	Default
`COMMAND`	Command to profile with Nsight Systems (used when `SQLITE` is not set)	`echo`
`OUTPUT`	Output base name for generated `.nsys-rep` and `.sqlite` files	`report`
`SQLITE`	Path to an existing Nsight Systems SQLite file (skips live profiling)	(none)
`METRIC`	Comma-separated list of metric names to export (partial match supported)	`SMs Active,GR Active`
`PORT`	HTTP port for Prometheus metrics	`9401`
`INTERVAL`	Seconds between metric collection	`60`

Example Prometheus Metric Output

# HELP sms_active SMs Active [Throughput %]
# TYPE sms_active gauge
sms_active{gpu="0",modelName="NVIDIA RTX A6000",pci_bus_id="0000:41:00.0",UUID="GPU-xxxx"} 87.5

# HELP gr_active GR Active [Throughput %]
# TYPE gr_active gauge
gr_active{gpu="0",modelName="NVIDIA RTX A6000",pci_bus_id="0000:41:00.0",UUID="GPU-xxxx"} 75.3

Run with Kubernetes

In order to run in Kubernetes environment, the container need to use privileged mode and request resources nvidia.com/gpu

For example:

      containers:
      - name: gpu-metrics
        image: gpu-metrics-exporter
        securityContext:
          privileged: true
        resources:
          limits:
            nvidia.com/gpu: 1

Because it will consume 1 GPU resource => Should be run in shared GPU env only.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Dockerfile		Dockerfile
README.md		README.md
gpu_metrics.py		gpu_metrics.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPU Profile Metrics Exporter

Features

Requirements

Run with Docker

1. Build the image

2. Run container

Run with Kubernetes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GPU Profile Metrics Exporter

Features

Requirements

Run with Docker

1. Build the image

2. Run container

Run with Kubernetes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages