This document provides a brief introduction to the usage of GPU Mounter.
NOTE:

- Set the environment variable `NVIDIA_VISIBLE_DEVICES` so that nvidia-container-runtime adds the CUDA libraries to the container; this lets you check the GPU state with `nvidia-smi` inside the container.
- Ensure the Pod is scheduled onto a node with GPU resources by setting affinity or labels.
Create a Pod with the following manifest:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  namespace: default
spec:
  nodeSelector:
    scheduling.device-mounter.io/nvidia_gpu: "true"
  containers:
    - name: cuda-container
      image: tensorflow/tensorflow:1.13.2-gpu
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo hello; sleep 10;done"]
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "none"
```

For the API service, see API_Helper.
Check the GPU state; no GPU is visible yet:

```shell
$ kubectl exec -it gpu-pod -- nvidia-smi -L
No devices found.
```

Mount endpoint:

```
PUT /apis/device-mounter.io/v1alpha1/namespaces/{namespace}/pods/{name}/mount
```
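To illustrate how the mount URL is assembled from its path and query parameters, here is a minimal shell sketch. All values below are placeholders, not taken from a real cluster, and the reading of `wait_second` as a wait bound is inferred from its name:

```shell
# Placeholders: substitute your own cluster IP, namespace, pod, and container.
CLUSTER_IP="10.96.0.1"
NAMESPACE="default"
POD="gpu-pod"
CONTAINER="cuda-container"

# device_type selects the device kind, container targets one container in the
# Pod, and wait_second (assumption, from its name) bounds how long to wait.
MOUNT_URL="https://${CLUSTER_IP}:6443/apis/device-mounter.io/v1alpha1/namespaces/${NAMESPACE}/pods/${POD}/mount?device_type=NVIDIA_GPU&container=${CONTAINER}&wait_second=30"
echo "${MOUNT_URL}"
```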
View the IP address of the cluster:

```shell
$ kubectl cluster-info
Kubernetes control plane is running at https://{cluster-ip}:6443
CoreDNS is running at https://{cluster-ip}:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
```

Send the mount request:

```shell
curl --location \
  --request PUT 'https://{cluster-ip}:6443/apis/device-mounter.io/v1alpha1/namespaces/default/pods/gpu-pod/mount?device_type=NVIDIA_GPU&container=cuda-container&wait_second=30' \
  --header 'Authorization: bearer token...' \
  --data '{"resources": {"nvidia.com/gpu": "1"}}'
```

Check the GPU state:
```shell
$ kubectl exec -it gpu-pod -- nvidia-smi -L
GPU 0: Tesla V100-PCIE-32GB (UUID: GPU-f61ffc1a-9e61-1c0e-2211-4f8f252fe7bc)
```

Unmount endpoint:

```
PUT /apis/device-mounter.io/v1alpha1/namespaces/{namespace}/pods/{name}/unmount
```
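The unmount URL follows the same pattern; again a minimal sketch with placeholder values:

```shell
# Placeholder values, as in the mount example.
CLUSTER_IP="10.96.0.1"
NAMESPACE="default"
POD="gpu-pod"
CONTAINER="cuda-container"

# force=true is assumed (from its name) to unmount even if the device is busy.
UNMOUNT_URL="https://${CLUSTER_IP}:6443/apis/device-mounter.io/v1alpha1/namespaces/${NAMESPACE}/pods/${POD}/unmount?device_type=NVIDIA_GPU&container=${CONTAINER}&force=true"
echo "${UNMOUNT_URL}"
```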
Send the unmount request:

```shell
curl --location \
  --request POST 'https://{cluster-ip}:6443/apis/device-mounter.io/v1alpha1/namespaces/default/pods/gpu-pod/unmount?device_type=NVIDIA_GPU&container=cuda-container&force=true' \
  --header 'Authorization: bearer token...'
```

Check the GPU state:
```shell
$ kubectl exec -it gpu-pod -- nvidia-smi -L
No devices found.
```