A point-in-time capture of the cluster. It records facts like Kubernetes version, OS, GPUs, labels, taints, topology, and runtime state.
+
+
+ Slurm nuance
+ Today's Slinky Slurm leaves are generated from criteria flags, not from snapshot intake. Validation still captures or loads snapshot data for pre-flight checks before running Slurm health checks.
+
+ Presenter line
+ Slurm leaves are generated from criteria flags today. The recipe resolves the Slinky CRDs, operator, cluster chart, GPU GRES settings, constraints, and install order.
+
aicr recipe --service gke --accelerator h100 --intent training --os cos --platform slurm
+
Leaf overlay: h100-gke-cos-training-slurm.
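As a rough illustration of how the criteria flags line up with the leaf overlay name, the sketch below joins them in the order seen in this one example (accelerator, service, OS, intent, platform). The `leaf_name` helper is hypothetical, and the ordering is inferred from that single name rather than from AICR's actual resolution logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins criteria values into a leaf overlay name.
# The field order is an assumption inferred from h100-gke-cos-training-slurm.
leaf_name() {
  local accelerator="$1" service="$2" os="$3" intent="$4" platform="$5"
  printf '%s-%s-%s-%s-%s\n' "$accelerator" "$service" "$os" "$intent" "$platform"
}

# Values taken from the GKE command above.
leaf_name h100 gke cos training slurm
```

How a leaf is named when a flag such as --os is omitted is not shown here, so the sketch does not try to cover that case.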
+
+
+
Kind
+
aicr recipe --service kind --accelerator h100 --intent training --platform slurm
+
CPU-only NodeSet path for smoke tests and CI-style checks.
+
+
+
+ Shared Slinky shape
+ Cloud H100 leaves bake in Gres=gpu:h100:8 and matching nvidia.com/gpu: 8 slurmd limits so srun --gres=gpu:N works after deploy.
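Because `--gres` is a per-node request, a node advertising Gres=gpu:h100:8 can satisfy N from 1 to 8. A minimal sketch of a guard that builds the flag, assuming the 8-GPU node shape above; the `gres_flag` helper and the `GPUS_PER_NODE` variable are hypothetical and not part of AICR or Slurm.

```shell
#!/usr/bin/env bash
GPUS_PER_NODE=8  # assumption: matches Gres=gpu:h100:8 on the cloud H100 leaves

# Hypothetical helper: prints a --gres flag if N fits on one node.
gres_flag() {
  local n="$1"
  case "$n" in
    ''|*[!0-9]*) echo "gpu count must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$n" -lt 1 ] || [ "$n" -gt "$GPUS_PER_NODE" ]; then
    echo "gpu count must be between 1 and $GPUS_PER_NODE per node" >&2
    return 1
  fi
  printf -- '--gres=gpu:%s\n' "$n"
}

# Example, run where srun is available (for instance the login pod):
#   srun $(gres_flag 4) nvidia-smi -L
```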
+
+ Two-pool shape
+ Without a CPU-worker pool, the Slurm controller, REST API, and login pod are pinned to the GPU workers and carry the same tolerations as the worker pods.
+
08 · Apply
+
Deploy: install in dependency order
+
+
deploy
+
$ cd bundle
+$ ./deploy.sh
+
+ What AICR contributes
+ Deploy order for the Slurm path is explicit: cert-manager, then Slinky CRDs, then the Slinky operator, then the Slinky Slurm cluster chart.
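The ordering can be pictured as a simple sequential loop. The sketch below only prints the steps as a dry run; it is not the generated deploy.sh, and the step identifiers are illustrative labels rather than real chart or release names.

```shell
#!/usr/bin/env bash
# Dependency order as stated above; the labels are illustrative only.
DEPLOY_STEPS="cert-manager slinky-crds slinky-operator slinky-slurm-cluster"

# Hypothetical dry run: prints each install step in order, installs nothing.
deploy_dry_run() {
  local i=1 step
  for step in $DEPLOY_STEPS; do
    echo "step $i: install $step"
    i=$((i + 1))
  done
}

deploy_dry_run
```

Keeping the order in one list makes the dependency chain easy to review: each step relies on the one before it being in place.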
+