From 93b4068cb3a029b42da561b996fa04661cd00bef Mon Sep 17 00:00:00 2001
From: niksirbi
Date: Tue, 12 May 2026 14:12:11 +0100
Subject: [PATCH 1/5] Update SLEAP HPC module docs to reflect current cluster
 state

- Use `module avail SLEAP` instead of `module avail` and show realistic output
- Remove outdated legacy module entries (2023, 2024); fold legacy guidance
  into the main note rather than a separate dropdown
- Clarify that older modules are not recommended due to Ubuntu incompatibility
- Update `module list` example to reflect realistic output
- Add local uv install command for SLEAP 1.6.3 to match the cluster module
---
 docs/source/data_analysis/HPC-module-SLEAP.md | 57 +++++++++----------
 1 file changed, 28 insertions(+), 29 deletions(-)

diff --git a/docs/source/data_analysis/HPC-module-SLEAP.md b/docs/source/data_analysis/HPC-module-SLEAP.md
index 3e816ba..b4413e2 100644
--- a/docs/source/data_analysis/HPC-module-SLEAP.md
+++ b/docs/source/data_analysis/HPC-module-SLEAP.md
@@ -46,51 +46,43 @@ $ ssh hpc-gw2
 To learn more about accessing the HPC via SSH, see the
 [relevant how-to guide](ssh-cluster-target).
 
 ### Access to the SLEAP module
-Once you are on the HPC gateway node, SLEAP should be listed among the available modules when you run `module avail`:
+Once you are on the HPC gateway node, you can see the available SLEAP modules by running `module avail SLEAP`:
 
 ```{code-block} console
-$ module avail
-...
-SLEAP/2024-08-14
-SLEAP/2025-09-30
-SLEAP/2026-05-08
+$ module avail SLEAP
+----------------------- /ceph/apps/ubuntu-24/modulefiles-----------------------
+   ... SLEAP/2025-09-30   SLEAP/2026-05-08 (D)
+
+  Where:
+   D:  Default Module
 ...
 ```
-- `SLEAP/2024-08-14` corresponds to `SLEAP v.1.3.3` (TensorFlow backend, legacy)
-- `SLEAP/2025-09-30` corresponds to `SLEAP v.1.3.4` (TensorFlow backend, legacy)
 - `SLEAP/2026-05-08` corresponds to `SLEAP v.1.6.3` (PyTorch backend)
+- Older modules correspond to legacy versions of SLEAP (TensorFlow backend).
+  - `SLEAP/2025-09-30` corresponds to `SLEAP v.1.3.4`
+  - SLEAP modules with older dates are no longer recommended for use, as they were built for
+    an older version of Ubuntu than the one currently running on the cluster.
 
-We recommend always using the latest version, which is the one loaded by default
-when you run `module load SLEAP`. If you want to load a specific version,
-you can do so by typing the full module name,
-including the date e.g. `module load SLEAP/2025-09-30`.
 
 ::: {note}
-Starting with `SLEAP/2026-05-08`, all new modules use the
+Starting with `SLEAP/2026-05-08`, all new SLEAP modules use the
 [PyTorch backend](https://docs.sleap.ai/). This guide
-documents the PyTorch-based workflow. If you need to use a legacy
-(TensorFlow) module, refer to the
-[legacy SLEAP documentation](https://legacy.sleap.ai/).
-:::
-
-:::{dropdown} Older legacy modules
-:color: info
-:icon: info
+documents the PyTorch-based workflow, which is the recommended approach
+for all new projects.
 
-The following older modules are also available but are no longer recommended:
-- `SLEAP/2023-03-13` corresponds to `SLEAP v.1.2.9`
-- `SLEAP/2023-08-01` corresponds to `SLEAP v.1.3.1`
-
-These use the TensorFlow backend and reference documentation at .
+If you need to use a legacy (TensorFlow) module,
+e.g. to maintain compatibility with an existing project, make sure to load
+the corresponding module by its full name, e.g. `module load SLEAP/2025-09-30`,
+and refer to the [legacy SLEAP documentation](https://legacy.sleap.ai/).
 :::
 
-If a module has been successfully loaded, it will be listed when you run `module list`,
-along with other modules it may depend on:
+If a module has been successfully loaded, it will be listed among
+other loaded modules when you run `module list`:
 
 ```{code-block} console
 $ module list
 Currently Loaded Modulefiles:
- 1) uv/0.7.13-GCCcore-14.2.0  2) SLEAP/2026-05-08
+...
+ 15) SLEAP/2026-05-08
 ```
 
 If you have trouble loading the SLEAP module,
@@ -105,6 +97,13 @@ Thus, you also need to install SLEAP on your local PC/laptop.
 We recommend following the official [SLEAP installation guide](https://docs.sleap.ai/latest/installation/).
 To minimise the risk of issues due to incompatibilities between versions, ensure the version of your local installation of SLEAP matches the one you plan to load in the cluster.
+For example, to match the latest SLEAP module at the time of writing (`SLEAP/2026-05-08`),
+you will need to run the following command in your local terminal:
+
+```{code-block} console
+uv tool install --python 3.13 "sleap[nn]==1.6.3" --with "sleap-io==0.7.0" --with "sleap-nn==0.2.0" --torch-backend auto
+```
+
 ### Mount the SWC filesystem on your local PC/laptop
 The rest of this guide assumes that you have mounted the SWC filesystem on your local PC/laptop.
 If you have not done so, please follow the relevant instructions on the

From 8e0c07cd94183e7d41091c89ae56ce5971fb7faa Mon Sep 17 00:00:00 2001
From: niksirbi
Date: Tue, 12 May 2026 18:15:55 +0100
Subject: [PATCH 2/5] Update inference batch script and surrounding docs for
 sleap-nn PyTorch CLI

- Replace sleap-nn track with sleap track alias throughout; same for sleap
  train; add a note explaining that sleap-nn train/track are the equivalent
  long-form aliases
- Add batch_size (-b) argument to the inference script
- Replace the sleap-nn track arguments dropdown with a pointer to
  sleap track --help and the SLEAP tracking docs
- Remove :caption: from all batch script code blocks (was rendering as broken
  hyperlinks); remove :name: anchors and inline filename comments
- Update model paths and directory names to match the PyTorch-era naming
  convention (dated run names e.g.
  260512_144511.centroid.n=10)
- Fix cd command in model evaluation to use the actual dated directory name
- Fix labels version references (v001 -> v002) in inference output prose
- Fix 'some the predictions' typo and other minor wording issues
---
 docs/source/data_analysis/HPC-module-SLEAP.md | 299 +++++++++---------
 1 file changed, 158 insertions(+), 141 deletions(-)

diff --git a/docs/source/data_analysis/HPC-module-SLEAP.md b/docs/source/data_analysis/HPC-module-SLEAP.md
index b4413e2..8885f33 100644
--- a/docs/source/data_analysis/HPC-module-SLEAP.md
+++ b/docs/source/data_analysis/HPC-module-SLEAP.md
@@ -137,37 +137,23 @@ can be [viewed via the SLEAP GUI](model-evaluation) on your local SLEAP installa
 
 (prepare-the-training-job)=
 ### Prepare the training job
-Follow the SLEAP instructions for [Creating a Project](https://docs.sleap.ai/latest/tutorials/new-project/)
-and [Initial Labelling](https://docs.sleap.ai/latest/tutorials/initial-labeling/).
+Follow the [SLEAP tutorial](https://docs.sleap.ai/latest/tutorial/overview/) until
+the end of the section on [Initial Labelling](https://docs.sleap.ai/latest/tutorial/initial-labeling/).
 Ensure that the project file (e.g. `labels.v001.slp`) is saved in the mounted SWC filesystem
 (as opposed to your local filesystem).
 
-Next, follow the instructions in [Remote Training](https://docs.sleap.ai/latest/guides/remote/),
+Next, read the [Training a model](https://docs.sleap.ai/latest/tutorial/training-a-model/) section
+of the tutorial, but **do not hit the `Run` button** in the SLEAP GUI just yet
+(that would run the training job on your local machine, which is not what we want).
+Instead, follow the instructions in the [Running SLEAP remotely](https://docs.sleap.ai/latest/guides/running-sleap-remotely/) guide,
 i.e. *Predict* -> *Run Training…* -> *Export Training Job Package…*.
-- For selecting the right configuration parameters, see [Configuring Models](https://docs.sleap.ai/latest/guides/choosing-models/) and [Troubleshooting Workflows](https://docs.sleap.ai/latest/guides/troubleshooting-workflows/)
-- Set the *Predict On* parameter to *nothing*. Remote training and inference (prediction) are easiest to run separately on the HPC Cluster. Also unselect *Visualize Predictions During Training* in training settings, if it's enabled by default.
-- If you are working with camera view from above or below (as opposed to a side view), set the *Rotation Min Angle* and *Rotation Max Angle* to -180 and 180 respectively in the *Augmentation* section.
+
+- For selecting the right configuration parameters, see the [Model Configuration](https://nn.sleap.ai/latest/reference/models/) guide.
+- Set the *Inference Target* parameter to *Nothing*. Remote training and inference (prediction) are easiest to run separately on the HPC Cluster.
+- If you are working with camera view from above or below (as opposed to a side view), set the *Rotation* to ±180° in the *Augmentation* section.
 - Make sure to save the exported training job package (e.g. `labels.v001.slp.training_job.zip`) in the mounted SWC filesystem, for example, in the same directory as the project file.
 - Unzip the training job package. This will create a folder with the same name (minus the `.zip` extension). This folder contains everything needed to run the training job on the HPC cluster: YAML configuration files and a packaged labels file (`.pkg.slp`).
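The unzip step above can be done with any archive tool; if `unzip` is not available on your machine, Python's standard `zipfile` module works too. A minimal sketch (the archive name is just the example used in this guide; adjust the path to wherever you saved the exported package):

```python
import zipfile
from pathlib import Path

def unzip_training_job(archive: str) -> Path:
    """Extract a training job package into a same-named folder (minus '.zip')."""
    archive_path = Path(archive)
    # 'labels.v001.slp.training_job.zip' -> 'labels.v001.slp.training_job'
    target = archive_path.with_suffix("")
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    return target

# Example call (assumed path):
# unzip_training_job("labels.v001.slp.training_job.zip")
```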
-:::{dropdown} Generating configs without the GUI
-:color: info
-:icon: info
-
-If you prefer not to use the GUI for generating training configurations,
-you can use the `sleap-nn config` command on the HPC cluster (after loading
-the SLEAP module) to auto-generate YAML config files from a labels file:
-
-```{code-block} console
-$ sleap-nn config labels.v001.slp --auto -o config.yaml
-```
-
-For top-down models, this will create two config files
-(e.g. `config_centroid.yaml` and `config_centered_instance.yaml`).
-The config generator analyses the data and recommends the pipeline type,
-backbone, and hyperparameters.
-:::
-
 (run-the-training-job)=
 ### Run the training job
 Login to the HPC cluster as described above.
@@ -178,11 +164,14 @@ $ ssh hpc-gw2
 Navigate to the training job folder (replace with your own path) and list its contents:
 ```{code-block} console
 $ cd /ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data
-$ cd labels.v001.slp.training_job
+$ cd labels.v002.slp.training_job
 $ ls -1
 centered_instance.yaml
 centroid.yaml
-labels.v001.pkg.slp
+inference-script.sh
+jobs.yaml
+labels.v002.pkg.slp
+train-script.sh
 ```
 
 The YAML configuration files specify the model architecture, training hyperparameters,
@@ -196,6 +185,35 @@ the 'Top-Down' configuration, which consists of two neural networks - the first
 for isolating the animal instances (by finding their centroids) and the second
 for predicting all the body parts per instance.
 
+Importantly, SLEAP also gives you a `train-script.sh` file that contains the exact commands needed to run the training job from the unzipped package folder.
+You can inspect this file with `cat train-script.sh`:
+
+```{code-block} bash
+#!/bin/bash
+sleap train --config-name centroid.yaml --config-dir . trainer_config.ckpt_dir='/mnt/Data/sleap-tutorial-data/models' trainer_config.run_name='260512_151547.centroid.n=46'
+sleap train --config-name centered_instance.yaml --config-dir .
trainer_config.ckpt_dir='/mnt/Data/sleap-tutorial-data/models' trainer_config.run_name='260512_151547.centered_instance.n=46'
+```
+
+You will need to modify the paths in the `trainer_config.ckpt_dir` argument to point to a directory where you want the trained model files to be saved. You can edit the `train-script.sh` file with `nano` or any text editor of your choice.
+
+In this example, we'll set this path to an appropriate directory in the `ceph` filesystem:
+```{code-block} bash
+:linenos:
+#!/bin/bash
+sleap train --config-name centroid.yaml --config-dir . trainer_config.ckpt_dir='/ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data/models' trainer_config.run_name='260512_151547.centroid.n=46'
+sleap train --config-name centered_instance.yaml --config-dir . trainer_config.ckpt_dir='/ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data/models' trainer_config.run_name='260512_151547.centered_instance.n=46'
+```
+
+For a full list of available `sleap train` arguments, run `sleap train --help` (with the SLEAP module loaded)
+and consult the relevant SLEAP-NN documentation on [training](https://nn.sleap.ai/latest/guides/training/).
+
+:::{note}
+`sleap train` and `sleap track` are short aliases for `sleap-nn train` and `sleap-nn track` respectively.
+Both forms work interchangeably.
+:::
+
+In `nano`, you can save the file by pressing `Ctrl+O` and exit by pressing `Ctrl+X`.
+
 ![Top-Down model configuration](https://legacy.sleap.ai/_images/topdown_approach.jpg)
 
 :::{dropdown} More on 'Top-Down' vs 'Bottom-Up' models
 :color: info
 :icon: info
 
 Although the 'Top-Down' configuration was designed with multiple animals in mind,
 it can also be used for single-animal videos. It makes sense to use it for videos
 where the animal occupies a relatively small portion of the frame - see
-[Troubleshooting Workflows](https://docs.sleap.ai/latest/guides/troubleshooting-workflows/) for more info.
+[Model Configuration](https://nn.sleap.ai/latest/reference/models/) for more info.
 :::
 
 Next you need to create a SLURM batch script, which will schedule the training job
-on the HPC cluster. Create a new file called `train_slurm.sh`
+on the HPC cluster. Create a new file called `train-slurm.sh`
 (you can do this in the terminal with `nano`/`vim` or in a text editor
 of your choice on your local PC/laptop). Here we create the script in the same folder
 as the training job, but you can save it anywhere you want, or even keep track of it with `git`.
 
 ```{code-block} console
-$ nano train_slurm.sh
+$ nano train-slurm.sh
 ```
 
 An example is provided below, followed by explanations.
 ```{code-block} bash
-:caption: train_slurm.sh
-:name: train-slurm-sh
 :linenos:
 
 #!/bin/bash
 
 #SBATCH -p gpu # partition (queue)
 #SBATCH -N 1 # number of nodes
 #SBATCH --mem 32G # memory pool for all cores
 #SBATCH -n 8 # number of cores
 #SBATCH -t 0-06:00 # time (D-HH:MM)
-#SBATCH --gres gpu:1 # request 1 GPU (of any kind)
+#SBATCH --gres gpu:a100:1 # request 1 GPU of a given type (see dropdown below)
 #SBATCH -o slurm.%x.%N.%j.out # STDOUT
 #SBATCH -e slurm.%x.%N.%j.err # STDERR
 #SBATCH --mail-type=ALL
 
 nvidia-smi
 
 # Load the SLEAP module
 module load SLEAP
 
 # Define directories for SLEAP project and exported training job
 SLP_DIR=/ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data
-SLP_JOB_NAME=labels.v001.slp.training_job
+SLP_JOB_NAME=labels.v002.slp.training_job
 SLP_JOB_DIR=$SLP_DIR/$SLP_JOB_NAME
 
 # Go to the job directory
 cd $SLP_JOB_DIR
 
-# Run the training for each model
-sleap-nn train --config centroid.yaml \
-    "data_config.train_labels_path=[labels.v001.pkg.slp]"
-sleap-nn train --config centered_instance.yaml \
-    "data_config.train_labels_path=[labels.v001.pkg.slp]"
+# Run the train-script.sh generated by SLEAP
+# which we edited to point to the correct checkpoint directory
+./train-script.sh
 ```
 
-In `nano`, you can save the file by pressing `Ctrl+O` and exit by pressing `Ctrl+X`.
-
 
 :::{dropdown} Explanation of the batch script
 :color: info
 :icon: info
 
 A primer on the most useful SLURM arguments is provided in this [how-to guide](slurm-arguments-target).
 For more information see the [SLURM documentation](https://slurm.schedmd.com/sbatch.html).
 
 - The `#` lines are comments. They are not executed by SLURM, but they are useful
-for explaining the script to your future self and others.
-
-- The `nvidia-smi` line prints some information about the GPU(s) available on the node, including their driver version and memory usage. This is useful for debugging purposes.
+  for explaining the script to your future self and others.
+
+- `--gres gpu:a100:1` requests 1 GPU of type A100. If you don't care about the specific
+  GPU type, you can simply request `--gres gpu:1`. You can inspect the available GPU
+  types by listing the nodes in the `gpu` and `gpu_lowp` partitions with `sinfo`:
+  ```{code-block} console
+  $ sinfo -p gpu,gpu_lowp -o "%N %G" --noheader
+  ```
+  In each output line, look for the string between `gpu:` and the next `:` (e.g. `a100` or `l40s`).
+  Avoid GPUs with CUDA compute capability below 7.5, which are no longer supported by recent PyTorch versions (>= 2.5).
+  At the time of writing, only the `p5000` cards are incompatible.
+  Refer to the GPU platform information on the
+  [SWC internal wiki](https://liveuclac.sharepoint.com/sites/SSC/SitePages/SSC-CPU-and-GPU-Platform-architecture-165449857.aspx)
+  and look up a GPU's compute capability at the
+  [NVIDIA CUDA GPUs page](https://developer.nvidia.com/cuda/gpus).
+
+
+- The `nvidia-smi` line prints some information about the GPU(s) available on the node,
+  including their driver version and memory usage.
+  This is useful for debugging purposes.
 
 - The `module load SLEAP` line loads the latest SLEAP module (PyTorch backend)
-and any other modules it may depend on. PyTorch bundles its own CUDA runtime,
-so no separate `cuda` module is needed.
+  and any other modules it may depend on.
PyTorch bundles its own CUDA runtime,
+  so no separate `cuda` module is needed.
 
 - The `cd` line changes the working directory to the training job folder.
-This is necessary because the training commands below use relative paths
-to the configuration and labels files.
+  This is necessary because the training commands inside `train-script.sh`
+  use relative paths to the configuration files.
 
-- The `sleap-nn train` commands each train one model. The `--config` flag
-points to the YAML configuration file, and the
-`data_config.train_labels_path=[...]` override ensures the correct path
-to the packaged labels file is used.
+- The `./train-script.sh` line runs the script containing the training commands.
+  Alternatively, you could also type the training commands directly in the
+  SLURM script.
 :::
 
 :::{dropdown} Legacy training commands (TensorFlow modules)
 :color: info
 :icon: info
 
 If you are using a legacy SLEAP module (≤ 1.4.1, TensorFlow backend),
 the training commands use `sleap-train` with JSON config files:
 
 ```{code-block} bash
-sleap-train centroid.json labels.v001.pkg.slp
-sleap-train centered_instance.json labels.v001.pkg.slp
+sleap-train centroid.json labels.v002.pkg.slp
+sleap-train centered_instance.json labels.v002.pkg.slp
 ```
 
 The exported training job package from legacy SLEAP also includes a
 `train-script.sh` that contains these commands, so you can simply run
-`./train-script.sh` from the SLURM script. See the
-[legacy SLEAP documentation](https://legacy.sleap.ai/guides/remote.html#remote-training) for details.
+`./train-script.sh` from the SLURM script. See the legacy SLEAP
+[remote training guide](https://legacy.sleap.ai/guides/remote.html#remote-training)
+and the [legacy CLI reference](https://legacy.sleap.ai/guides/cli.html) for details.
 :::
 
 :::{warning}
 Before submitting the job, ensure that you have permissions to execute
-the batch script.
-You can make this file executable by running in the terminal:
-
-```{code-block} console
-$ chmod +x train_slurm.sh
-```
-
-If the script is not in your working directory, you will need to specify its full path:
+both the SLURM batch script (`train-slurm.sh`) and the
+training commands script (`train-script.sh`).
+You can make these files executable by running in the terminal:
 
 ```{code-block} console
-$ chmod +x /path/to/train_slurm.sh
+$ chmod +x train-slurm.sh
+$ chmod +x train-script.sh
 ```
 :::
 
 Now you can submit the batch script by running the following command
 (in the same directory as the script):
 ```{code-block} console
-$ sbatch train_slurm.sh
+$ sbatch train-slurm.sh
 Submitted batch job 3445652
 ```
@@ -392,40 +417,39 @@ $ cat slurm.gpu-sr670-20.3445652.err
 If you encounter out-of-memory errors, keep in mind that there are two main sources of memory usage:
 - CPU memory (RAM), specified via the `--mem` argument in the SLURM batch script. This is the memory used by the Python process running the training job and is shared among all the CPU cores.
-- GPU memory, this is the memory used by the GPU card(s) and depends on the GPU card type you requested via the `--gres gpu:1` argument in the SLURM batch script. To increase it, you can request a specific GPU card type with more GPU memory (e.g. `--gres gpu:a4500:1`). The SWC wiki provides a [list of all GPU card types and their specifications](https://liveuclac.sharepoint.com/sites/SSC/SitePages/SSC-CPU-and-GPU-Platform-architecture-165449857.aspx).
+- GPU memory, this is the memory used by the GPU card(s) and depends on the GPU card type you requested via the `--gres gpu:1` argument in the SLURM batch script. To increase it, you can request a specific GPU card type with more GPU memory (e.g. `--gres gpu:a100:1`). The SWC wiki provides a [list of all GPU card types and their specifications](https://liveuclac.sharepoint.com/sites/SSC/SitePages/SSC-CPU-and-GPU-Platform-architecture-165449857.aspx).
 - If requesting more memory doesn't help, you can try reducing the size of your SLEAP models. You may tweak the model backbone architecture, or play with *Input scaling*, *Max stride* and *Batch size*. See SLEAP's [documentation](https://docs.sleap.ai/) and [discussion forum](https://github.com/talmolab/sleap/discussions) for more details.
 ```
 
 (model-evaluation)=
 ## Model evaluation
 Upon successful completion of the training job, a `models` folder will have
-been created in the training job directory. It contains one subfolder per
-training run.
+been created in your specified `trainer_config.ckpt_dir`.
+It contains one subfolder per training run.
 
 ```{code-block} console
 $ cd /ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data
-$ cd labels.v001.slp.training_job
 $ cd models
 $ ls -1
-centroid
-centered_instance
+'260512_144511.centroid.n=10'
+'260512_144548.centered_instance.n=10'
 ```
 
 Each subfolder holds the trained model files (e.g. `best.ckpt`),
 their configurations (`training_config.yaml`) and some evaluation metrics.
 
 ```{code-block} console
-$ cd centered_instance
+$ cd '260512_144548.centered_instance.n=10'
 $ ls -1
 best.ckpt
 initial_config.yaml
-training_config.yaml
 labels_gt.train.0.slp
 labels_gt.val.0.slp
-labels_pr.train.slp
-labels_pr.val.slp
+labels_pr.train.0.slp
+labels_pr.val.0.slp
 metrics.train.0.npz
 metrics.val.0.npz
+training_config.yaml
 training_log.csv
 ```
 The SLEAP GUI on your local machine can be used to quickly evaluate the trained models.
@@ -439,13 +463,12 @@ For more detailed evaluation metrics, you can refer to [SLEAP's model evaluation
 
 (sleap-inference)=
 ## Model inference
 By inference, we mean using a trained model to predict the labels on new frames/videos.
-SLEAP provides the [`sleap-nn track`](https://docs.sleap.ai/latest/guides/cli/) command line utility for running inference
+SLEAP provides the `sleap track` command line utility for running inference
 on a single video or a folder of videos.
+See the [remote inference guide](https://docs.sleap.ai/latest/guides/running-sleap-remotely/#remote-inference) for more details.
 
-Below is an example SLURM batch script that contains a `sleap-nn track` call.
+Below is an example SLURM batch script that contains a `sleap track` call.
 ```{code-block} bash
-:caption: infer_slurm.sh
-:name: infer-slurm-sh
 :linenos:
 
 #!/bin/bash
 
 #SBATCH -p gpu # partition
 #SBATCH -N 1 # number of nodes
 #SBATCH --mem 64G # memory pool for all cores
 #SBATCH -n 16 # number of cores
 #SBATCH -t 0-02:00 # time (D-HH:MM)
-#SBATCH --gres gpu:1 # request 1 GPU (of any kind)
+#SBATCH --gres gpu:a100:1 # request 1 GPU of a given type
 #SBATCH -o slurm.%x.%N.%j.out # write STDOUT
 #SBATCH -e slurm.%x.%N.%j.err # write STDERR
 #SBATCH --mail-type=ALL
 
 nvidia-smi
 
 # Load the SLEAP module
 module load SLEAP
 
-# Define directories for SLEAP project and exported training job
+# Define directory for SLEAP project
 SLP_DIR=/ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data
-VIDEO_DIR=$SLP_DIR/videos
-SLP_JOB_NAME=labels.v001.slp.training_job
-SLP_JOB_DIR=$SLP_DIR/$SLP_JOB_NAME
 
-# Go to the job directory
-cd $SLP_JOB_DIR
-
-# Make a directory to store the predictions
-mkdir -p predictions
+# Make a directory to store the predictions (if it doesn't exist already)
+mkdir -p $SLP_DIR/predictions
 
 # Run the inference command
-sleap-nn track \
-    -i $VIDEO_DIR/M708149_EPM_20200317_165049331-converted.mp4 \
-    -m $SLP_JOB_DIR/models/centroid \
-    -m $SLP_JOB_DIR/models/centered_instance \
+sleap track \
+    -i $SLP_DIR/mice.mp4 \
+    -m $SLP_DIR/models/260512_144511.centroid.n=10 \
+    -m $SLP_DIR/models/260512_144548.centered_instance.n=10 \
     -d auto \
+    -b 4 \
     --tracking \
-    -o predictions/labels.v001.predictions.slp
+    -o $SLP_DIR/predictions/labels.v002.predictions.slp
 ```
 
 The script is very similar to the training script, with the following differences:
 - The time limit `-t` is set lower, since inference is
normally faster than training. This will however depend on the size of the video and the number of models used.
 - The requested number of cores `n` and memory `--mem` are higher. This will depend on the requirements of the specific job you are running. It's best practice to try with a scaled-down version of your data first, to get an idea of the resources needed.
-- You can request a specific GPU type with `--gres gpu:<type>:1` (e.g. `--gres gpu:a4500:1`). The different GPU types vary in GPU memory size and compute capabilities (see [the SWC wiki](https://liveuclac.sharepoint.com/sites/SSC/SitePages/SSC-CPU-and-GPU-Platform-architecture-165449857.aspx)).
-- The `sleap-nn train` calls are replaced by the `sleap-nn track` command.
-- The `\` character is used to split the long `sleap-nn track` command into multiple lines for readability. It is not necessary if the command is written on a single line.
-
-::: {dropdown} Explanation of the sleap-nn track arguments
-:color: info
-:icon: info
+- You can request a specific GPU type with `--gres gpu:<type>:1` (e.g. `--gres gpu:a100:1`). The different GPU types vary in GPU memory size and compute capabilities (see [the SWC wiki](https://liveuclac.sharepoint.com/sites/SSC/SitePages/SSC-CPU-and-GPU-Platform-architecture-165449857.aspx)).
+- The `sleap train` calls are replaced by the `sleap track` command.
+- The `\` character is used to split the long `sleap track` command into multiple lines for readability. It is not necessary if the command is written on a single line.
 
-Some important command line arguments are explained below.
-You can view a full list of the available arguments by running `sleap-nn track --help`.
-- The `-i` option specifies the path to the video file to be processed.
-- The `-m` option is used to specify the path to the trained model directory (or directories). In this example we use the two models that were trained above.
-- The `-d` option specifies the device to use for inference.
The `auto` value will automatically select the best available device (GPU if available, otherwise CPU).
-- The `--tracking` flag enables cross-frame tracking of detected instances (animals). Additional tracking parameters like `--tracking_window_size`, `--features`, and `--scoring_method` can be used to fine-tune tracking. See SLEAP's guide on [tracking methods](https://docs.sleap.ai/latest/guides/proofreading/) for more info.
-- The `-o` option is used to specify the path to the output file containing the predictions.
-- The above script will predict all the frames in the video. You may select specific frames via the `--frames` option. For example: `--frames 1-50`.
-:::
+For a full list of available `sleap track` arguments, run `sleap track --help` (with the SLEAP module loaded)
+and consult the relevant SLEAP-NN documentation on [inference](https://nn.sleap.ai/latest/guides/inference/)
+and [tracking](https://nn.sleap.ai/latest/guides/tracking/).
 
 :::{dropdown} Legacy inference commands (TensorFlow modules)
 :color: info
 :icon: info
 
@@ -530,21 +539,30 @@ See the [legacy SLEAP CLI reference](https://legacy.sleap.ai/guides/cli.html) fo
 
 You can submit and monitor the inference job in the same way as the training job.
 ```{code-block} console
-$ sbatch infer_slurm.sh
+$ sbatch infer-slurm.sh
 $ squeue --me
 ```
 
-Upon completion, a `labels.v001.predictions.slp` file will have been created in the `predictions` directory.
+Upon completion, a `labels.v002.predictions.slp` file will have been created in the `predictions` directory.
 
 You can use the SLEAP GUI on your local machine to load and view the predictions:
-*File* -> *Open Project...* -> select the `labels.v001.predictions.slp` file.
+*File* -> *Open Project...* -> select the `labels.v002.predictions.slp` file.
+
 ## The training-inference cycle
+
 Now that you have some predictions, you can keep improving your models by repeating
-the training-inference cycle.
The basic steps are:
-- Manually correct some of the predictions: see [Prediction-assisted labeling](https://docs.sleap.ai/latest/tutorials/assisted-labeling/)
-- Merge corrected labels into the initial training set: see [Merging guide](https://docs.sleap.ai/latest/guides/merging/)
-- Save the merged training set as `labels.v002.slp`
-- Export a new training job `labels.v002.slp.training_job` (you may reuse the training configurations from `v001`)
+the training-inference cycle.
+
+This predictions file has the same format as a standard SLEAP project file,
+and you can use the GUI (on your local machine) to manually correct the predictions
+or merge them into an existing SLEAP project.
+
+For example, you can:
+
+- [Manually correct](https://docs.sleap.ai/latest/tutorial/correcting-predictions/) some of the predictions
+- Merge corrected labels into the initial training set (`File` -> `Merge into Project...`).
+- Save the merged training set under a new name, e.g. `labels.v003.slp`
+- Export a new training job `labels.v003.slp.training_job` (you may reuse the training configurations from before)
 - Repeat the training-inference cycle until satisfied
 
 ## Troubleshooting
@@ -566,7 +584,7 @@ $ srun -p gpu --gres=gpu:1 --pty bash -i
 :icon: info
 
 * `-p gpu` requests a node from the 'gpu' partition (queue)
-* `--gres=gpu:1` requests 1 GPU of any kind
+* `--gres=gpu:1` requests 1 GPU of any kind. Use `--gres=gpu:<type>:1` to request a specific GPU type (e.g. `--gres=gpu:a100:1`).
 * `--pty` is short for 'pseudo-terminal'
 * The `-i` stands for 'interactive'
 
 First, let's verify that you are indeed on a node
 equipped with a functional GPU, by typing `nvidia-smi`:
 ```{code-block} console
 $ nvidia-smi
-Wed Sep 27 10:34:35 2023
-+-----------------------------------------------------------------------------+
-| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0    |
-|-------------------------------+----------------------+----------------------+
-| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
-| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
-|                               |                      |               MIG M. |
-|===============================+======================+======================|
-|   0  NVIDIA GeForce ...  Off  | 00000000:41:00.0 Off |                  N/A |
-|  0%   42C    P8    22W / 240W |      1MiB /  8192MiB |      0%      Default |
-|                               |                      |                  N/A |
-+-------------------------------+----------------------+----------------------+
-
-+-----------------------------------------------------------------------------+
-| Processes:                                                                  |
-|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
-|        ID   ID                                                   Usage      |
-|=============================================================================|
-|  No running processes found                                                 |
-+-----------------------------------------------------------------------------+
+Tue May 12 17:02:17 2026
++-----------------------------------------------------------------------------------------+
+| NVIDIA-SMI 580.95.05               Driver Version: 580.95.05      CUDA Version: 13.0    |
++-----------------------------------------+------------------------+----------------------+
+| GPU  Name                Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp   Perf         Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
+|                                         |                        |               MIG M. |
+|=========================================+========================+======================|
+|   0  Quadro RTX 5000               On  |   00000000:37:00.0 Off |                  Off |
+| 33%   27C    P8             11W / 230W |       1MiB / 16384MiB  |      0%      Default |
+|                                         |                        |                  N/A |
++-----------------------------------------+------------------------+----------------------+
+
++-----------------------------------------------------------------------------------------+
+| Processes:                                                                              |
+|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
+|        ID   ID                                                               Usage      |
+|=========================================================================================|
+|  No running processes found                                                             |
++-----------------------------------------------------------------------------------------+
 ```
 Your output should look similar to the above. You will be able to see the
 GPU name, temperature, memory usage, etc. If you see an error message instead,
 the GPU is to run the built-in diagnostic command:
 ```{code-block} console
 $ sleap doctor
 ```
 This will print system information, package versions, and confirm whether a GPU
-was detected. Look for a line like `GPU 0: NVIDIA ...` under the `[GPU / CUDA]`
-section and `PyTorch: v... (CUDA ...)` to confirm GPU support.
+was detected. Look for the `[GPU / CUDA]` section to confirm GPU support.
 
 To verify manually via the Python interpreter:
 ```{code-block} console
 $ python
 >>> import torch
 >>> torch.cuda.is_available()
 True
 >>> torch.cuda.get_device_name(0)
-'NVIDIA A100-SXM4-40GB'
+'Quadro RTX 5000'
 ```
 
 If all is as expected, you can exit the Python interpreter, and then exit the GPU node:

From b839173ba9033f52ae4f31d31566065a8ad71480 Mon Sep 17 00:00:00 2001
From: niksirbi
Date: Tue, 12 May 2026 18:45:34 +0100
Subject: [PATCH 3/5] Call sleap train directly in SLURM script instead of
 running train-script.sh

train-script.sh reflects paths from the machine that exported the training
job package and may not work on the cluster.
Calling sleap train directly in the SLURM script is cleaner and avoids a Hydra parse error caused by '=' in the auto-generated trainer_config.run_name value. - Replace ./train-script.sh in train-slurm.sh with explicit sleap train calls using --config-name, --config-dir, and trainer_config.ckpt_dir - Simplify train-script.sh prose to describe it as a reference only - Move the sleap train/sleap-nn train aliases note next to the train-script.sh mention, consolidating it with the --help reference - Update batch script explanation dropdown to describe the sleap train arguments - Simplify chmod warning to train-slurm.sh only - Remove Hydra parse error troubleshooting entry (no longer a failure path) --- docs/source/data_analysis/HPC-module-SLEAP.md | 80 +++++++------------ 1 file changed, 31 insertions(+), 49 deletions(-) diff --git a/docs/source/data_analysis/HPC-module-SLEAP.md b/docs/source/data_analysis/HPC-module-SLEAP.md index 8885f33..d703988 100644 --- a/docs/source/data_analysis/HPC-module-SLEAP.md +++ b/docs/source/data_analysis/HPC-module-SLEAP.md @@ -139,7 +139,7 @@ can be [viewed via the SLEAP GUI](model-evaluation) on your local SLEAP installa ### Prepare the training job Follow the [SLEAP tutorial](https://docs.sleap.ai/latest/tutorial/overview/) till the end of the section on [Initial Labelling](https://docs.sleap.ai/latest/tutorial/initial-labeling/). -Ensure that the project file (e.g. `labels.v001.slp`) is saved in the mounted SWC filesystem +Ensure that the project file (e.g. `labels.v002.slp`) is saved in the mounted SWC filesystem (as opposed to your local filesystem). Next, read the [Training a model](https://docs.sleap.ai/latest/tutorial/training-a-model/) section @@ -150,8 +150,7 @@ i.e. *Predict* -> *Run Training…* -> *Export Training Job Package…*. - For selecting the right configuration parameters, see the [Model Configuration](https://nn.sleap.ai/latest/reference/models/) guide. - Set the *Inference Target* parameter to *Nothing*. 
Remote training and inference (prediction) are easiest to run separately on the HPC Cluster. -- If you are working with camera view from above or below (as opposed to a side view), set the *Rotation* to ±180° in the *Augmentation* section. -- Make sure to save the exported training job package (e.g. `labels.v001.slp.training_job.zip`) in the mounted SWC filesystem, for example, in the same directory as the project file. +- Make sure to save the exported training job package (e.g. `labels.v002.slp.training_job.zip`) in the mounted SWC filesystem, for example, in the same directory as the project file. - Unzip the training job package. This will create a folder with the same name (minus the `.zip` extension). This folder contains everything needed to run the training job on the HPC cluster: YAML configuration files and a packaged labels file (`.pkg.slp`). (run-the-training-job)= @@ -185,35 +184,6 @@ the 'Top-Down' configuration, which consists of two neural networks - the first for isolating the animal instances (by finding their centroids) and the second for predicting all the body parts per instance. -Importantly, SLEAP also gives you a `train-script.sh` file that contains the exact commands needed to run the training job from the unzipped package folder. -You can inspect this file with `cat train-script.sh`: - -```{code-block} bash -#!/bin/bash -sleap train --config-name centroid.yaml --config-dir . trainer_config.ckpt_dir='/mnt/Data/sleap-tutorial-data/models' trainer_config.run_name='260512_151547.centroid.n=46' -sleap train --config-name centered_instance.yaml --config-dir . trainer_config.ckpt_dir='/mnt/Data/sleap-tutorial-data/models' trainer_config.run_name='260512_151547.centered_instance.n=46' -``` - -You will need to modify the paths in the `trainer_config.ckpt_dir` argument to point to a directory where you want the trained model files to be saved. You can edit the `train-script.sh` file with `nano` or any text editor of your choice. 
-
-In this example, we'll set this path to an appropriate directory in the `ceph` filesystem:
-```{code-block} bash
-:linenos:
-#!/bin/bash
-sleap train --config-name centroid.yaml --config-dir . trainer_config.ckpt_dir='/ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data/models' trainer_config.run_name='260512_151547.centroid.n=46'
-sleap train --config-name centered_instance.yaml --config-dir . trainer_config.ckpt_dir='/ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data/models' trainer_config.run_name='260512_151547.centered_instance.n=46'
-```
-
-For a full list of available `sleap train` arguments, run `sleap train --help` (with the SLEAP module loaded)
-and consult the relevant SLEAP-NN documentation on [training](https://nn.sleap.ai/latest/guides/training/).
-
-:::{note}
-`sleap train` and `sleap track` are short aliases for `sleap-nn train` and `sleap-nn track` respectively.
-Both forms work interchangeably.
-:::
-
-In `nano`, you can save the file by pressing `Ctrl+O` and exit by pressing `Ctrl+X`.
-
![Top-Down model configuration](https://legacy.sleap.ai/_images/topdown_approach.jpg)

:::{dropdown} More on 'Top-Down' vs 'Bottom-Up' models
@@ -226,6 +196,18 @@ where the animal occupies a relatively small portion of the frame - see
[Model Configuration](https://nn.sleap.ai/latest/reference/models/) for more info.
:::

+SLEAP also generates a `train-script.sh` file in the training job folder.
+You can inspect it with `cat train-script.sh` to see the training commands it contains.
+These are useful as a reference, but they reflect the paths on the machine that
+exported the training job package, so they may not work as-is on the HPC cluster.
+Instead, we'll write the `sleap train` commands from scratch in the next step.
+
+:::{note}
+`sleap train` is an alias for `sleap-nn train`. Both forms work interchangeably.
+For a full list of available arguments, run `sleap train --help` (with the SLEAP module loaded) +or consult the SLEAP-NN documentation on [training](https://nn.sleap.ai/latest/guides/training/). +::: + Next you need to create a SLURM batch script, which will schedule the training job on the HPC cluster. Create a new file called `train-slurm.sh` (you can do this in the terminal with `nano`/`vim` or in a text editor of @@ -267,9 +249,9 @@ SLP_JOB_DIR=$SLP_DIR/$SLP_JOB_NAME # Go to the job directory cd $SLP_JOB_DIR -# Run the train-script.sh generated by SLEAP -# which we edited to point to the correct checkpoint directory -./train-script.sh +# Run the training for each model +sleap train --config-name centroid.yaml --config-dir . trainer_config.ckpt_dir="$SLP_DIR/models" +sleap train --config-name centered_instance.yaml --config-dir . trainer_config.ckpt_dir="$SLP_DIR/models" ``` :::{dropdown} Explanation of the batch script @@ -307,12 +289,12 @@ For more information see the [SLURM documentation](https://slurm.schedmd.com/sb so no separate `cuda` module is needed. - The `cd` line changes the working directory to the training job folder. - This is necessary because the training commands inside `train-script.sh` - use relative paths to the configuration files. + This is necessary because the `--config-dir .` argument in the `sleap train` + commands uses a relative path to find the YAML configuration files. -- The `./train-script.sh` line runs the script containing the training commands. - Alternatively, you could also type the training commands directly in the - SLURM script. +- The `sleap train` commands each train one model. `--config-name` specifies the + YAML file, `--config-dir` the directory to find it in, and + `trainer_config.ckpt_dir` sets where the trained model files will be saved. 
:::

:::{dropdown} Legacy training commands (TensorFlow modules)
@@ -336,13 +318,10 @@ and the [legacy CLI reference](https://legacy.sleap.ai/guides/cli.html) for details.

:::{warning}
Before submitting the job, ensure that you have permissions to execute
-both the SLURM batch script (`train-slurm.sh`) and the
-training commands script (`train-script.sh`).
-You can make these files executable by running in the terminal:
+the SLURM batch script. You can make it executable by running:

```{code-block} console
$ chmod +x train-slurm.sh
-$ chmod +x train-script.sh
```
:::

@@ -467,6 +446,13 @@ SLEAP provides the `sleap track` command line utility for running inference
on a single video or a folder of videos. See the [remote inference guide](https://docs.sleap.ai/latest/guides/running-sleap-remotely/#remote-inference) for more details.

+:::{note}
+`sleap track` is an alias for `sleap-nn track`. Both forms work interchangeably.
+For a full list of available arguments, run `sleap track --help` (with the SLEAP module loaded)
+or consult the relevant SLEAP-NN documentation on [inference](https://nn.sleap.ai/latest/guides/inference/)
+and [tracking](https://nn.sleap.ai/latest/guides/tracking/).
+:::
+
Below is an example SLURM batch script that contains a `sleap track` call.
```{code-block} bash
:linenos:
@@ -513,10 +499,6 @@ The script is very similar to the training script, with the following difference
- The `sleap train` calls are replaced by the `sleap track` command.
- The `\` character is used to split the long `sleap track` command into multiple lines for readability. It is not necessary if the command is written on a single line.
-For a full list of available `sleap track` arguments, run `sleap track --help` (with the SLEAP module loaded)
-and consult the relevant SLEAP-NN documentation on [inference](https://nn.sleap.ai/latest/guides/inference/)
-and [tracking](https://nn.sleap.ai/latest/guides/tracking/).
- :::{dropdown} Legacy inference commands (TensorFlow modules) :color: info :icon: info @@ -560,7 +542,7 @@ or merge them into an existing SLEAP project. For example, you can: - [Manually correct](https://docs.sleap.ai/latest/tutorial/correcting-predictions/) some of the predictions -- Merge corrected labels into the initial training set (`File` -> `Merge into Project...`). +- Merge corrected labels into the initial training set (*File* -> *Merge into Project...*). - Save the merged training set under a new name, e.g. `labels.v003.slp` - Export a new training job `labels.v003.slp.training_job` (you may reuse the training configurations from before) - Repeat the training-inference cycle until satisfied From d8adc3da49f90d5030f5c8fea45335ec44d1b205 Mon Sep 17 00:00:00 2001 From: niksirbi Date: Tue, 12 May 2026 18:56:28 +0100 Subject: [PATCH 4/5] replaced trained model names --- docs/source/data_analysis/HPC-module-SLEAP.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/source/data_analysis/HPC-module-SLEAP.md b/docs/source/data_analysis/HPC-module-SLEAP.md index d703988..d2e3377 100644 --- a/docs/source/data_analysis/HPC-module-SLEAP.md +++ b/docs/source/data_analysis/HPC-module-SLEAP.md @@ -410,15 +410,15 @@ It contains one subfolder per training run. $ cd /ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data $ cd models $ ls -1 -'260512_144511.centroid.n=10' -'260512_144548.centered_instance.n=10' +'260512_151547.centroid.n=46' +'260512_151547.centered_instance.n=46' ``` Each subfolder holds the trained model files (e.g. `best.ckpt`), their configurations (`training_config.yaml`) and some evaluation metrics. 
```{code-block} console -$ cd '260512_144548.centered_instance.n=10' +$ cd '260512_151547.centroid.n=46' $ ls -1 best.ckpt initial_config.yaml @@ -485,8 +485,8 @@ mkdir -p $SLP_DIR/predictions # Run the inference command sleap track \ -i $SLP_DIR/mice.mp4 \ - -m $SLP_DIR/models/260512_144511.centroid.n=10 \ - -m $SLP_DIR/models/260512_144548.centered_instance.n=10 \ + -m $SLP_DIR/models/260512_151547.centroid.n=46 \ + -m $SLP_DIR/models/260512_151547.centered_instance.n=46 \ -d auto \ -b 4 \ --tracking \ From 106eadbf8d7b55dbbb175d05891d328e29dc7e85 Mon Sep 17 00:00:00 2001 From: niksirbi Date: Tue, 12 May 2026 19:05:13 +0100 Subject: [PATCH 5/5] Fix SLURM scripts: replace -n with --ntasks-per-node and --cpus-per-task PyTorch Lightning (used internally by SLEAP) raises a RuntimeError if --ntasks (i.e. -n) is set in the SLURM script, requiring --ntasks-per-node instead. The original -n comment ('number of cores') was also misleading, as -n sets the number of processes (tasks), not CPU cores. - Replace -n in both training and inference SLURM scripts with --ntasks-per-node=1 and --cpus-per-task - Add rationale in the batch script explanation dropdown - Update inference script diff explanation to mention --cpus-per-task - Add troubleshooting entry for the RuntimeError: --ntasks is not supported --- docs/source/data_analysis/HPC-module-SLEAP.md | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/docs/source/data_analysis/HPC-module-SLEAP.md b/docs/source/data_analysis/HPC-module-SLEAP.md index d2e3377..1d07569 100644 --- a/docs/source/data_analysis/HPC-module-SLEAP.md +++ b/docs/source/data_analysis/HPC-module-SLEAP.md @@ -227,7 +227,8 @@ An example is provided below, followed by explanations. 
#SBATCH -p gpu # partition (queue) #SBATCH -N 1 # number of nodes #SBATCH --mem 32G # memory pool for all cores -#SBATCH -n 8 # number of cores +#SBATCH --ntasks-per-node=1 # one process per node +#SBATCH --cpus-per-task=8 # CPU cores available to the process #SBATCH -t 0-06:00 # time (D-HH:MM) #SBATCH --gres gpu:a100:1 # request 1 GPU of a given type (see dropdown below) #SBATCH -o slurm.%x.%N.%j.out # STDOUT @@ -265,6 +266,12 @@ For more information see the [SLURM documentation](https://slurm.schedmd.com/sb - The `#` lines are comments. They are not executed by SLURM, but they are useful for explaining the script to your future self and others. +- `--ntasks-per-node=1` tells SLURM to launch one process per node. PyTorch Lightning + (which SLEAP uses internally) requires this form rather than `--ntasks` or `-n`. + Lightning then manages GPU parallelism internally within that single process. + `--cpus-per-task=8` allocates 8 CPU cores to that process, + which are used for data loading and preprocessing. + - `--gres gpu:a100:1` requests 1 GPU of type A100. If you don't care about the specific GPU type, you can simply request `--gres gpu:1`. You can inspect the available GPU types by listing the nodes in the `gpu` and `gpu_lowp` partitions with `sinfo`: @@ -462,7 +469,8 @@ Below is an example SLURM batch script that contains a `sleap track` call. #SBATCH -p gpu # partition #SBATCH -N 1 # number of nodes #SBATCH --mem 64G # memory pool for all cores -#SBATCH -n 16 # number of cores +#SBATCH --ntasks-per-node=1 # one process per node +#SBATCH --cpus-per-task=16 # CPU cores available to the process #SBATCH -t 0-02:00 # time (D-HH:MM) #SBATCH --gres gpu:a100:1 # request 1 GPU of a given type #SBATCH -o slurm.%x.%N.%j.out # write STDOUT @@ -494,7 +502,7 @@ sleap track \ ``` The script is very similar to the training script, with the following differences: - The time limit `-t` is set lower, since inference is normally faster than training. 
This will however depend on the size of the video and the number of models used.
-- The requested number of cores `n` and memory `--mem` are higher. This will depend on the requirements of the specific job you are running. It's best practice to try with a scaled-down version of your data first, to get an idea of the resources needed.
+- The requested `--cpus-per-task` and `--mem` are higher. This will depend on the requirements of the specific job you are running. It's best practice to try with a scaled-down version of your data first, to get an idea of the resources needed.
- You can request a specific GPU type with `--gres gpu:<gpu_type>:1` (e.g. `--gres gpu:a100:1`). The different GPU types vary in GPU memory size and compute capabilities (see [the SWC wiki](https://liveuclac.sharepoint.com/sites/SSC/SitePages/SSC-CPU-and-GPU-Platform-architecture-165449857.aspx)).
- The `sleap train` calls are replaced by the `sleap track` command.
- The `\` character is used to split the long `sleap track` command into multiple lines for readability. It is not necessary if the command is written on a single line.
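The SLURM-script changes across these patches (the explicit `sleap train` calls plus the `--ntasks-per-node`/`--cpus-per-task` switch) can be sanity-checked off-cluster with a small dry-run sketch. This is an illustration only, not part of the patched docs: `SLP_DIR` and `SLP_JOB_NAME` are the example values used in the diffs, and the commands are echoed rather than executed, so no SLEAP module or GPU node is needed.

```shell
#!/bin/sh
# Dry-run sketch: assemble the two `sleap train` commands the same way
# train-slurm.sh does, without loading modules or touching the cluster.
# SLP_DIR and SLP_JOB_NAME are the example values from the patches above.
SLP_DIR=/ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data
SLP_JOB_NAME=labels.v002.slp.training_job
SLP_JOB_DIR=$SLP_DIR/$SLP_JOB_NAME

# One command per model config: centroid first, then centered-instance,
# both writing checkpoints to the shared models directory.
CMD_CENTROID="sleap train --config-name centroid.yaml --config-dir . trainer_config.ckpt_dir=$SLP_DIR/models"
CMD_INSTANCE="sleap train --config-name centered_instance.yaml --config-dir . trainer_config.ckpt_dir=$SLP_DIR/models"

# Print what the batch script would run from inside the job directory.
echo "cd $SLP_JOB_DIR"
echo "$CMD_CENTROID"
echo "$CMD_INSTANCE"
```

On the cluster the real `train-slurm.sh` runs these commands directly under `sbatch`; this sketch only verifies that the variable plumbing produces the intended command lines.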