# SSH into the SWC HPC cluster

This guide explains how to connect to the SWC's HPC cluster via SSH from
any personal computer.

```{include} ../_static/swc-wiki-warning.md
```

You have two options on how to proceed:

2. If you are using Windows 10 or newer, you can follow this guide (except for the section on [SSH keys](#ssh-keys)) using native Windows functionalities, as described below.

To [Log into the cluster](#log-into-the-cluster), you can use the same commands as in the guide below, but typed in the Windows `cmd` or PowerShell.

The [SSH config file](#ssh-config-file) section can be followed using the file browser and Notepad, instead of the terminal and `nano`.
Create the `.ssh` folder in your home directory, i.e. `C:\Users\<USERNAME>\.ssh`,
and create a file named `config` inside it, making sure the file has no extension
(click 'View' in the file browser and check the box 'File name extensions').
The `config` file contents should be the same as in the guide below.
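
If you prefer the command line, here is a minimal sketch of the same steps in `cmd` (assuming the default location of your Windows home directory):

```{code-block} console
:caption: cmd
mkdir %USERPROFILE%\.ssh
notepad %USERPROFILE%\.ssh\config
```

Notepad will offer to create the file if it does not exist yet; save it without any extension.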

In day-to-day use, you can use the `ssh swc-gateway` command natively in Windows `cmd`,
provided that you have defined the alias in your `config` file, as this guide describes.

## Log into the cluster

Run the following commands on the terminal, typing your `<SWC-PASSWORD>` when prompted.
Note that the password will not be displayed on the screen as you type.

If you are physically **at SWC** using a **wired network connection** (i.e., not eduroam)—or
connected using the **SWC VPN**, you can directly connect to the cluster's *gateway* node (`hpc-gw2`).

```{code-block} console
$ ssh <SWC-USERNAME>@hpc-gw2
```

In any other scenario, you are **not within the SWC network**; you must first connect to a secure access point (called the *bastion* node) before you can reach the cluster's *gateway* node (`hpc-gw2`).

```{code-block} console
$ ssh <SWC-USERNAME>@ssh.swc.ucl.ac.uk
$ ssh hpc-gw2
```
You have now successfully logged into the cluster 🎉.

To return to the computer you came from, simply type `logout`. You can think of `logout` as undoing the last `ssh` command you ran.
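
For example, if you reached the *gateway* node via the *bastion* node, it takes two `logout` commands to get all the way back:

```{code-block} console
$ logout   # leaves hpc-gw2 and lands you back on the bastion node
$ logout   # leaves the bastion node and returns you to your local machine
```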

:::{admonition} You may stop reading here, but...

If you want to learn more about the various types of HPC nodes (*bastion / gateway / compute*), read the [next section](#types-of-hpc-nodes).

If you want to make your life easier, you can set yourself up with an [SSH config file](#ssh-config-file)
and some [SSH keys](#ssh-keys).
:::

## Types of HPC nodes

A *node* is simply a computer that is part of the cluster.
Let's distinguish the different types of nodes on the SWC HPC system.

| Node Type | Hostname | Analogy / Role | Use cases |
| :--- | :--- | :--- | :--- |
| *Bastion* | `ssh.swc.ucl.ac.uk` | Secure entry point | **Strictly pass-through.** Do not run any computations or file operations. If you find yourself here, just type `ssh hpc-gw2` to reach the *gateway* node. |
| *Gateway* | `hpc-gw2` | Staging Area | **Light-weight tasks only** (script editing, file management, job submission). Do not run computations. |
| *Compute* | `enc1-node10`, `gpu-sr670-21`, etc. | Workhorses | **Run the actual computations** submitted via `srun` or `sbatch`. |
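
If you are ever unsure which node you have landed on, `hostname` will tell you (the output below is illustrative, shown for the *gateway* node):

```{code-block} console
$ hostname
hpc-gw2
```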

![](../_static/ssh_flowchart_unmanaged.png)

Your home directory, as well as the locations where filesystems like `ceph` are mounted, are shared across all of the nodes.

The *compute* nodes should only be accessed via the SLURM `srun` or `sbatch` commands from the *gateway* node.
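
For example, these are two typical commands you might run from the *gateway* node; `my_job.sh` stands in for your own batch script:

```{code-block} console
$ sbatch my_job.sh           # submit a batch job to SLURM
$ squeue -u <SWC-USERNAME>   # check on your queued and running jobs
```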

:::{dropdown} Be mindful of using shared nodes
:color: warning
:icon: alert

Avoid running heavy computations on the *bastion* or *gateway* nodes, as these are shared across all users of the HPC cluster.

It's always safer to request dedicated *compute* resources, which will be yours for the duration of your job.

For example, this is how you can request an interactive session on a *compute* node to create a new conda environment:

```{code-block} console
$ srun -p cpu -n 4 --mem 8G --pty bash -i
$ module load miniconda
$ conda create -n myenv python
```

Depending on your needs and node availability, you may need to request
a different partition. See the [SLURM arguments primer](slurm-arguments-target)
for more information.

The `--pty bash -i` part specifies an interactive bash shell.
The two commands that follow it in the code block above are run in this shell, on the assigned *compute* node.

Type `exit` to leave the interactive session when finished.
Avoid keeping sessions open when not in use.
:::

(target-managed-desktops)=
## Note on managed desktops

The SWC's IT team offers managed desktop computers equipped with either
a Windows or a Linux image. These machines are already part of the SWC's
trusted network domain, meaning you can easily access the HPC cluster.

- If you are using a [managed Windows desktop](https://liveuclac.sharepoint.com/sites/SSC/SitePages/SSC-SWC-Desktops-147956857.aspx),
you can SSH directly into the *gateway* node with `ssh hpc-gw2` from the
Windows `cmd` or PowerShell. You may use that node to prepare your scripts and submit SLURM jobs.
- If you are using a [managed Linux desktop](https://liveuclac.sharepoint.com/sites/SSC/SitePages/SSC-Managed-Linux-Desktop-69502751.aspx),
you can even bypass the *gateway* node. In fact, you may directly submit SLURM jobs
from your terminal, without having to SSH at all. That's because managed Linux desktops
are already set up to talk to the cluster's SLURM scheduler directly.

A modified version of the flowchart found above, including managed desktops:


## SSH config file
If you are frequently accessing the cluster from outside the SWC network,
you may find yourself typing the same SSH commands over and over again.
You can make your life easier by editing the SSH config file.
This is a text file that lives in your home directory and contains a list of host aliases, along with the connection settings for each.
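
Its exact contents are up to you. A minimal sketch that defines the `swc-bastion` and `swc-gateway` aliases used throughout this guide could look like the following (replace `<SWC-USERNAME>` with your own username; the options in your own file may differ):

```{code-block}
:caption: ~/.ssh/config
Host swc-bastion
    HostName ssh.swc.ucl.ac.uk
    User <SWC-USERNAME>

Host swc-gateway
    HostName hpc-gw2
    User <SWC-USERNAME>
    ProxyJump swc-bastion
```

The `ProxyJump` option tunnels the connection through the *bastion* node, so a single `ssh swc-gateway` takes you all the way to the *gateway* node.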

## SSH keys

Setting up SSH keys requires a bit of work upfront, but it will save you tons of time in the long run. Plus, it's more secure than password authentication.

::: {dropdown} How does SSH key authentication work?
:color: info
:icon: info
You generate a pair of keys locallya public and a private one
and then copy the public key to the remote machine.
When you try to SSH into the remote machine, the SSH client on your local machine
will use the private key to generate a signature, which the SSH server on the
remote machine verifies against your public key. If the signature is valid,
you are let in without having to type your password.
:::
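
To generate such a key pair, you can run `ssh-keygen` on your local machine. A minimal sketch, assuming the `ed25519` key type that matches the file names below; accept the suggested file location when prompted, and optionally set a passphrase:

```{code-block} console
$ ssh-keygen -t ed25519
```

This will create two files in your `~/.ssh` folder.
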
The `id_ed25519` file is your private key and **it should never be shared with anyone**.

The `id_ed25519.pub` file is your public key.
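
You can check that both keys (plus the `config` file from earlier) are in place by listing the folder; the output below is illustrative:

```{code-block} console
$ ls ~/.ssh
config  id_ed25519  id_ed25519.pub
```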

:::{dropdown} When to specify private key location in the SSH config
:color: warning
:icon: alert
In most cases, you don't need to explicitly specify the location of the private key
in your `~/.ssh/config` file because SSH will automatically look for the default key names
(like `id_rsa`, `id_ed25519`, etc.) in the `~/.ssh` directory.
:::
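
Once your key pair exists, copy the **public** key over to the cluster. A minimal sketch using `ssh-copy-id`, assuming the `id_ed25519.pub` key generated above and the `swc-gateway` alias from your `config` file:

```{code-block} console
$ ssh-copy-id -i ~/.ssh/id_ed25519.pub swc-gateway
```

:::{dropdown} What does this command do?
:color: info
:icon: info
`ssh-copy-id` relies on the host aliases defined
in the `config` file to figure out how to reach the remote machine.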
It copies the specified public key to your home directory on the target machine (in this case `swc-gateway`) and adds it to the `.ssh/authorized_keys` file there.

Since your SWC home directory is shared across all HPC nodes, the public
key will be available on all of them. That's why you only need to run the above command once.
:::


🎉 Congrats! You can now directly SSH into the *gateway* node without typing your password:
```{code-block} console
$ ssh swc-gateway
```