|
| 1 | +# Remote GPU Testing for GitHub Actions |
| 2 | + |
| 3 | +Local machine is macOS with no GPUs. To test GitHub Action workflows and GPU code, run them on remote machines via SSH. |
| 4 | + |
| 5 | +## Machines |
| 6 | + |
| 7 | +<!-- Add machines here, e.g.: --> |
| 8 | +<!-- - **gpu-box**: `ssh user@10.0.0.1` --> |
| 9 | +<!-- - **train-server**: `ssh -i ~/.ssh/key user@hostname.com` --> |
| 10 | + |
| 11 | +## How to run remote commands |
| 12 | + |
| 13 | +Never run GPU code locally. Always use this pattern: |
| 14 | + |
| 15 | +1. **Set up a tmux session** on the remote machine (idempotent): |
| 16 | + ```bash |
| 17 | + ssh host "tmux new-session -d -s work || true" |
| 18 | + ``` |
| 19 | + |
| 20 | +2. **Run commands** via send-keys, always tee to a log file: |
| 21 | + ```bash |
| 22 | + ssh host "tmux send-keys -t work 'command 2>&1 | tee /tmp/jobname.log' Enter" |
| 23 | + ``` |
| 24 | + |
| 25 | +3. **Check output** by tailing the log file directly: |
| 26 | + ```bash |
| 27 | + ssh host "tail -100 /tmp/jobname.log" |
| 28 | + ``` |
| 29 | + |
| 30 | +4. **Check if a command is still running**: |
| 31 | + ```bash |
| 32 | + ssh host "pgrep -f command_name" |
| 33 | + ``` |
| 34 | + |
| 35 | +## Testing GitHub Actions locally on a remote GPU machine |
| 36 | + |
| 37 | +To replicate what a GitHub Action workflow does on a remote GPU box: |
| 38 | + |
| 39 | +1. **Push code to the remote machine**: |
| 40 | + ```bash |
| 41 | + rsync -avz --exclude '.git' --exclude '__pycache__' ./ host:/home/user/kernelbot/ |
| 42 | + ``` |
| 43 | + |
| 44 | +2. **Set up the environment** (mirrors the GH Action): |
| 45 | + ```bash |
| 46 | + ssh host "tmux send-keys -t work 'cd /home/user/kernelbot && pip install -r requirements-dev.txt && pip install -e . 2>&1 | tee /tmp/setup.log' Enter" |
| 47 | + ``` |
| 48 | + |
| 49 | +3. **Run the tests**: |
| 50 | + ```bash |
| 51 | + ssh host "tmux send-keys -t work 'cd /home/user/kernelbot && pytest 2>&1 | tee /tmp/tests.log' Enter" |
| 52 | + ``` |
| 53 | + |
| 54 | +4. **Run GPU kernel submissions** (what the bot dispatches to GitHub Actions): |
| 55 | + ```bash |
| 56 | + ssh host "tmux send-keys -t work 'cd /home/user/kernelbot && python src/kernelbot/main.py --debug 2>&1 | tee /tmp/bot.log' Enter" |
| 57 | + ``` |
| 58 | + |
| 59 | +## Rules |
| 60 | + |
| 61 | +- Assume all remote commands are long-running. |
| 62 | +- Always log output to a file with `tee`. Never rely on tmux scrollback. |
| 63 | +- Use direct `ssh host "tail ..."` to read logs, not tmux capture-pane. |
| 64 | +- For file transfers use `scp` or `rsync`. |
| 65 | +- Multiple parallel jobs: use separate tmux windows (`tmux new-window -t work -n jobname`). |
| 66 | +- Keep tmux sessions alive across interactions. Always `|| true` on session creation to avoid errors if it already exists. |
| 67 | +- To run a command in a specific directory, include `cd` in the send-keys. |
| 68 | +- After making local code changes, always `rsync` to the remote machine before re-running tests. |
0 commit comments