@@ -0,0 +1,19 @@
---
description: Build and run instructions (from skills.md)
globs: **/build-github-coding-agent-runner.sh,**/run-github-coding-agent-runner.sh,**/start.sh,**/iris.def
alwaysApply: false
---

# Build & Run

## Build (SLURM)

From the `github-runner` directory: `sbatch build-github-coding-agent-runner.sh`.

- Partition: `mi3001x`, time limit: 2 hours.
- Input: `--def=FILE` (default `iris.def`). Output: `--output=FILE` (default `github-copilot-coding-agent-runner.sif`). Relative paths for both are resolved against the build directory.
- Submit from repo dir so `SLURM_SUBMIT_DIR` is correct. Temp/cache under build dir (`.apptainer-tmp`, `.apptainer-cache`); temp removed after success, cache kept for rebuilds.

## Run

After build, run in one of two ways:

1. Standalone: `./run-github-coding-agent-runner.sh --github-token=... --github-repository=... --script-dir="$(pwd)" --runner-base="$(pwd)/runner-data"`.
2. SLURM: set `GITHUB_TOKEN` and `GITHUB_REPOSITORY`, then `sbatch run-github-coding-agent-runner.sh` (the script uses env and SLURM defaults).

See README.md for full setup.
@@ -0,0 +1,25 @@
---
description: Workflow and conventions for the GitHub Actions runner (from AGENTS.md)
alwaysApply: true
---

# GitHub Runner – Workflow and Conventions

## Workflow

Flow: run-github-coding-agent-runner.sh → container → start.sh → Actions listener. Two run modes: (1) Standalone: `./run-github-coding-agent-runner.sh` with required flags (--github-token, --github-repository, --script-dir, --runner-base). (2) SLURM: set GITHUB_TOKEN and GITHUB_REPOSITORY, then `sbatch run-github-coding-agent-runner.sh`; when under SLURM with no args, the script uses env and SLURM defaults. start.sh installs/configures the runner in RUNNER_HOME and starts the Actions listener.

## Conventions

1. **No sensitive data** – Do not hardcode tokens, passwords, or API keys. Use environment variables (e.g. export GITHUB_TOKEN before running).
2. **No host-specific paths** – Do not add paths like /work1/amd/josantos/... Prefer SCRIPT_DIR with dirname BASH_SOURCE, or GITHUB_WORKSPACE, RUNNER_WORKDIR, RUNNER_BASE, WORK, or relative paths.
3. **Do not edit iris.def unless the user explicitly asks.** Prefer changing start.sh or run-github-coding-agent-runner.sh for runtime behavior.
4. **Use known writable directories** – Prefer GITHUB_WORKSPACE, RUNNER_WORKDIR, RUNNER_BASE for installs and cache. Avoid $HOME, ~, /tmp.

## Directory layout (when running in runner)

| Variable | Use for |
|----------|---------|
| GITHUB_WORKSPACE | Repo checkout; installs, cache, venv |
| RUNNER_WORKDIR | Parent of owner/repo; job work |
| RUNNER_BASE | Runner data root; overlay, .github-runner |
24 changes: 24 additions & 0 deletions .github/scripts/github-runner-files/AGENTS.md
@@ -0,0 +1,24 @@
# Agent instructions

## Workflow

Flow: **run-github-coding-agent-runner.sh** → **container** → **start.sh** → **Actions listener**.

- **Standalone:** run `./run-github-coding-agent-runner.sh` with required flags (`--github-token`, `--github-repository`, `--script-dir`, `--runner-base`). No env needed.
- **SLURM:** set `GITHUB_TOKEN` and `GITHUB_REPOSITORY`, then `sbatch run-github-coding-agent-runner.sh`. When the script runs under SLURM with no arguments, it uses env and SLURM defaults (`SLURM_SUBMIT_DIR`, `WORK`) for script-dir and runner-base. start.sh installs/configures the runner in `RUNNER_HOME` if needed and starts the Actions listener; workflow jobs run in the container.

## Conventions

When editing scripts or config in this project:

1. **Never add sensitive data to scripts or committed files.**
Do not hardcode tokens, passwords, API keys, or other secrets. Use environment variables or a secure mechanism outside the repo (e.g. `export GITHUB_TOKEN` before running).

2. **Never use host-specific absolute paths.**
Do not add paths like `/work1/amd/josantos/...` or other machine-specific directories. Prefer:
- Paths relative to the script (e.g. `SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"` then `cd "${SCRIPT_DIR}"`).
- Environment variables (e.g. `$WORK`, `$HOME`) when a base directory is needed.
- Relative paths from the project or script location.

3. **Never edit the container definition file (e.g. `iris.def`) unless explicitly asked.**
Prefer changing scripts (e.g. `start.sh`, `run-github-coding-agent-runner.sh`) to install, configure, or run things at runtime. Only modify `.def` files when the user explicitly requests it.
214 changes: 214 additions & 0 deletions .github/scripts/github-runner-files/README.md
@@ -0,0 +1,214 @@
# Iris + GitHub Actions Self-Hosted Runner (Apptainer)

This setup runs a GitHub Actions self-hosted runner in an Apptainer container with the Iris framework (ROCm, Triton) and the `copilot` label, for HPC environments where Docker is not available.

## Prerequisites

- Apptainer/Singularity installed
- GitHub Personal Access Token with `repo` scope
- Access to the repository where you want to register the runner
- SLURM (for job scheduling)
- Optional: ROCm/AMD GPU partition for GPU workflows

## Quick Start

### 1. Create GitHub Personal Access Token

1. Go to https://github.com/settings/tokens/new
2. Name: e.g. `GitHub Actions Runner`
3. Scopes: Select `repo` (Full control of private repositories)
4. Click "Generate token" and save it securely

### 2. Prepare token and paths

You will pass the GitHub token and repository as flags (see step 4). Do not commit tokens.

### 3. Build the Container

From this directory:

```bash
sbatch build-github-coding-agent-runner.sh
```

This builds `github-copilot-coding-agent-runner.sif` from `iris.def` by default. To use another definition file, run `./build-github-coding-agent-runner.sh --def=my.def` or set `DEF_FILE=my.def` before `sbatch`. The job uses partition `mi3001x` with a 2-hour time limit. See **skills.md** for full build instructions.

### 4. Run the Runner

After the build completes, start the runner from the repo directory (where `run-github-coding-agent-runner.sh` and the `.sif` live). There are two ways to run it:

**Option A — Standalone with flags (required when not using SLURM):**

```bash
./run-github-coding-agent-runner.sh \
--github-token='YOUR_GITHUB_TOKEN' \
--github-repository='owner/repo' \
--script-dir="$(pwd)" \
--runner-base="$(pwd)/runner-data"
```

**Option B — Via SLURM with environment variables (when `sbatch run-github-coding-agent-runner.sh` is used, the script uses env and SLURM defaults for any value not passed as a flag):**

```bash
export GITHUB_TOKEN='YOUR_GITHUB_TOKEN'
export GITHUB_REPOSITORY='owner/repo'
sbatch run-github-coding-agent-runner.sh
```

With Option B, `SCRIPT_DIR` defaults to `SLURM_SUBMIT_DIR` (or the script’s directory), and `RUNNER_BASE` defaults to `$WORK/github-runner-data` if `WORK` is set, otherwise `$SCRIPT_DIR/github-runner-data`. You can override with `export SCRIPT_DIR=... RUNNER_BASE=...` if needed.
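
The default resolution described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual code in `run-github-coding-agent-runner.sh`: the helper names `resolve_script_dir` and `resolve_runner_base` are hypothetical, and `$(pwd)` stands in for "the script's directory".

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers mirroring the documented Option B defaults.

# Prefer an explicit SCRIPT_DIR, then SLURM_SUBMIT_DIR, then a fallback directory.
resolve_script_dir() {
    local explicit="$1" submit_dir="$2" fallback="$3"
    if [ -n "$explicit" ]; then
        printf '%s\n' "$explicit"
    elif [ -n "$submit_dir" ]; then
        printf '%s\n' "$submit_dir"
    else
        printf '%s\n' "$fallback"
    fi
}

# Prefer an explicit RUNNER_BASE, then $WORK/github-runner-data,
# then github-runner-data under the script directory.
resolve_runner_base() {
    local explicit="$1" work="$2" script_dir="$3"
    if [ -n "$explicit" ]; then
        printf '%s\n' "$explicit"
    elif [ -n "$work" ]; then
        printf '%s\n' "${work}/github-runner-data"
    else
        printf '%s\n' "${script_dir}/github-runner-data"
    fi
}

# Assumption: $(pwd) is used here in place of the script's own directory.
script_dir="$(resolve_script_dir "${SCRIPT_DIR:-}" "${SLURM_SUBMIT_DIR:-}" "$(pwd)")"
runner_base="$(resolve_runner_base "${RUNNER_BASE:-}" "${WORK:-}" "$script_dir")"
echo "SCRIPT_DIR=${script_dir}"
echo "RUNNER_BASE=${runner_base}"
```

Running the sketch before `sbatch` is a quick way to preview which directories the job would likely use with your current environment.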

Copy-paste and replace:
- `YOUR_GITHUB_TOKEN` — your GitHub Personal Access Token
- `owner/repo` — your repository (e.g. `Jose/Iris`)
- `runner-data` (Option A) — directory for runner state and work (created if missing); use any path you prefer.

Optional flags (Option A) or env vars (Option B) (examples):

```bash
./run-github-coding-agent-runner.sh ... \
  --cluster-name='vultr-k8' \
  --runner-labels='copilot,rocm' \
  --use-overlay=1
# Option B: export the matching env var before sbatch, e.g. export CLUSTER_NAME=...
```

### 5. Verify Runner Registration

1. Go to your repository on GitHub
2. Navigate to: Settings → Actions → Runners
3. You should see your runner listed with the `copilot` label

## Using the Runner in Workflows

In your `.github/workflows/*.yml` files, use the runner via the `copilot` label (or whatever you passed to `--runner-labels`). Ensure the workflow’s `runs-on` matches: e.g. `runs-on: copilot` or `runs-on: [self-hosted, copilot]`. If a workflow uses a different label (e.g. `apptainer`), either register the runner with that label too or change the workflow to `copilot`.

```yaml
name: Example Workflow
on: [push]

jobs:
  build:
    runs-on: copilot
    steps:
      - uses: actions/checkout@v4
      - name: Run a test
        run: echo "Running on Iris + copilot runner in HPC!"
```

## Workflow

End-to-end flow when you run the runner via SLURM:

1. **One-time setup**
Create a GitHub PAT with `repo` scope. From this directory, run `sbatch build-github-coding-agent-runner.sh` to build `github-copilot-coding-agent-runner.sif` from `iris.def` (Iris + ROCm; the runner is not in the image).

2. **Run the runner**
Either pass required flags to `run-github-coding-agent-runner.sh` (standalone) or set `GITHUB_TOKEN` and `GITHUB_REPOSITORY` and run `sbatch run-github-coding-agent-runner.sh` (SLURM-only env fallback; see Quick Start step 4). The script runs Apptainer with overlay and bind mounts and executes `/bin/bash -c "/runner-scripts/start.sh"`. So: **run-github-coding-agent-runner.sh** → **container** → **start.sh**.

3. **Inside the container: start.sh**
It receives `GITHUB_TOKEN`, `GITHUB_REPOSITORY`, `RUNNER_HOME`, `RUNNER_NAME`, `RUNNER_LABELS`, and `RUNNER_WORKDIR` from the run script (via `--env`). It checks required vars, sets defaults for any unset, and uses `RUNNER_HOME` (e.g. `/runner-home`). If the runner is not installed in `RUNNER_HOME`, it installs it (from `/opt/actions-runner` or by download). It fetches a registration token from GitHub, runs `config.sh`, then starts the Actions runner listener (`./run.sh`). The runner listens for jobs; when a workflow uses the `copilot` (or your) label, GitHub sends a job and the runner runs the steps in the container.

4. **End-to-end**
You run **run-github-coding-agent-runner.sh** with `--github-token`, `--github-repository`, `--script-dir`, and `--runner-base` (and optionally `--sif`). **run-github-coding-agent-runner.sh** starts the container, binds the script dir and runner dirs, passes env to the container, and runs **start.sh**. **start.sh** installs/configures the runner if needed and starts the listener. So: **run-github-coding-agent-runner.sh** → **container** → **start.sh** (install/configure + listener) → **runner runs workflow jobs**.
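
Step 3 says that `start.sh` checks required variables and fills in defaults for the rest. A minimal sketch of that pattern is shown here. It is not the actual `start.sh`: the helper `missing_required` is hypothetical, and the default values (`/runner-home`, `copilot`) are taken from the examples in this README rather than from the script itself.

```bash
#!/usr/bin/env bash
# Sketch only: a hypothetical pre-flight check in the spirit of start.sh.

# Print the names of required variables whose values are empty.
# $1 = token value, $2 = repository value.
missing_required() {
    local token="$1" repository="$2" missing=""
    [ -n "$token" ] || missing="GITHUB_TOKEN"
    [ -n "$repository" ] || missing="${missing:+$missing }GITHUB_REPOSITORY"
    printf '%s\n' "$missing"
}

missing="$(missing_required "${GITHUB_TOKEN:-}" "${GITHUB_REPOSITORY:-}")"
if [ -n "$missing" ]; then
    echo "Missing required variables: $missing" >&2
else
    # Defaults for optional settings; these particular values are assumptions.
    RUNNER_HOME="${RUNNER_HOME:-/runner-home}"
    RUNNER_LABELS="${RUNNER_LABELS:-copilot}"
    echo "Would configure runner in ${RUNNER_HOME} with labels ${RUNNER_LABELS}"
fi
```

The sketch only reports what it would do; the real script goes on to fetch a registration token and run `config.sh`.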

## Management Commands

```bash
# Build container
sbatch build-github-coding-agent-runner.sh

# Run standalone (required flags)
./run-github-coding-agent-runner.sh --github-token='...' --github-repository='owner/repo' --script-dir="$(pwd)" --runner-base="$(pwd)/runner-data"

# Run via SLURM with env (set GITHUB_TOKEN and GITHUB_REPOSITORY; SCRIPT_DIR/RUNNER_BASE default from SLURM)
export GITHUB_TOKEN=... GITHUB_REPOSITORY=owner/repo
sbatch run-github-coding-agent-runner.sh

# Check SLURM job status
squeue -u $USER

# View SLURM job logs
tail -f github-coding-agent-runner-*.out

# Cancel SLURM job
scancel <job_id>
```

## Customization

### Runner Name and Labels

Defaults are set in `run-github-coding-agent-runner.sh` (e.g. runner name: `repo-runner-cluster-YYYYMMDD-HHMMSS`; default label: `copilot`). Override with flags:

```bash
./run-github-coding-agent-runner.sh ... --runner-name='my-runner' --runner-labels='copilot,slurm,apptainer,hpc,iris,rocm,mi300x'
```
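
For illustration, a default name of the form `repo-runner-cluster-YYYYMMDD-HHMMSS` could be composed along these lines. This is a sketch, assuming `repo` and `cluster` in that pattern are placeholders for the repository name and the cluster name; the helper `default_runner_name` and its argument order are hypothetical and may not match the run script.

```bash
#!/usr/bin/env bash
# Sketch only: one way to compose a name like repo-runner-cluster-YYYYMMDD-HHMMSS.
# $1 = owner/repo, $2 = cluster name, $3 = timestamp string.
default_runner_name() {
    local repository="$1" cluster="$2" stamp="$3"
    local repo_name="${repository##*/}"   # strip the "owner/" prefix
    printf '%s-runner-%s-%s\n' "$repo_name" "$cluster" "$stamp"
}

# Example call with the current time as the timestamp.
default_runner_name "owner/repo" "mycluster" "$(date +%Y%m%d-%H%M%S)"
```

Including a timestamp keeps names unique across resubmitted jobs, which helps when an earlier registration has not been cleaned up yet.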

### SLURM Parameters

Edit `run-github-coding-agent-runner.sh` SBATCH directives as needed:

- `#SBATCH --time=8:00:00`
- `#SBATCH -p mi3008x` # partition
- `#SBATCH --nodes=1`

GPU access is enabled via `--rocm` in the container run.

### Kubernetes / no overlay

Overlays are not used in Kubernetes (default `USE_OVERLAY=0` in pods). The script uses **bind mounts only** for writable space:

- **RUNNER_HOME** (runner config) and **RUNNER_WORKDIR** (job work) are bind-mounted from the host/pod.
- Optional: set **RUNNER_TMP** to a writable directory (e.g. a pod `emptyDir` mounted in the container) and the script will bind it to `/tmp` inside the container so tools (e.g. Triton cache) can write there.

Example in a pod spec: mount an `emptyDir` at `/runner-tmp` and set `RUNNER_TMP=/runner-tmp` in the container env so `/tmp` is writable without an overlay.
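
The optional `/tmp` bind can be sketched as below. This is an assumption about how such logic might look, not the run script's actual code; `tmp_bind_spec` is a hypothetical helper, and the `apptainer` command line is only printed, not executed.

```bash
#!/usr/bin/env bash
# Sketch only: emit a bind spec for /tmp when a writable RUNNER_TMP is provided.
# $1 = value of RUNNER_TMP (may be empty).
tmp_bind_spec() {
    local runner_tmp="$1"
    if [ -n "$runner_tmp" ]; then
        printf '%s\n' "${runner_tmp}:/tmp"
    fi
}

spec="$(tmp_bind_spec "${RUNNER_TMP:-}")"
if [ -n "$spec" ]; then
    # Printed for illustration; the remaining arguments are elided.
    echo "apptainer exec --bind ${spec} ..."
else
    echo "RUNNER_TMP not set; /tmp is left as the container provides it"
fi
```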

## Troubleshooting

### Runner not appearing in GitHub

1. Check logs: `tail -f github-coding-agent-runner-*.out` and `github-coding-agent-runner-*.err`
2. Verify the token (`--github-token` or `GITHUB_TOKEN`) has `repo` scope
3. Verify `--github-repository` format is `owner/repo`
4. Check token has not expired
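
The repository format in item 3 can be checked locally before resubmitting. The following is a small sketch with a hypothetical helper, `is_owner_repo`; it only checks the shape of the string (exactly one slash, non-empty on both sides) and does not confirm that the repository exists or that the token can access it.

```bash
#!/usr/bin/env bash
# Sketch only: succeed when the argument looks like owner/repo.
is_owner_repo() {
    case "$1" in
        */*/*|/*|*/) return 1 ;;   # extra slashes, or an empty owner or repo
        */*)         return 0 ;;   # exactly one slash with text on both sides
        *)           return 1 ;;   # no slash at all
    esac
}

for candidate in "Jose/Iris" "Iris" "https://github.com/Jose/Iris"; do
    if is_owner_repo "$candidate"; then
        echo "ok:  $candidate"
    else
        echo "bad: $candidate"
    fi
done
```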

### Build failures

- Build runs on partition `mi3001x` with fakeroot. See **skills.md** for details.
- Cache and temp dirs are under the project directory (`.apptainer-cache`, `.apptainer-tmp`). Ensure enough disk space.

### Container not found when running

If the container image is missing (default: `script-dir/github-copilot-coding-agent-runner.sif`), `run-github-coding-agent-runner.sh` will print a message. Run the build and wait for it to complete, or pass `--sif=/path/to/image.sif`.
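
To check ahead of time which image path would be used, something like the following sketch may help. The helper `resolve_sif` and the `SIF` environment variable are hypothetical stand-ins for the `--sif` flag, and `$(pwd)` stands in for the script directory; the real script's logic may differ.

```bash
#!/usr/bin/env bash
# Sketch only: choose an explicit image path if given, else the documented default.
# $1 = explicit image path (may be empty), $2 = script directory.
resolve_sif() {
    local override="$1" script_dir="$2"
    if [ -n "$override" ]; then
        printf '%s\n' "$override"
    else
        printf '%s\n' "${script_dir}/github-copilot-coding-agent-runner.sif"
    fi
}

sif="$(resolve_sif "${SIF:-}" "$(pwd)")"
if [ -f "$sif" ]; then
    echo "Using image: $sif"
else
    echo "Image not found: $sif (build it first or pass --sif=/path/to/image.sif)" >&2
fi
```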

### Runner offline

```bash
squeue -u $USER
tail -50 github-coding-agent-runner-*.err
scancel <job_id>
# Resubmit: either same flags (standalone) or same env then sbatch run-github-coding-agent-runner.sh
```

## Security

- **Tokens**: Never commit tokens. Use `--github-token=TOKEN` when running standalone, or set `GITHUB_TOKEN` when using `sbatch run-github-coding-agent-runner.sh`; do not put secrets in committed files.
- **Paths**: Do not hardcode host-specific paths in scripts. See **AGENTS.md** for project conventions.
- **Container**: Apptainer runs as your user; the container image is read-only. Writable space comes from a per-job overlay or, when overlays are disabled, from bind mounts.

## File Structure

```
github-runner/
├── iris.def # Apptainer definition (Iris + ROCm)
├── build-github-coding-agent-runner.sh # SLURM build job (--def=FILE for definition file)
├── run-github-coding-agent-runner.sh # Run job (flags or sbatch + env)
├── start.sh # Runner startup (inside container; also used as K8s entrypoint)
├── runner-container.env.example # Example env file for container (start.sh sources it)
├── AGENTS.md # Agent instructions (no secrets, relative paths)
├── skills.md # Build instructions
├── README.md # This file
└── github-copilot-coding-agent-runner.sif # Built image (after build)
```

## License

MIT License.
@@ -0,0 +1,92 @@
#!/bin/bash

# SLURM job script to build GitHub Coding Agent Runner container

#SBATCH --job-name=build-github-coding-agent-runner
#SBATCH --output=build-github-coding-agent-runner-%j.out
#SBATCH --error=build-github-coding-agent-runner-%j.err
#SBATCH --time=2:00:00
#SBATCH --nodes=1
#SBATCH -p mi3001x

set -e

# Parse flags for definition file (and optional output)
# Usage: ./build-github-coding-agent-runner.sh [--def=FILE] [--output=SIF]
# or: sbatch build-github-coding-agent-runner.sh (uses DEF_FILE env or default iris.def)
while [[ $# -gt 0 ]]; do
    case $1 in
        --def=*)        DEF_FILE="${1#*=}"; shift ;;
        --def)          DEF_FILE="${2:-}"; shift 2 ;;
        --definition=*) DEF_FILE="${1#*=}"; shift ;;
        --definition)   DEF_FILE="${2:-}"; shift 2 ;;
        -d)             DEF_FILE="${2:-}"; shift 2 ;;
        --output=*)     OUTPUT_SIF="${1#*=}"; shift ;;
        --output)       OUTPUT_SIF="${2:-}"; shift 2 ;;
        -o)             OUTPUT_SIF="${2:-}"; shift 2 ;;
        -h|--help)
            echo "Usage: $0 [OPTIONS]"
            echo "Options:"
            echo " --def=FILE, --definition=FILE, -d FILE Apptainer definition file (default: iris.def)"
            echo " --output=FILE, -o FILE Output .sif file (default: github-copilot-coding-agent-runner.sif)"
            exit 0
            ;;
        *) break ;;
    esac
done

# Defaults: keep values set by flags or the environment (DEF_FILE, OUTPUT_SIF); otherwise use the built-in defaults
DEF_FILE="${DEF_FILE:-iris.def}"
OUTPUT_SIF="${OUTPUT_SIF:-github-copilot-coding-agent-runner.sif}"

echo "=========================================="
echo "GitHub Coding Agent Runner Container Build"
echo "=========================================="
echo "Job ID: ${SLURM_JOB_ID:-n/a}"
echo "Node: ${SLURM_NODELIST:-n/a}"
echo "Start: $(date)"
echo "=========================================="

# Build in the submit directory under SLURM, otherwise in the script's own directory,
# so the def file and the build output are in the right place
if [ -n "${SLURM_SUBMIT_DIR}" ]; then
    BUILD_DIR="${SLURM_SUBMIT_DIR}"
else
    BUILD_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
fi
cd "${BUILD_DIR}"
echo "Build directory: ${BUILD_DIR}"

# Resolve definition and output paths against the build dir if they are relative
[ "${DEF_FILE#/}" = "$DEF_FILE" ] && DEF_FILE="${BUILD_DIR}/${DEF_FILE}"
[ "${OUTPUT_SIF#/}" = "$OUTPUT_SIF" ] && OUTPUT_SIF="${BUILD_DIR}/${OUTPUT_SIF}"

if [ ! -f "$DEF_FILE" ]; then
    echo "Error: definition file not found: $DEF_FILE"
    exit 1
fi

# Temp and cache under build dir (avoids /tmp filling up)
export APPTAINER_TMPDIR="${BUILD_DIR}/.apptainer-tmp"
export APPTAINER_CACHEDIR="${BUILD_DIR}/.apptainer-cache"
mkdir -p "$APPTAINER_TMPDIR" "$APPTAINER_CACHEDIR"

echo ""
echo "=========================================="
echo "Building container image..."
echo "Definition file: $DEF_FILE"
echo "Output file: $OUTPUT_SIF"
echo "=========================================="

apptainer build --force --fakeroot "$OUTPUT_SIF" "$DEF_FILE"

# Clean build temp to free space (cache is kept for faster rebuilds; remove .apptainer-cache to reclaim that too).
rm -rf "$APPTAINER_TMPDIR"
echo "Cleaned temporary directory: $APPTAINER_TMPDIR"

echo ""
echo "=========================================="
echo "Build completed"
echo "=========================================="

echo ""
echo "Finished: $(date)"