Scalable S3→S3 Transfers with rclone, Slurm, and pyxis
xfer is a command-line tool for orchestrating large-scale S3 data transfers on HPC and cloud clusters using:
- rclone (inside a container)
- Slurm job arrays
- enroot + pyxis
- manifest-based sharding for reliability and resumability
It is designed for datasets with:
- Millions of objects
- Highly variable object sizes
- Long-running transfers that need retries, logging, and restartability
- Stable JSONL manifest (`xfer.manifest.v1`)
- Byte-balanced sharding to avoid long-tail array tasks
- Automatic retries with Slurm requeue
- Skip-if-done semantics per shard
- Containerized rclone (no host rclone dependency)
- Configurable topology (array size, concurrency, CPU/mem)
- Safe to re-run: idempotent by design
Requirements:

- Slurm
- enroot + pyxis enabled (`srun --container-image` works)
- Network access from compute nodes to both S3 endpoints
- Python ≥ 3.10 and `uv`
- An rclone config file (`rclone.conf`) with your S3 remotes
Clone the repository:

```bash
git clone https://github.com/fluidnumerics/xfer.git ~/xfer
cd ~/xfer
```

Create and sync the virtual environment:

```bash
uv venv
uv sync
```

Run the CLI (no install required):

```bash
uv run xfer --help
```

Optional: install editable for convenience:

```bash
uv pip install -e .
xfer --help
```

You must have an rclone config on the submit host, e.g.:
```ini
[s3src]
type = s3
provider = Other
endpoint = https://objects.source.example.com
access_key_id = ...
secret_access_key = ...

[s3dst]
type = s3
provider = Other
endpoint = https://objects.dest.example.com
access_key_id = ...
secret_access_key = ...
```

This file is mounted read-only into the container at runtime.
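Before submitting anything, it can be worth confirming that both remotes resolve from the submit host. The commands below are a hedged sketch: they assume rclone is available on the host (or that you run the same commands through the container image) and that the remote names match the config above.

```bash
# List buckets visible through each remote (quick credential/endpoint check)
rclone lsd s3src: --config ~/.config/rclone/rclone.conf
rclone lsd s3dst: --config ~/.config/rclone/rclone.conf

# Optional: report object count and total bytes for the source prefix
rclone size s3src:mybucket/dataset --config ~/.config/rclone/rclone.conf
```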
The single `xfer run` command below builds the manifest, shards it, renders the Slurm scripts, and submits the job:
```bash
uv run xfer run \
  --run-dir run_001 \
  --source s3src:mybucket/dataset \
  --dest s3dst:mybucket/dataset \
  --num-shards 512 \
  --array-concurrency 96 \
  --rclone-image rclone/rclone:latest \
  --rclone-config ~/.config/rclone/rclone.conf \
  --rclone-flags "--transfers 48 --checkers 96 --fast-list --stats 30s" \
  --partition transfer \
  --cpus-per-task 4 \
  --mem 8G \
  --submit
```

This will submit a Slurm array job immediately.
To run the stages individually instead, start with `xfer manifest build`, which lists all objects using `rclone lsjson` (inside a container) and writes a stable JSONL manifest:
```bash
uv run xfer manifest build \
  --source s3src:mybucket/dataset \
  --dest s3dst:mybucket/dataset \
  --out run/manifest.jsonl \
  --rclone-image rclone/rclone:latest \
  --rclone-config ~/.config/rclone/rclone.conf \
  --extra-lsjson-flags "--fast-list"
```

Output:

```
run/
  manifest.jsonl
```
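Because the manifest is one JSON record per line, standard shell tools are enough for a spot check. The field names in the comment below are only illustrative of `rclone lsjson` output; the exact `xfer.manifest.v1` schema may differ.

```bash
# Peek at the first few records; rclone lsjson entries look roughly like
# {"Path":"images/0001.jpg","Size":524288,"ModTime":"...","IsDir":false}
head -n 3 run/manifest.jsonl

# Total number of listed objects
wc -l run/manifest.jsonl
```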
Next, `xfer manifest shard` splits the manifest into shards balanced by total bytes:
```bash
uv run xfer manifest shard \
  --in run/manifest.jsonl \
  --outdir run/shards \
  --num-shards 512
```

Output:

```
run/
  shards/
    shard_000000.jsonl
    shard_000001.jsonl
    ...
    shards.meta.json
```
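If you want to confirm the shards really are byte-balanced, a small sketch like the following works, assuming `jq` is installed and each record keeps an rclone-style `Size` field (an assumption about the manifest schema, not something documented here):

```bash
# Objects per shard (byte-balanced shards usually have uneven object counts)
wc -l run/shards/shard_*.jsonl | tail -n 5

# Approximate bytes per shard, largest five
for f in run/shards/shard_*.jsonl; do
  printf '%s\t%s\n' "$(jq -s 'map(.Size) | add' "$f")" "$f"
done | sort -n | tail -n 5
```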
`xfer slurm render` creates:

- `worker.sh` – executed by each array task
- `sbatch_array.sh` – Slurm submission script
- `submit.sh` – convenience wrapper
- `config.resolved.json` – frozen run configuration
```bash
uv run xfer slurm render \
  --run-dir run \
  --num-shards 512 \
  --array-concurrency 96 \
  --job-name s3-xfer \
  --time-limit 24:00:00 \
  --partition transfer \
  --cpus-per-task 4 \
  --mem 8G \
  --rclone-image rclone/rclone:latest \
  --rclone-config ~/.config/rclone/rclone.conf
```

Submit the rendered job:

```bash
uv run xfer slurm submit --run-dir run
```

Monitor progress with:

```bash
squeue -j <jobid>
sacct -j <jobid> --format=JobID,State,Elapsed
```

If you're seeing connection-refused errors or a high number of copy retries, you can back off on the number of simultaneous array tasks by lowering the `ArrayTaskThrottle`:

```bash
scontrol update ArrayTaskThrottle=<new array concurrency> JobId=<jobid>
```

Logs are written per shard and attempt:

```
run/
  logs/
    shard_12_attempt_1.log
    shard_12_attempt_2.log
```
State markers track per-shard progress:

```
run/
  state/
    shard_12.done
    shard_47.fail
```

- `.done` – shard completed successfully
- `.fail` – last attempt failed
- `.attempt` – retry counter
- Failed shards are automatically requeued up to `MAX_ATTEMPTS`
- Completed shards are skipped on re-run
- You can safely re-submit the same array job
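Assuming the `state/` layout shown above, a quick progress check can be done from the shell (the shard count of 512 here is just the value used in the examples):

```bash
# Count completed and failed shards
done_count=$(ls run/state/*.done 2>/dev/null | wc -l)
fail_count=$(ls run/state/*.fail 2>/dev/null | wc -l)
echo "done=${done_count} failed=${fail_count} remaining=$((512 - done_count))"

# Failed shards, to match against run/logs/shard_<id>_attempt_<n>.log
ls run/state/*.fail 2>/dev/null
```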
To re-submit manually:

```bash
sbatch run/sbatch_array.sh
```

Example rclone flag sets to pass via `--rclone-flags`:

```
--transfers 32
--checkers 64
--fast-list
--retries 10
--low-level-retries 20
--stats 30s
```

```
--transfers 16
--checkers 128
--fast-list
--progress --stats 600s
```
A run directory has the following layout:

```
run/
  manifest.jsonl
  shards/
    shard_000123.jsonl
    shards.meta.json
  logs/
  state/
  worker.sh
  sbatch_array.sh
  submit.sh
  config.resolved.json
```
This repo ships a set of Claude Code skills under .claude/skills/ that walk through each stage of a transfer. Each skill drives the corresponding xfer subcommand and encodes conventions we've found important on real clusters.
Invoke each by intent in a Claude Code session, or explicitly via /<skill-name>.
Main pipeline (build → analyze → shard → (rebase) → render → submit):
| Skill | Stage |
|---|---|
| `xfer-manifest-build` | Run `xfer manifest build` on a login node (POSIX source preferred) |
| `xfer-manifest-analyze` | File-size histogram → suggested rclone flags and shard count |
| `xfer-manifest-shard` | Byte-balanced split of the manifest into shards |
| `xfer-manifest-rebase` | Remap source/dest roots when the transfer host's view differs |
| `xfer-slurm-render` | Render `worker.sh` / `sbatch_array.sh` / `config.resolved.json` |
| `xfer-slurm-submit` | Stage the run directory to the cluster and `sbatch` |
On-demand / alternative entry points:
| Skill | When to use |
|---|---|
| `xfer-rclone-config` | One-time (per cluster) setup: create/deploy `rclone.conf` |
| `xfer-manifest-combine` | Combine parallel `rclone lsjson` parts into a single manifest |
| `xfer-oneshot` | `xfer run` escape hatch for small transfers that don't need the staged knobs |
See CLAUDE.md for the cross-cutting context Claude loads in every session in this repo.
The skills (and CLAUDE.md) enforce a few invariants. These apply whether you drive xfer through Claude Code or by hand:
- Workstation orchestrates, clusters execute. Run xfer from a local checkout in a `uv` environment. SSH to Slurm login nodes for `manifest build` and `sbatch`; `analyze`, `shard`, `rebase`, and `render` run locally.
- Paths are per-system. `--rclone-config`, the xfer repo path, and the run directory all differ between workstation, build cluster, and transfer cluster. Always resolve the correct absolute path on whichever host the command runs on; do not assume a workstation path resolves identically on a cluster.
- POSIX-first manifest build. If any Slurm cluster has a POSIX mount of the source bucket, build the manifest there against the POSIX path. Listing is latency-bound, and POSIX beats S3 by a wide margin.
- CPU-only, load-aware transfer. Prefer CPU-only partitions for both build and transfer. Pick the transfer cluster by current `sinfo`/`squeue` load rather than by habit.
- Vantage change → rebase. When the host that will run the transfer has a different view of source or destination than the host that built the manifest, run `xfer manifest rebase` and re-shard before render. Skipping this makes every array task fail identically.
- Credential hygiene. Keep `rclone.conf` at mode `0600` on every host it lives on. Never commit it to the repo, and confirm before transmitting it over `scp`.
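A minimal check for the credential-hygiene point, on each host that holds the config:

```bash
# Lock down the rclone config and confirm the mode is 0600
chmod 600 ~/.config/rclone/rclone.conf
ls -l ~/.config/rclone/rclone.conf   # expect -rw------- as the permission string
```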
- Manifest is immutable: enables reproducibility and auditing
- Shards are deterministic: re-runs don't reshuffle work
- rclone handles object-level idempotency
- Slurm handles node-level failures
- xfer handles orchestration only
- To enable pre-commit `black` formatting, run `uv run pre-commit install`
  - If necessary, you can format locally with `uv run black .`
- Name branches as either:
  - `<your name>/<branch name>` (e.g., `alice/update-readme`)
  - `<type of contribution>/<branch name>` (e.g., `feature/claude-integration`); these are usually `feature`, `patch`, or `docs`
- Do NOT squash PRs into a single commit
