A Python-based version control system for large assets using Amazon S3 and S3-compatible storage. This system is designed to work like Git LFS but utilizes S3 for better bandwidth and scalability. It supports file tracking, parallel operations, encryption, and any S3-compatible backend (MinIO, Cloudflare R2, Backblaze B2, Wasabi, DigitalOcean Spaces, etc.).
- Upload and track large files in S3 instead of Git
- Works with any S3-compatible storage (MinIO, Cloudflare R2, Backblaze B2, Wasabi)
- Block-level parallel transfers: Downloads and uploads flatten all chunks across all files into a single worker pool
- Automatic parallel compression: Uses pigz when available, falls back to gzip
- Git hook integration: `s3lfs install` sets up post-checkout, post-merge, and pre-push hooks
- Git LFS migration: One-command migration with `s3lfs migrate-from-lfs`
- GitHub Action: Built-in CI/CD support with selective checkout
- Per-repo config: `.s3lfsconfig` file for team-wide defaults
- SHA-256 content-based file deduplication
- AES256 server-side encryption
- Configurable worker count (auto-detected from CPU count)
- Exponential backoff retries for transient S3 errors
```bash
pip install s3lfs
```

For development, install dependencies with uv:

```bash
pip install uv
uv sync
```

The CLI tool provides a simplified set of commands for managing large files with S3. All commands automatically use the bucket and prefix configured during initialization.
Subdirectory Support: All s3lfs commands work from any subdirectory within the git repository. The tool automatically discovers the git repository root and resolves paths relative to it. For example, running s3lfs track file.txt from the data/ directory will track data/file.txt.
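The repository-root discovery described above can be sketched in a few lines of Python. This is an illustration only, not s3lfs's actual implementation; `find_git_root` and `resolve_tracked_path` are hypothetical helper names:

```python
from pathlib import Path

def find_git_root(start=None):
    """Walk upward from `start` (default: current directory) until a
    directory containing a .git entry is found."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / ".git").exists():
            return candidate
    raise RuntimeError("not inside a git repository")

def resolve_tracked_path(user_path):
    """Resolve a path typed in any subdirectory to a repo-root-relative path,
    so `s3lfs track file.txt` run from data/ tracks data/file.txt."""
    root = find_git_root()
    absolute = (Path.cwd() / user_path).resolve()
    return str(absolute.relative_to(root))
```

Running `resolve_tracked_path("file.txt")` from a `data/` subdirectory of the repository would return `data/file.txt`.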
```bash
s3lfs init <bucket-name> <repo-prefix>
```

Description: Initializes the S3LFS system with the specified S3 bucket and repository prefix. This creates a `.s3_manifest.yaml` file that stores the configuration and file mappings.
Example:
```bash
s3lfs init my-bucket my-project
```

```bash
s3lfs track <path>
s3lfs track --modified
```

Description: Tracks and uploads files, directories, or glob patterns to S3.
Options:
- `--modified`: Track only files that have changed since last upload
- `--verbose`: Show detailed progress information
- `--no-sign-request`: Use unsigned S3 requests (for public buckets)
- `--workers N`: Number of parallel workers (default: auto-detected from CPU count)
- `--metrics`: Enable parallelism metrics collection
Examples:
```bash
s3lfs track data/large_file.zip   # Track a single file
s3lfs track data/                 # Track entire directory
s3lfs track "*.mp4"               # Track all MP4 files
s3lfs track --modified            # Track only changed files
```

```bash
s3lfs checkout <path>
s3lfs checkout --all
```

Description: Downloads files, directories, or glob patterns from S3.
Options:
- `--all`: Download all files tracked in the manifest
- `--verbose`: Show detailed progress information
- `--no-sign-request`: Use unsigned S3 requests (for public buckets)
- `--workers N`: Number of parallel workers (default: auto-detected from CPU count)
- `--metrics`: Enable parallelism metrics collection
Examples:
```bash
s3lfs checkout data/large_file.zip   # Download a single file
s3lfs checkout data/                 # Download entire directory
s3lfs checkout "*.mp4"               # Download all MP4 files
s3lfs checkout --all                 # Download all tracked files
```

```bash
s3lfs ls [<path>]
s3lfs ls --all
```

Description: Lists files tracked by s3lfs. If no path is provided, all tracked files are listed by default. Supports files, directories, and glob patterns.
Options:
- `--all`: List all tracked files (default if no path is provided)
- `--verbose`: Show detailed information including file sizes and hashes
- `--no-sign-request`: Use unsigned S3 requests (for public buckets)
Examples:
```bash
s3lfs ls                   # List all tracked files
s3lfs ls data/             # List files in the data directory
s3lfs ls "*.mp4"           # List all MP4 files
s3lfs ls --all --verbose   # List all files with detailed info
```

Pipe-friendly Output: In non-verbose mode, the ls command outputs one file path per line without headers or formatting, making it easy to pipe into other commands. Paths are shown relative to your current directory:
```bash
s3lfs ls | grep "\.mp4"                           # Filter for MP4 files in current directory
s3lfs ls | wc -l                                  # Count tracked files in current directory
s3lfs ls data/ | xargs -I {} echo "Processing {}" # Process each file in data/
```

```bash
s3lfs remove <path>
```

Description: Removes files or directories from tracking. Supports files, directories, and glob patterns.
Options:
- `--purge-from-s3`: Immediately delete files from S3 (default: keep for history)
- `--no-sign-request`: Use unsigned S3 requests
Examples:
```bash
s3lfs remove data/old_file.zip       # Remove single file
s3lfs remove data/temp/              # Remove directory
s3lfs remove "*.tmp"                 # Remove all temp files
s3lfs remove data/ --purge-from-s3   # Remove and delete from S3
```
⚠️ Work in Progress: The cleanup command is experimental and untested. Use with caution.
```bash
s3lfs cleanup
```

Description: Removes files from S3 that are no longer referenced in the current manifest.
Options:
- `--force`: Skip confirmation prompt
- `--no-sign-request`: Use unsigned S3 requests
Example:
```bash
s3lfs cleanup --force   # Clean up without confirmation
```

```bash
s3lfs install
```

Description: Installs git hooks for transparent s3lfs integration. After installation, tracked files are automatically downloaded after git checkout and git merge, and modified files are automatically uploaded before git push.
Installed hooks:
- `post-checkout`: Downloads tracked files after branch checkouts
- `post-merge`: Downloads tracked files after merges
- `pre-push`: Uploads modified tracked files before push
The hooks are non-blocking -- if s3lfs fails or is not available, the git operation continues with a warning. Hooks are appended to existing hook files, preserving any other hooks you have.
```bash
s3lfs uninstall
```

Description: Removes s3lfs git hooks. Other hooks in the same files are preserved.
```bash
s3lfs migrate-from-lfs <bucket-name> <repo-prefix>
```

Description: Converts a Git LFS repository to s3lfs in one step. Detects LFS-tracked patterns from `.gitattributes`, verifies that files contain real content (not pointer files), initializes s3lfs, and uploads all files to S3.
Options:
- `--dry-run`: Preview what would be migrated without making changes
- `--remove-lfs/--keep-lfs`: Remove LFS entries from `.gitattributes` after migration (default: keep)
- `--no-sign-request`: Use unsigned S3 requests
- `--use-acceleration`: Enable S3 Transfer Acceleration
Examples:
```bash
# Preview migration
s3lfs migrate-from-lfs my-bucket my-project --dry-run

# Migrate and keep LFS entries (safe, reversible)
s3lfs migrate-from-lfs my-bucket my-project

# Migrate and remove LFS tracking
s3lfs migrate-from-lfs my-bucket my-project --remove-lfs
```

Prerequisites: Run `git lfs pull` first to ensure all LFS files contain actual content (not pointer files). The command will error if any pointer files are detected.
First, initialize S3LFS in your repository:
```bash
s3lfs init my-bucket my-project-name
```

This creates `.s3_manifest.yaml`, which should be committed to Git, and automatically updates your `.gitignore` to exclude S3LFS cache files:

```bash
git add .s3_manifest.yaml .gitignore
git commit -m "Initialize S3LFS"
```

For a Git LFS-like experience where files sync automatically:
```bash
s3lfs install
```

With hooks installed, `git pull` and `git checkout` automatically download tracked files, and `git push` automatically uploads modified files.
Instead of committing large files directly to Git, track them with S3LFS:
```bash
s3lfs track data/large_dataset.zip
s3lfs track models/
s3lfs track "*.mp4"
```

After tracking files, commit the updated manifest:
```bash
git add .s3_manifest.yaml
git commit -m "Track large files with S3LFS"
git push
```

When cloning the repository, restore tracked files:
```bash
git clone https://github.com/your-repo/my-repo.git
cd my-repo
s3lfs checkout --all
```

For ongoing development:
```bash
# Track any modified large files
s3lfs track --modified

# Commit manifest changes
git add .s3_manifest.yaml
git commit -m "Update tracked files"

# Download latest files
s3lfs checkout --all
```

Download only specific files or directories:
```bash
s3lfs checkout data/            # Only data directory
s3lfs checkout "models/*.pkl"   # Only pickle files in models
```

All commands work from any subdirectory within the git repository:
```bash
cd data/
s3lfs track large_file.zip      # Tracks data/large_file.zip
s3lfs ls                        # Lists all tracked files
s3lfs checkout large_file.zip   # Downloads data/large_file.zip

cd ../models/
s3lfs track "*.pkl"             # Tracks models/*.pkl files
s3lfs ls --verbose              # Lists with detailed info
```

Note: The ls command shows paths relative to your current directory when run from a subdirectory. For example, if you're in the foo/ directory, `s3lfs ls` will show `file1.mp4` instead of `foo/file1.mp4`. This provides a local view of tracked files. In non-verbose mode, the output is pipe-friendly, with one file path per line.
Periodically clean up unreferenced files (use with caution - this feature is untested):
```bash
s3lfs cleanup
```

Use the built-in GitHub Action to install s3lfs and checkout tracked files in your workflows:
```yaml
steps:
  - uses: actions/checkout@v4
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
      aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      aws-region: us-east-1
  - uses: kmatzen/s3lfs@main
    with:
      checkout: all
```

Only download the files your pipeline needs -- no wasted bandwidth:
```yaml
- uses: kmatzen/s3lfs@main
  with:
    checkout: "assets/textures/**"
```

| Input | Default | Description |
|---|---|---|
| `version` | `latest` | s3lfs version to install |
| `checkout` | `none` | `all`, a glob pattern, or `none` (install only) |
| `no-sign-request` | `false` | Use unsigned S3 requests (public buckets) |
| `use-acceleration` | `false` | Enable S3 Transfer Acceleration |
See examples/ for complete workflow files.
For GitLab CI, Jenkins, or other systems, install s3lfs directly:
```bash
pip install s3lfs
s3lfs checkout --all   # or a selective glob
```

Ensure your AWS credentials are configured:

```bash
aws configure
```

Or use environment variables:
```bash
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-east-1
```

Use the `--endpoint-url` flag to connect to any S3-compatible storage provider:
```bash
# MinIO
s3lfs init my-bucket my-project --endpoint-url http://localhost:9000

# Cloudflare R2
s3lfs init my-bucket my-project --endpoint-url https://<account-id>.r2.cloudflarestorage.com

# Backblaze B2
s3lfs init my-bucket my-project --endpoint-url https://s3.us-west-004.backblazeb2.com

# Wasabi
s3lfs init my-bucket my-project --endpoint-url https://s3.wasabisys.com
```

The endpoint URL is stored in the manifest, so subsequent commands pick it up automatically. You can override it per-command if needed.
Create a .s3lfsconfig file at the git root to set defaults for the whole team:
```yaml
# .s3lfsconfig - commit this to version control
no_sign_request: true
use_acceleration: false
```

When `.s3lfsconfig` exists, its values are used as defaults for all commands. CLI flags still override config values; for example, `s3lfs track --no-sign-request` always uses unsigned requests regardless of the config.
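The precedence rule (CLI flag over config file over built-in default) can be sketched like this. This is an illustration under the assumption that options are booleans; `effective_option` is a hypothetical helper, not s3lfs's actual API:

```python
BUILTIN_DEFAULTS = {"no_sign_request": False, "use_acceleration": False}

def effective_option(name, cli_value, config):
    """Resolve one option: an explicit CLI flag wins; otherwise the value
    from .s3lfsconfig; otherwise the built-in default."""
    if cli_value is not None:   # flag was passed on the command line
        return cli_value
    if name in config:          # key present in .s3lfsconfig
        return bool(config[name])
    return BUILTIN_DEFAULTS[name]
```

For example, `effective_option("no_sign_request", None, {"no_sign_request": True})` yields `True`, while passing an explicit `False` as the CLI value overrides the config.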
Supported keys:
- `no_sign_request`: Use unsigned S3 requests (default: `false`)
- `use_acceleration`: Enable S3 Transfer Acceleration (default: `false`)
For public S3 buckets, use the --no-sign-request flag or set it in .s3lfsconfig:
```bash
s3lfs init public-bucket my-project --no-sign-request
s3lfs checkout --all --no-sign-request
```

The `.s3_manifest.yaml` file contains:
- S3 bucket and prefix configuration
- File-to-hash mappings for tracked files

The manifest should be committed to Git for team collaboration.
Uploads and downloads use block-level parallelism: all chunks across all files are submitted to a single shared worker pool. This means a 20GB file split into 4 chunks downloads all 4 concurrently, alongside chunks from other files.
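The shared-pool idea can be sketched with `concurrent.futures`. This is a simplified illustration, not s3lfs's actual implementation; `transfer_all` and `fetch_chunk` are hypothetical names:

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

def default_workers():
    # Matches the documented default of min(32, cpu_count + 4).
    return min(32, (os.cpu_count() or 1) + 4)

def transfer_all(files, fetch_chunk):
    """Flatten every (file, chunk) pair across all files into one shared
    worker pool, so chunks of a large file transfer alongside chunks of
    other files instead of being serialized per file."""
    tasks = [(name, idx) for name, chunks in files.items() for idx in chunks]
    with ThreadPoolExecutor(max_workers=default_workers()) as pool:
        futures = {pool.submit(fetch_chunk, name, idx): t for t in tasks
                   for name, idx in [t]}
        for fut in as_completed(futures):
            fut.result()  # re-raise any transfer error
```

Because the pool sees individual chunks rather than whole files, one 20GB file cannot monopolize or starve the workers.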
The worker count is auto-detected from your CPU count but can be overridden:
```bash
s3lfs track data/ --workers 32      # Use 32 parallel workers
s3lfs checkout --all --workers 16   # Limit to 16 workers
```

The default is `min(32, cpu_count + 4)`. Increase it for high-bandwidth connections with many small files; decrease it for memory-constrained environments.
Files are automatically compressed with gzip before upload. When pigz is installed, s3lfs uses it for parallel compression across all CPU cores. The output format is identical to gzip, so existing tracked files work without changes.
To install pigz: apt install pigz (Debian/Ubuntu), brew install pigz (macOS).
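The pigz-with-gzip-fallback pattern can be sketched as below. This is a minimal illustration, not s3lfs's actual code; `compress` is a hypothetical helper, and it assumes the `pigz`/`gzip` binaries are on `PATH`:

```python
import shutil
import subprocess

def compress(path):
    """Compress `path` to `path + '.gz'` using pigz when available
    (parallel across cores), falling back to gzip. Both tools emit the
    same .gz format, so downstream handling is identical."""
    tool = "pigz" if shutil.which("pigz") else "gzip"
    out = path + ".gz"
    with open(out, "wb") as dst:
        subprocess.run([tool, "-c", path], stdout=dst, check=True)
    return out
```

Since pigz output is byte-compatible gzip, a repository whose files were compressed with plain gzip needs no re-upload after pigz is installed.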
Use the --metrics flag to collect parallelism metrics during operations:
```bash
s3lfs track data/ --metrics
s3lfs checkout --all --metrics
```

This reports worker utilization, task durations, and stage-level parallelism for hashing, compression, upload, and download.
Transient S3 errors (network timeouts, throttling) are retried automatically with exponential backoff (2s, 4s, 8s, capped at 30s). Each operation retries up to 3 times before failing.
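The retry schedule above can be expressed as a small wrapper. This is a sketch, not s3lfs's internals; `with_retries` is a hypothetical name, and it catches `ConnectionError` as a stand-in for whatever transient exceptions the real client raises:

```python
import time

def with_retries(operation, max_retries=3, base_delay=2.0, max_delay=30.0,
                 sleep=time.sleep):
    """Retry `operation` on transient failure with exponential backoff:
    2s, 4s, 8s, ... capped at `max_delay`, up to `max_retries` retries."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the error
            sleep(min(max_delay, base_delay * (2 ** attempt)))
```

Injecting `sleep` makes the backoff schedule trivially testable without real waiting.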
Files with identical content (same hash) are stored only once in S3, regardless of path or filename.
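Content-addressed deduplication boils down to hashing file bytes and keying storage by the digest. A minimal sketch (the `s3_key` layout is an assumption for illustration, not s3lfs's actual key scheme):

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 in 1 MiB chunks so large assets
    never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def s3_key(prefix, content_hash):
    """Content-addressed key: identical bytes map to one S3 object
    regardless of the file's path or name."""
    return f"{prefix}/{content_hash}"
```

Two files with identical bytes produce the same digest, hence the same key, so the second upload is a no-op.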
S3LFS supports both SHA-256 (default) and MD5 hashing:
- SHA-256: More secure, used for file integrity
- MD5: Available for compatibility with legacy systems
- AWS Credentials: Ensure credentials are properly configured
- Bucket Permissions: Verify read/write access to the S3 bucket
- Network: Check internet connectivity for S3 operations
- Disk Space: Ensure sufficient local storage for file operations
Use --verbose flag for detailed operation information:
```bash
s3lfs track data/ --verbose
s3lfs checkout --all --verbose
```

MIT License
Pull requests are welcome! Please submit issues and suggestions via GitHub.
This project uses pre-commit hooks to ensure code quality. The hooks include:
- Code Quality: Trailing whitespace, end-of-file fixer, YAML validation, large file detection
- Python Formatting: Black code formatter with 88-character line length
- Import Sorting: isort with Black profile
- Linting: flake8 with extended ignore patterns
- Type Checking: mypy with boto3 type stubs
- Unit Tests: Automatic test execution on every commit
To set up pre-commit hooks:
```bash
# Install pre-commit
pip install pre-commit

# Install the git hook scripts
pre-commit install

# Run all hooks on all files
pre-commit run --all-files
```

The test hook automatically runs all unit tests before each commit, ensuring that code changes don't break existing functionality.