Home

Wayback Machine Downloader Wiki

Wayback Machine Downloader is a Python CLI for downloading archived websites from the Internet Archive Wayback Machine and rewriting them for local browsing.

It is designed for:

digital preservation
recovering or mirroring defunct sites
offline browsing of archived captures
historical web analysis and OSINT workflows
repeatable, resumable archive downloads

What It Does

downloads the latest capture of each logical file by default
can keep every archived timestamp for a target
can build a best-effort composite snapshot as of a chosen date
rewrites downloaded HTML, CSS, JS, and archived absolute links for local use
resumes interrupted runs using local state files
discovers linked page assets after HTML downloads
can recursively mirror subdomains into a local subdomains/ tree
ships with GitHub Actions for CI, build, TestPyPI, and PyPI publishing
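
As an illustration of the link-rewriting step, the sketch below splits a Wayback Machine replay URL into its capture timestamp and original URL, which is the information needed before an archived absolute link can be pointed at a local file. The function name split_wayback_url and the regular expression are assumptions made for this example, not the project's actual implementation.

```python
import re

# Replay URLs have the general shape:
#   https://web.archive.org/web/<timestamp><optional modifier>/<original URL>
# where the modifier is a short suffix such as "id_", "cs_", "js_" or "im_".
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{1,14})([a-z]{2}_)?/(.+)$"
)


def split_wayback_url(url):
    """Return (timestamp, original_url), or None if url is not a replay URL.

    Hypothetical helper for illustration only.
    """
    match = WAYBACK_RE.match(url)
    if match is None:
        return None
    timestamp, _modifier, original_url = match.groups()
    return timestamp, original_url
```

A rewriter could call this on every absolute link it finds, then replace the link with a relative path to the corresponding local file when one exists.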

Start Here

Typical Commands

Download the latest capture of every logical file for a site:

python -m wayback_downloader https://example.com

Preview the planned captures without downloading:

python -m wayback_downloader --list https://example.com

Download all timestamps instead of only the newest capture:

python -m wayback_downloader --all-timestamps https://example.com

Build a best-effort composite of the site as of a given timestamp (YYYYMMDDhhmmss):

python -m wayback_downloader --snapshot-at 20130101000000 https://example.com

Rewrite an existing download tree for offline browsing:

python -m wayback_downloader --local-only ./websites/example.com

Core Concepts

Logical file ID

The downloader maps archived URLs to stable logical file IDs. Those IDs drive:

the local output path
resume tracking in .downloaded.txt
duplicate detection
local link rewriting
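
To make the idea concrete, here is a minimal sketch of one way an archived URL could be normalized into a stable ID. The function logical_file_id and every normalization rule in it (dropping the scheme, port, and a leading "www.", mapping directory paths to index.html) are assumptions for illustration; the downloader's real rules may differ.

```python
from urllib.parse import urlsplit


def logical_file_id(original_url):
    """Map an original URL to a stable, path-like ID (hypothetical rules)."""
    parts = urlsplit(original_url)
    # hostname is already lowercased and excludes any port.
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    # Treat an empty path or a directory path as that directory's index page.
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    file_id = host + path
    # Keep the query string so distinct dynamic pages stay distinct.
    if parts.query:
        file_id += "?" + parts.query
    return file_id
```

Under these assumed rules, http://www.example.com/ and https://example.com collapse to the same ID, so captures of either count as the same logical file.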

Snapshot planning

The CDX API returns raw (timestamp, original_url) rows. The downloader turns them into planned Snapshot objects using the current filters and snapshot mode.
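
The planning step can be sketched as follows. This is a simplified illustration, not the project's code: the Snapshot fields, the plan_latest function, and its snapshot_at and key parameters are assumptions. It keeps the newest capture per key and, when a cutoff is given, ignores captures later than that cutoff. Timestamps are assumed to be 14-digit strings, so plain string comparison orders them chronologically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Field names are assumed for this sketch.
    timestamp: str
    original_url: str


def plan_latest(rows, snapshot_at=None, key=lambda url: url):
    """Pick the newest capture per key from (timestamp, original_url) rows.

    If snapshot_at is set, captures newer than it are skipped, which gives
    a best-effort composite as of that timestamp.
    """
    latest = {}
    for timestamp, original_url in rows:
        if snapshot_at is not None and timestamp > snapshot_at:
            continue
        group = key(original_url)
        current = latest.get(group)
        if current is None or timestamp > current.timestamp:
            latest[group] = Snapshot(timestamp, original_url)
    return sorted(latest.values(), key=lambda snap: snap.original_url)
```

Passing a different key function lets the same loop group captures by a normalized identifier instead of the raw URL.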

Resume state

Each output tree can keep:

.cdx.json for cached CDX results
.downloaded.txt for successful logical file IDs

Those files allow interrupted runs to continue without starting from scratch.
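
A resume check along these lines might look like the sketch below. It assumes .downloaded.txt holds one logical file ID per line, which this page does not confirm, and the helper names load_downloaded and mark_downloaded are hypothetical.

```python
from pathlib import Path

STATE_FILE_NAME = ".downloaded.txt"


def load_downloaded(output_dir):
    """Return the set of IDs already recorded (assumes one ID per line)."""
    state_file = Path(output_dir) / STATE_FILE_NAME
    if not state_file.exists():
        return set()
    lines = state_file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def mark_downloaded(output_dir, file_id):
    """Append one ID after a successful download."""
    state_file = Path(output_dir) / STATE_FILE_NAME
    with state_file.open("a", encoding="utf-8") as handle:
        handle.write(file_id + "\n")
```

Appending after each successful file, rather than rewriting the whole state file at the end, means an interrupted run loses at most the file that was in flight.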

Packaging and Automation

The project publishes as the PyPI distribution:

wayback-machine-downloader

The import package remains:

import wayback_downloader

Release automation is documented in Automation and Release.

Main Repository | PyPI Package | Issue Tracker
