Arie Joe edited this page Jun 3, 2026 · 1 revision

FAQ

What is the package name?

PyPI distribution:

wayback-machine-downloader

Import package:

wayback_downloader

CLI command:

wayback-machine-downloader

Where are files written by default?

Under:

./websites/<backup-name>/

Why do state files disappear after a successful run?

State files are removed by default once a run completes successfully. Pass --keep to preserve .cdx.json and .downloaded.txt.

Can it download every archived version of a file?

Yes:

python -m wayback_downloader --all-timestamps https://example.com

Can it reconstruct a site at one specific date?

Yes:

python -m wayback_downloader --snapshot-at 20130101000000 https://example.com

This builds a best-effort composite snapshot from captures at or before that timestamp.
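The "at or before" selection can be pictured with a small sketch. This is not the package's own code: `composite_snapshot` and the sample capture list are hypothetical, and the only assumption carried over from the Wayback Machine is the 14-digit `YYYYMMDDhhmmss` timestamp format, which sorts correctly as a plain string.

```python
from typing import Dict, Iterable, Tuple


def composite_snapshot(
    captures: Iterable[Tuple[str, str]], cutoff: str
) -> Dict[str, str]:
    """Map each URL to its latest capture timestamp at or before `cutoff`.

    Hypothetical helper for illustration only. Timestamps are 14-digit
    strings, so lexicographic comparison matches chronological order.
    """
    chosen: Dict[str, str] = {}
    for timestamp, url in captures:
        if timestamp > cutoff:
            continue  # captured after the requested date, ignore
        if url not in chosen or timestamp > chosen[url]:
            chosen[url] = timestamp
    return chosen


# Made-up capture index: (timestamp, original URL)
captures = [
    ("20120501120000", "https://example.com/"),
    ("20121231235959", "https://example.com/"),
    ("20130615000000", "https://example.com/"),
    ("20110101000000", "https://example.com/style.css"),
    ("20140101000000", "https://example.com/new.js"),
]

snapshot = composite_snapshot(captures, "20130101000000")
```

In this example the home page resolves to its December 2012 capture, the stylesheet to its 2011 capture, and `new.js` is left out because it was first captured after the cutoff. That is why the result is a composite: each file may come from a different capture date.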

Does it require third-party runtime dependencies?

No. No runtime dependencies are currently declared.

Does the test suite hit web.archive.org?

No. The current unit suite is intentionally offline and uses fake transports.
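As a rough illustration of the fake-transport pattern, the sketch below tests a function against canned responses instead of the network. `FakeTransport`, its `get` method, and `fetch_lines` are hypothetical names invented for this example; they are not the project's actual test helpers.

```python
import unittest


class FakeTransport:
    """Hypothetical stand-in for an HTTP client; serves canned bodies."""

    def __init__(self, responses):
        self.responses = dict(responses)
        self.requested = []  # records every URL asked for

    def get(self, url: str) -> bytes:
        self.requested.append(url)
        try:
            return self.responses[url]
        except KeyError:
            raise LookupError(f"no canned response for {url}")


def fetch_lines(transport, url):
    """Hypothetical code under test: fetch a body and split it into lines."""
    return transport.get(url).decode("utf-8").splitlines()


class FetchLinesTest(unittest.TestCase):
    def test_uses_canned_response(self):
        url = "https://web.archive.org/cdx/search/cdx?url=example.com"
        fake = FakeTransport({url: b"line one\nline two\n"})
        self.assertEqual(fetch_lines(fake, url), ["line one", "line two"])
        # The fake saw the request; no socket was ever opened.
        self.assertEqual(fake.requested, [url])
```

Saved to a file, a test like this runs with `python -m unittest` and needs no network access, since the transport is injected rather than created inside the function.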

Does --local download missing assets?

Not by itself. --local rewrites saved files. Pair it with --page-requisites to fetch additional page assets.

What is the difference between --recursive-subdomains and --cross-host?

--recursive-subdomains:

  • mirrors first-party subdomains of the base domain

--cross-host:

  • allows page-requisite asset discovery to queue arbitrary other hosts
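One way to picture the two scopes is as a host filter. The sketch below is an assumption about how such a check could look, not the downloader's real logic: `in_scope`, `is_subdomain_of`, and the `is_asset` parameter are hypothetical.

```python
from urllib.parse import urlsplit


def is_subdomain_of(host: str, base_domain: str) -> bool:
    """True if `host` equals `base_domain` or is a subdomain of it."""
    host = host.lower().rstrip(".")
    base = base_domain.lower().rstrip(".")
    return host == base or host.endswith("." + base)


def in_scope(
    url: str,
    base_domain: str,
    recursive_subdomains: bool = False,
    cross_host: bool = False,
    is_asset: bool = False,
) -> bool:
    """Hypothetical scope check mirroring the two flags described above."""
    host = urlsplit(url).hostname or ""
    if host == base_domain.lower():
        return True  # the base domain itself is always in scope
    if recursive_subdomains and is_subdomain_of(host, base_domain):
        return True  # first-party subdomain, e.g. blog.example.com
    if cross_host and is_asset:
        return True  # page requisite on an arbitrary other host
    return False
```

Under these assumptions, `blog.example.com` is accepted only with `recursive_subdomains=True`, while a script on `cdn.other.net` is accepted only with `cross_host=True` and only when it was discovered as a page requisite.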

Why are some filenames sanitized or hashed?

Because archived URLs can contain:

  • query strings
  • invalid Windows characters
  • repeated percent encoding
  • directory-like paths

The downloader normalizes them into stable local files that can be resumed and rewritten consistently.
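A minimal sketch of this kind of normalization follows. The actual naming scheme is not documented on this page, so `local_path`, the 8-character SHA-1 suffix, and the `index.html` default are all assumptions chosen for illustration.

```python
import hashlib
import re
from urllib.parse import unquote, urlsplit

# Characters that are not allowed in Windows file names.
_INVALID = re.compile(r'[<>:"\\|?*\x00-\x1f]')


def local_path(url: str) -> str:
    """Hypothetical mapping from an archived URL to a stable relative path."""
    parts = urlsplit(url)
    path = parts.path

    # Undo repeated percent encoding (e.g. %2520 -> %20 -> space), bounded.
    for _ in range(5):
        decoded = unquote(path)
        if decoded == path:
            break
        path = decoded

    # Directory-like paths get a default file name.
    if path == "" or path.endswith("/"):
        path += "index.html"

    segments = [
        _INVALID.sub("_", segment)
        for segment in path.split("/")
        if segment not in ("", ".", "..")
    ]

    # Fold the query string into a short deterministic hash.
    if parts.query:
        digest = hashlib.sha1(parts.query.encode("utf-8")).hexdigest()[:8]
        stem, dot, ext = segments[-1].rpartition(".")
        segments[-1] = f"{stem}.{digest}.{ext}" if dot else f"{segments[-1]}.{digest}"

    return "/".join(segments)
```

Because the mapping is a pure function of the URL, a resumed run would compute the same local path for the same URL, which is the property that lets resuming and rewriting stay consistent.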

Can I publish releases automatically?

Yes. The repository already includes GitHub Actions for:

  • CI
  • TestPyPI
  • PyPI release publishing

See Automation and Release.
