⚠️ Getting 400 Bad Request, Net::ReadTimeout, ECONNREFUSED, or SSL errors from wayback_machine_downloader? You are not doing anything wrong: the tool has had documented reliability issues since 2024. Jump to working alternatives below or skip the debugging entirely →
- Why the hartator tool stopped working
- All methods compared — honestly rated
- DIY guide — real commands for each method
- Why DIY output always looks broken
- When DIY makes sense — and when it doesn't
- What professional restoration includes
- FAQ
If you've searched how to download a website from the Wayback Machine, you found hartator/wayback-machine-downloader — 5,800+ GitHub stars, referenced in every tutorial, YouTube video, and forum thread.
There is one problem: it doesn't work reliably anymore.
The gem solved a real problem: it queries the Wayback Machine CDX API to get all archived URLs for a domain, then downloads each one. From 2014 to 2022, it worked well enough. Every blog post and Stack Overflow answer pointed to it, so it accumulated stars and became the default recommendation.
The result: thousands of developers follow instructions that are years out of date, waste hours debugging unfixable errors, and give up — assuming they did something wrong. Most tutorials never mention the tool is broken.
| # | Reason | Detail |
|---|---|---|
| 01 | Internet Archive rate limiting | IA tightened CDX API rate limits; bulk requests now return 429/400 errors the gem never handles gracefully |
| 02 | Ruby 3.x SSL behavior changes | Net::HTTP in Ruby 3+ enforces stricter SSL cert verification, causing OpenSSL::SSL::SSLError across many environments |
| 03 | Maintenance abandoned | Open issues go unresolved for 2+ years, PRs unmerged, maintainer inactive. Last meaningful commit: 2021 |
| 04 | No archive footprint cleanup | Even when it downloads, every HTML file contains Wayback Machine toolbar scripts and rewritten links — needs manual removal before deploying anywhere |
Error 1 — Most Common (Windows / all platforms)
ECONNREFUSED / Net::ReadTimeout
IA's servers drop or throttle connections from the concurrent request volume the gem makes. Using --concurrency 1 helps somewhat but doesn't resolve the underlying 400 errors. No fix exists in the original gem.
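If you script your own downloads instead of relying on the gem, pausing between retries is the usual way to cope with dropped or throttled connections. The sketch below is a generic retry wrapper in plain sh; the function name retry_with_backoff and its argument order are hypothetical, and it only helps with transient failures, not with the 400 errors described in this guide.

```shell
#!/bin/sh
# retry_with_backoff MAX_ATTEMPTS FIRST_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the pause after every failed attempt.
# (Hypothetical helper for illustration; not part of any tool named here.)
retry_with_backoff() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

Example use, wrapping a single curl request with up to 5 attempts and a first pause of 2 seconds: `retry_with_backoff 5 2 curl --fail --silent --output page.html "https://web.archive.org/web/20231201000000/https://example.com/"`.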
Error 2 — CDX API Rejection (macOS Sonoma / any platform)
open_http': 400 Bad Request (OpenURI::HTTPError)
The Wayback Machine updated its CDX API parameters. The gem sends queries in an outdated format and gets rejected, often silently downloading 0 files. Only community forks with patched API calls fix this.
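Issuing the CDX query by hand is a quick way to tell "the archive has nothing for this domain" apart from "the tool sent a bad request". The sketch below assumes the public CDX server endpoint at web.archive.org/cdx/search/cdx and its url, matchType, fl, collapse, filter, from and to parameters; the function names cdx_url and list_archived_urls are hypothetical.

```shell
#!/bin/sh
# cdx_url DOMAIN [FROM] [TO]
# Prints a CDX query URL that lists one successful capture per archived URL
# under DOMAIN. FROM and TO are optional timestamp prefixes such as 20220101.
cdx_url() {
    domain=$1
    from=${2:-}
    to=${3:-}
    url="https://web.archive.org/cdx/search/cdx?url=${domain}&matchType=prefix&fl=timestamp,original&collapse=urlkey&filter=statuscode:200"
    if [ -n "$from" ]; then url="${url}&from=${from}"; fi
    if [ -n "$to" ]; then url="${url}&to=${to}"; fi
    printf '%s\n' "$url"
}

# list_archived_urls DOMAIN [FROM] [TO]
# Fetches the listing as plain text, one "timestamp original-url" pair per line.
# --retry asks curl to retry a few times on transient errors.
list_archived_urls() {
    curl --silent --show-error --fail --retry 3 --retry-delay 5 "$(cdx_url "$@")"
}
```

Running `list_archived_urls example.com 20220101 20231231 | wc -l` gives a rough count of distinct archived URLs in that window. An empty result here suggests the problem is the archive coverage or the date range rather than your Ruby setup.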
Error 3 — Ruby Version (Linux / Ruby 3.2+)
OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0
Ruby 3.2 and later enforce stricter SSL certificate verification. The gem was written for older Ruby behavior. Downgrading to Ruby 3.1 via rbenv is the usual workaround.
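A small guard can catch the wrong Ruby version before any download starts. This is a minimal sketch that treats versions before 3.2 as the range this guide reports as working; the function name ruby_in_safe_range is hypothetical, and the 3.2 cutoff is taken from this guide rather than from Ruby's own documentation.

```shell
#!/bin/sh
# ruby_in_safe_range VERSION
# Succeeds for Ruby versions before 3.2, fails for 3.2 and later.
# VERSION is a dotted string such as 3.1.4.
ruby_in_safe_range() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # second component
    if [ "$major" -lt 3 ]; then
        return 0
    fi
    [ "$major" -eq 3 ] && [ "$minor" -lt 2 ]
}
```

Example use: `ruby_in_safe_range "$(ruby -e 'print RUBY_VERSION')" || echo "Switch to Ruby 3.1 first, for example with: rbenv local 3.1.0"`.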
Every meaningful approach tested and scored. No affiliate links, no promotional ratings.
| What matters | hartator gem | Community fork ⚡ | wget 🛠 | HTTrack 🖱 | Archivarix 🌐 | Wayback Revive ✅ |
|---|---|---|---|---|---|---|
| Works reliably in 2026 | | With effort | Yes | Yes | Yes | ✅ Guaranteed |
| Full site (all pages) | | Usually | With flags | Usually | 200-file free limit | ✅ Complete |
| Images & media recovered | Partial | Partial | Partial | Partial | Partial | ✅ Maximum recovery |
| Clean HTML (no archive code) | ✗ | ✗ | ✗ | ✗ | Partial | ✅ Fully cleaned |
| WordPress CMS delivery | ✗ | ✗ | ✗ | ✗ | ✗ | ✅ +$80 upgrade |
| Sites with 100+ pages | | Possible, slow | Possible, tedious | Possible, slow | Paid tier required | ✅ No size limit |
| Technical skill required | Ruby + CLI | Ruby, rbenv/rvm | CLI + scripting | Low (GUI available) | None | None, handled for you |
| Cost | Free | Free | Free | Free | Free / paid | $30 HTML · $110 WP |
The original gem is unmaintained but ShiftaDeband's fork patches the CDX API format issues. Install from GitHub — not the gem registry.
```shell
# Step 1: Use Ruby 3.1, NOT 3.2/3.3 (SSL issues on newer versions)
# Install rbenv first if needed: https://github.com/rbenv/rbenv
rbenv install 3.1.0 && rbenv global 3.1.0
gem install bundler

# Step 2: Clone the maintained fork, NOT the original gem
git clone https://github.com/ShiftaDeband/wayback-machine-downloader.git
cd wayback-machine-downloader
bundle install

# Step 3: Basic download
bundle exec ruby bin/wayback_machine_downloader http://example.com
```

Useful flags (these reduce errors significantly):

```shell
# Reduce 429 rate-limiting errors with concurrency 1
bundle exec ruby bin/wayback_machine_downloader http://example.com --concurrency 1

# Target a specific snapshot date range
bundle exec ruby bin/wayback_machine_downloader http://example.com \
  --from 20220101 \
  --to 20231231 \
  --concurrency 1

# Download only HTML (faster: skip images on first pass)
# --only takes a plain substring or a /regex/, not a shell glob like "*.html".
# Pages archived without a .html extension will not match this filter.
bundle exec ruby bin/wayback_machine_downloader http://example.com --only '/\.html?$/i'

# Specify output directory
bundle exec ruby bin/wayback_machine_downloader http://example.com --directory ./output/
```
⚠️ Output still needs cleanup. Even when the fork downloads successfully, every HTML file contains the Wayback Machine toolbar script, rewritten internal links pointing to archive.org, and injected meta tags. You will need cleanup scripts before deploying.
💡 Still seeing SSL errors? Confirm you're on Ruby 3.1 (ruby -v). If you're getting 0 URLs found, narrow the date range with --from and --to; very large ranges sometimes return empty from the CDX API.
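One way to narrow the range systematically is to split it into one-year windows and run the fork once per window. The sketch below only prints the commands so you can review them first; it reuses the --from, --to, --concurrency and --directory flags shown above, and the function name print_yearly_commands and the per-year output folders are hypothetical choices.

```shell
#!/bin/sh
# print_yearly_commands SITE_URL FIRST_YEAR LAST_YEAR
# Prints one downloader command per calendar year instead of running them.
# Each year gets its own output folder so the runs do not overwrite each other.
print_yearly_commands() {
    site=$1
    year=$2
    last=$3
    while [ "$year" -le "$last" ]; do
        printf 'bundle exec ruby bin/wayback_machine_downloader %s --from %s0101 --to %s1231 --concurrency 1 --directory ./output/%s/\n' \
            "$site" "$year" "$year" "$year"
        year=$((year + 1))
    done
}
```

Run `print_yearly_commands http://example.com 2019 2023` from inside the cloned fork directory to see the five commands. Once they look right, pipe the output to `sh` to execute them in order.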
wget works on every Unix-like system with no dependency setup. You mirror pages directly from web.archive.org.
```shell
# Step 1: Find your snapshot date at web.archive.org and copy the timestamp
# Then mirror with these flags:
wget \
  --recursive \
  --level=5 \
  --page-requisites \
  --convert-links \
  --no-parent \
  --wait=1 \
  --random-wait \
  --restrict-file-names=windows \
  --domains web.archive.org \
  "https://web.archive.org/web/20231201000000/https://example.com/"

# --wait=1 --random-wait → rate limiting protection
# --level=5 → follow links 5 levels deep
# --page-requisites → get all images, CSS, JS for each page
# Replace the date and domain with your actual snapshot.
# Do not put a "*" after the timestamp: that form requests the capture
# listing for the URL rather than a single archived page.
```

Cleanup required after downloading:
- Remove the Wayback Machine toolbar script. Every HTML file has a multi-line <!-- BEGIN WAYBACK TOOLBAR INSERT --> block that must be removed from every page. A Python or sed script can batch this, but the block spans multiple lines, which makes simple regexes fragile.
- Fix all internal links. Every link is rewritten to a full archive.org path (e.g. https://web.archive.org/web/20231201/https://example.com/about). These must be converted back to your-domain paths. Even --convert-links produces archive.org-relative paths, not your actual domain paths.
- Remove injected meta tags and attributes. Each file has X-Archive-Orig-* attributes, archive-specific meta tags, and WM script attributes on HTML elements. These tell Google the page is an archive copy, which is harmful to SEO if left in.
- Reorganize the file structure. Files download into a deeply nested web.archive.org/web/TIMESTAMP/example.com/ directory. You need to flatten and rename to match your original URL structure before uploading anywhere.
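The first two cleanup steps can be sketched with sed alone. This is a starting point under stated assumptions, not a complete cleaner: it assumes the toolbar block ends with a matching <!-- END WAYBACK TOOLBAR INSERT --> comment and that both markers sit on their own lines, and it rewrites only links that point back to the one domain you pass in. The function names strip_toolbar, rewrite_links and clean_tree are hypothetical. Work on a copy of your download, because each file is rewritten in place.

```shell
#!/bin/sh
# Assumed markers. The BEGIN comment is quoted in this guide; the END comment
# is assumed to mirror it. Check one of your own files before running.
BEGIN_MARK='<!-- BEGIN WAYBACK TOOLBAR INSERT -->'
END_MARK='<!-- END WAYBACK TOOLBAR INSERT -->'

# strip_toolbar FILE
# Deletes every line from the BEGIN marker through the END marker.
# If the END marker is missing, sed deletes through the end of the file.
strip_toolbar() {
    tmp="$1.tmp"
    sed "/$BEGIN_MARK/,/$END_MARK/d" "$1" > "$tmp" && mv "$tmp" "$1"
}

# rewrite_links FILE DOMAIN
# Turns https://web.archive.org/web/<timestamp>/https://DOMAIN/path, and the
# root-relative /web/<timestamp>/https://DOMAIN/path form, into /path.
# [a-z_]* covers timestamp suffixes such as im_ or cs_.
rewrite_links() {
    domain_re=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for the regex
    tmp="$1.tmp"
    sed -E "s#(https?:)?(//web\.archive\.org)?/web/[0-9]+[a-z_]*/https?://(www\.)?${domain_re}(:80)?/?#/#g" \
        "$1" > "$tmp" && mv "$tmp" "$1"
}

# clean_tree DIR DOMAIN
# Applies both steps to every .html file under DIR.
clean_tree() {
    find "$1" -type f -name '*.html' | while IFS= read -r file; do
        strip_toolbar "$file"
        rewrite_links "$file" "$2"
    done
}
```

Example use: `clean_tree ./site-copy example.com`. Links to other domains, injected meta tags, and the directory layout are left untouched, so the remaining steps in the list still apply.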
⏱️ Time estimate: The wget download itself is fast. Cleanup for a 20–30 page site is 3–6 hours for a developer comfortable with scripting. For 100+ pages, plan a full day or more.
HTTrack is a mature website copier with both a GUI (Windows) and CLI (Mac/Linux). The most accessible free option for users not comfortable with terminals.
```shell
# Install
# macOS: brew install httrack
# Linux: sudo apt install httrack
# Windows: GUI installer from httrack.com (WinHTTrack)

# CLI: mirror a specific Wayback Machine snapshot
httrack "https://web.archive.org/web/20231215000000/https://example.com/" \
  -O "/output/folder" \
  "+*web.archive.org/web*example.com*" \
  --near --mirror

# The scan rule (+*web.archive.org/web*example.com*) stops it following unrelated archive.org links
# Replace the timestamp and domain with your actual target
```

Using WinHTTrack GUI (Windows):
- Download and install WinHTTrack from httrack.com (free installer). Create a new project and set your output folder.
- Enter your Wayback Machine snapshot URL: https://web.archive.org/web/20231215000000/https://yoursite.com/
- Add the scan rule +*yoursite.com* to prevent HTTrack from following links out to unrelated archive.org pages.
- Run and clean the output. The same cleanup steps as for wget apply.
Even when your download completes successfully, the output will not be an uploadable working website. Here's exactly why:
| Problem | What it means |
|---|---|
| 🔗 Archive.org codes in every file | Every HTML file contains the Wayback Machine toolbar script and wrapped banner HTML. Upload without removing it and visitors see archive.org banners on your live site. |
| 🔀 Broken internal links | Every link points to web.archive.org/web/TIMESTAMP/site.com instead of your domain. Navigation, images, CSS — all broken. |
| 🖼️ Missing or wrong images | Images are often served from different snapshot timestamps than the HTML. Tools frequently fail to match images to the correct version. |
| 📄 Incomplete page capture | The archive doesn't capture every page on every visit. Large sites may have 40–80% of pages archived. Category archives, individual posts, and deeper pages are often missing. |
| ⏱️ Massive time investment | Cleaning archive codes, fixing links, and verifying every page on a 30-page site takes 4–8 hours for someone who knows what they're doing. For non-technical users it's effectively impossible to complete correctly. |
| 📊 No SEO metadata recovery | DIY tools don't restore original meta titles, descriptions, or canonical URL structure in a usable form. |
DIY makes sense when:
- Your site has fewer than 10–15 pages
- You're comfortable writing cleanup scripts
- You just need the content, not a live deployable site
- You want to verify your site is archived before spending money

DIY does not make sense when:
- Your site has more than 15 pages
- You need a working, uploadable site, not raw files
- You need original URL slugs preserved for SEO recovery
- You've already spent over an hour on this
- You need WordPress delivery
🔍 Not sure if your site was even archived? Use the free archive checker — enter your domain, see snapshot count and restore quality in 30 seconds. Free, no signup.
Wayback Revive — done-for-you restoration. No downloads, no cleanup scripts, no Ruby version debugging.
- ✅ All pages downloaded from the best available snapshot
- ✅ Every archive.org toolbar script and injected banner removed
- ✅ All internal links fixed — your domain, not archive.org paths
- ✅ Images recovered and correctly linked inside each page
- ✅ Original URL slugs preserved — existing backlinks still work
- ✅ Original meta titles and descriptions kept intact
- ✅ Works from any original platform — WP, HTML, Joomla, anything
- ✅ Detailed recovery report — every page accounted for
- ✅ Optional: delivered as a working WordPress CMS (+$80)
| Service | Price | Delivery | What you get |
|---|---|---|---|
| HTML Restoration | $30 | 1–2 days | Clean, deploy-ready HTML files |
| WordPress Restoration | $110 | 3–5 days | Fully working WordPress CMS |
If we can't restore your site from the archive, you get a full refund. No questions asked. You only pay when we deliver.
→ Order professional restoration
→ Check if your site is in the archive (free)
Is the hartator gem completely dead?
Not completely — it works for some people in some environments. But it fails often enough, and the maintainer is no longer responding to issues or merging PRs, that it's not a reliable starting point. Community forks (particularly ShiftaDeband's) have patched the most common API errors and are a better choice for CLI-based downloading in 2026.
What's the best free method in 2026?
For developers comfortable with terminal: the ShiftaDeband community fork with --concurrency 1 and a specific date range using --from and --to flags.
For non-technical users: HTTrack's Windows GUI (WinHTTrack) pointed at a timestamped Wayback Machine snapshot URL.
Both methods produce output that still requires archive footprint cleanup before you can deploy the site anywhere.
What are "archive footprints" and why do they matter?
When the Wayback Machine serves a page, it injects code into every response: a toolbar script, banner HTML, rewritten internal links pointing to archive.org timestamp URLs, and meta tags flagging the page as an archived snapshot.
If you deploy these files directly on your domain, Google sees an archive mirror — not a real website — and may refuse to index it correctly. Visitors also see the archive.org toolbar on every page. All of this must be stripped before deployment. Free tools download raw files; cleanup is entirely your responsibility.
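A quick way to confirm that cleanup is finished is to search the output for leftover archive references. The sketch below is a minimal check, assuming the three patterns shown are representative footprints (the archive hostname, the toolbar marker quoted in this guide, and the /web/ plus 14-digit timestamp path form); the function name find_footprints is hypothetical, and a clean result does not prove that every injected tag is gone.

```shell
#!/bin/sh
# find_footprints DIR
# Lists every file under DIR that still contains a Wayback Machine reference.
# Exit status 0 means footprints were found, 1 means none matched.
find_footprints() {
    grep -rlE 'web\.archive\.org|WAYBACK TOOLBAR INSERT|/web/[0-9]{14}' "$1"
}
```

Example use: `find_footprints ./site-copy || echo "No known footprints left"`. Run it again after every cleanup pass and before uploading anything.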
How is professional restoration different from the free tools?
Free tools give you raw archive files — with archive code intact, links pointing to archive.org, and inconsistent image recovery.
Professional restoration means every page is fully cleaned (all archive code stripped), every internal link is corrected to your domain, images are recovered and correctly referenced, original URL slugs are preserved for SEO, and the output is verified before delivery.
The HTML service ($30) produces a clean, deploy-ready folder. The WordPress service ($110) produces a fully working CMS installation.
My site was originally plain HTML, not WordPress — can you still restore it?
Yes. The original platform doesn't matter. We restore any archived website — plain HTML, WordPress, Joomla, Drupal, or anything else. The HTML service ($30) restores the site as clean static files. The WordPress service ($110) migrates all content into a WordPress installation with Classic Editor.
What if only part of my site was archived?
We recover everything the Wayback Machine captured. Pages and assets not in the archive cannot be recovered — that's a limitation of the archive itself, not the restoration method. Your delivery report documents every recovered page alongside what was missing.
If we cannot find your site in the archive at all after your order, you receive a full refund.
Can I try the DIY methods first and then order if they don't work?
Absolutely — that's exactly what we'd recommend. This guide is here to give you the best possible chance of success with DIY. If you hit errors you can't resolve, or you need clean deployable output you can't produce from raw files, we're ready.
Use the free archive checker first to confirm your site is in the archive before placing an order.
Done debugging. Let us handle it.
| 💻 HTML Restoration — $30 | Clean, deploy-ready files. Delivered in 1–2 days. |
| 📦 WordPress Restoration — $110 | Fully working WP site. Delivered in 3–5 days. |
| 🛡️ 100% Money-Back Guarantee | If we can't find your site in the archive, full refund. |
→ Order Professional Restoration → Check If My Site Is Archived (Free)
Maintained by Wayback Revive · 500+ sites restored · Updated April 2026