NeaByteLab/Website-Cloner

🚀 Website Cloner (Node.js)

A fast, asset-complete website cloner built with Node.js, Puppeteer, Cheerio, and Axios. It crawls a website and downloads its HTML, CSS, JS, fonts, images, and media to a local folder, which makes it suitable for archiving or offline analysis.


✨ Features

  • 📄 Asset-complete crawl: CSS, JS, images, fonts, videos, audio, etc.
  • 🔗 Recursive link following within the root domain
  • 🎨 CSS parsing: Handles url(), @import, and asset references
  • 🧹 Query and fragment stripping for clean local files
  • 🕷️ Uses Puppeteer (headless Chrome) for reliable page rendering
  • ♻️ Download retry mechanism for network resilience
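
As an illustration of the CSS parsing feature, here is a minimal sketch of how url() and @import references could be collected from a stylesheet. It is not the project's actual code: extractCssUrls is a hypothetical helper, and the regex-based approach is an assumption chosen for brevity.

```javascript
// Hypothetical helper (not from the project): collect asset URLs
// referenced by a stylesheet and resolve them against the stylesheet's URL.
function extractCssUrls(cssText, baseUrl) {
  const found = new Set();
  // url(...) with optional single or double quotes
  const urlPattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/gi;
  // @import "file.css"; the @import url(...) form is already caught by urlPattern
  const importPattern = /@import\s+(['"])([^'"]+)\1/gi;

  for (const pattern of [urlPattern, importPattern]) {
    for (const match of cssText.matchAll(pattern)) {
      const ref = match[2].trim();
      // Inline data URIs carry their own content, so there is nothing to download
      if (ref === '' || ref.startsWith('data:')) continue;
      try {
        found.add(new URL(ref, baseUrl).href);
      } catch {
        // Skip references that cannot be parsed as URLs
      }
    }
  }
  return [...found];
}
```

Relative references are resolved against the stylesheet's own URL rather than the page URL, which is how browsers resolve them. A regex will miss some edge cases (for example, URLs containing escaped parentheses), so a real CSS parser may be a better fit for production use.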

📦 Install

git clone https://github.com/NeaByteLab/Website-Cloner.git
cd Website-Cloner
npm install

▶️ Usage

node index.js <website_url> [output_folder]
  • <website_url>: Root URL to clone (e.g. https://example.com)
  • [output_folder]: (Optional) Output directory (default: ./output)

Example:

node index.js https://example.com ./my-archive

📁 Output

All files are saved under the output folder with the site's original folder structure. Query strings and fragments are stripped from asset references for clean offline use.
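
A rough sketch of how a URL might be mapped to a local file path while dropping the query string and fragment. The helper name toLocalPath and the index.html fallback for directory-style URLs are assumptions for illustration, not the project's documented behavior.

```javascript
// Hypothetical helper (not from the project): map a URL to a path
// relative to the output folder, without its query string or fragment.
function toLocalPath(resourceUrl) {
  // URL.pathname already excludes the ?query and #fragment parts
  const { pathname } = new URL(resourceUrl);

  let decoded;
  try {
    decoded = decodeURIComponent(pathname);
  } catch {
    decoded = pathname; // keep the raw path if it has malformed escapes
  }

  // Drop empty, "." and ".." segments so the result stays inside the output folder
  const parts = decoded
    .split('/')
    .filter((part) => part !== '' && part !== '.' && part !== '..');

  // Assumption: directory-style URLs ("/" or "/docs/") are saved as index.html
  if (parts.length === 0 || decoded.endsWith('/')) parts.push('index.html');

  return parts.join('/');
}
```

The result can then be combined with the output folder, for example with path.join(outputFolder, toLocalPath(url)) from Node's built-in path module.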

📝 Notes

  • 🌐 Only follows links within the provided root domain.
  • ✉️ Ignores mailto links and anchor jumps.
  • 🎯 All CSS url() and @import asset links are also downloaded.
  • 🔄 Minimal error output; failed asset and page downloads are retried up to 3 times.
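
The link rules and retry behavior in these notes could look roughly like the sketch below. Both resolveCrawlLink and withRetry are hypothetical names. The exact hostname comparison and the reading of "up to 3 times" as three attempts in total are assumptions, not confirmed details of the project.

```javascript
// Hypothetical helper (not from the project): return the absolute URL to
// crawl for a link found on a page, or null if the link should be skipped.
function resolveCrawlLink(href, pageUrl, rootUrl) {
  // Skip empty links, anchor jumps, and mailto links
  if (!href || href.startsWith('#') || href.toLowerCase().startsWith('mailto:')) {
    return null;
  }
  let target;
  try {
    target = new URL(href, pageUrl); // resolves relative links against the page
  } catch {
    return null;
  }
  if (target.protocol !== 'http:' && target.protocol !== 'https:') return null;
  // Assumption: "within the root domain" means an exact hostname match
  if (target.hostname !== new URL(rootUrl).hostname) return null;
  target.hash = ''; // the fragment does not identify a separate page
  return target.href;
}

// Hypothetical helper (not from the project): run an async task, trying
// again on failure. Assumption: 3 means three attempts in total.
async function withRetry(task, attempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

A download call would be wrapped as withRetry(() => download(url)), where download stands for whatever function fetches the asset or page.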

📜 License

MIT License © 2025 NeaByteLab
