ScrapWave is a lightweight and powerful web scraping library built on top of got and cheerio. It allows you to fetch, parse, and extract data from web pages with ease.
npm install scrapwave
or using Yarn:
yarn add scrapwave
import ScrapWave from "scrapwave";
(async () => {
  const scraper = await ScrapWave.connect("https://example.com");
  console.log(scraper.getTitle());
})();
- Fetch and parse web pages with ease.
- Extract metadata, links, images, emails, and phone numbers.
- Scrape table data and structured JSON-LD content.
- Supports POST requests with form data.
- Configurable request options (timeout, retries, headers, etc.).
- Download all images from a page into a local folder.
Fetches the HTML content of the given URL and returns a Promise that resolves to a ScrapWave instance.
Sends a POST request with form data and returns a Promise that resolves to a ScrapWave instance.
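HTML forms submitted with `method="post"` send their fields as `application/x-www-form-urlencoded` by default. A minimal sketch of that encoding using the standard `URLSearchParams` class is shown here; the helper `encodeFormBody` is hypothetical and not part of ScrapWave, which may encode form data differently.

```javascript
// Hypothetical helper (not a ScrapWave API): encode a flat object of form
// fields the way a browser encodes a default POST form submission.
function encodeFormBody(fields) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(fields)) {
    // Repeated fields (e.g. several checked checkboxes) are given as arrays.
    if (Array.isArray(value)) {
      for (const v of value) params.append(name, String(v));
    } else {
      params.append(name, String(value));
    }
  }
  // URLSearchParams encodes spaces as "+" and escapes reserved characters.
  return params.toString();
}

const body = encodeFormBody({ q: "web scraping", page: 2, tags: ["a", "b"] });
console.log(body); // q=web+scraping&page=2&tags=a&tags=b
```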
Returns the title of the page.
Extracts metadata (title, description, author, Open Graph & Twitter metadata).
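Open Graph tags are written as `<meta property="og:…" content="…">`, while description and Twitter Card tags typically use `name="…"` instead of `property`. A dependency-free sketch of collecting both kinds is shown below; it assumes the raw HTML is available as a string and that attributes are double-quoted. The function `extractMetaTags` is hypothetical and does not reflect how ScrapWave parses pages (a real parser such as cheerio is more robust).

```javascript
// Hypothetical, regex-based sketch: collect <meta> tags into a
// { key: content } object. Assumes double-quoted attribute values and
// does not decode HTML entities.
function extractMetaTags(html) {
  const meta = {};
  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    const attrs = {};
    for (const [, name, value] of tag.matchAll(/([\w:-]+)\s*=\s*"([^"]*)"/g)) {
      attrs[name.toLowerCase()] = value;
    }
    // Open Graph uses property="og:*"; other tags usually use name="...".
    const key = attrs.property ?? attrs.name;
    if (key && attrs.content !== undefined) meta[key] = attrs.content;
  }
  return meta;
}

const page = `
<meta charset="utf-8">
<meta name="description" content="An example page">
<meta property="og:title" content="Example">
<meta content="summary" name="twitter:card">`;

console.log(extractMetaTags(page));
```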
Extracts all links from the elements matching the given selector.
Extracts image URLs from the page.
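Links and image sources often appear as relative paths such as `/about` or `img/logo.png`. Whether ScrapWave resolves them is not stated here; if you need absolute URLs, a minimal sketch using the standard WHATWG `URL` class follows. The helper `resolveUrls` is hypothetical, and dropping non-HTTP schemes and `#fragment` parts is a design choice for this example, not documented library behavior.

```javascript
// Hypothetical helper: turn raw href/src values into unique absolute URLs,
// resolved against the URL of the page they were found on.
function resolveUrls(rawValues, pageUrl) {
  const resolved = new Set();
  for (const raw of rawValues) {
    if (!raw) continue;
    try {
      const url = new URL(raw, pageUrl);
      // Keep only http(s); drops mailto:, javascript:, data:, etc.
      if (url.protocol === "http:" || url.protocol === "https:") {
        url.hash = ""; // fragments point into the same document
        resolved.add(url.href);
      }
    } catch {
      // Skip values that cannot be parsed as URLs.
    }
  }
  return [...resolved];
}

console.log(
  resolveUrls(
    ["/about", "img/logo.png", "https://cdn.example.org/a.jpg#x", "mailto:hi@example.com", "/about"],
    "https://example.com/blog/post.html"
  )
);
```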
Finds and returns email addresses from the page content.
Finds and returns phone numbers from the page content.
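Email and phone extraction from page text is usually pattern-based. The sketch below shows the general idea; the regular expressions and the 7 to 15 digit filter (15 is the E.164 maximum) are illustrative assumptions, not the patterns ScrapWave uses, and they will produce false positives (a date such as `2024-01-15` also passes the phone filter).

```javascript
// Illustrative patterns only; real-world matching has many edge cases.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// Optional "+", a digit, at least 5 digits/spaces/dots/dashes/parentheses, a digit.
const PHONE_RE = /\+?\d[\d\s().-]{5,}\d/g;

function extractEmails(text) {
  // match() with the g flag returns all matches, or null if none.
  return [...new Set(text.match(EMAIL_RE) ?? [])];
}

function extractPhones(text) {
  const candidates = text.match(PHONE_RE) ?? [];
  // Keep only candidates with a plausible number of digits.
  const phones = candidates.filter((c) => {
    const digits = c.replace(/\D/g, "").length;
    return digits >= 7 && digits <= 15;
  });
  return [...new Set(phones)];
}

const sample =
  "Contact sales@example.com or support@example.co.uk. Call +1 (555) 123-4567 or 020 7946 0958. Order 2024.";
console.log(extractEmails(sample)); // [ 'sales@example.com', 'support@example.co.uk' ]
console.log(extractPhones(sample)); // [ '+1 (555) 123-4567', '020 7946 0958' ]
```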
Extracts JSON-LD structured data from the page.
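JSON-LD lives in `<script type="application/ld+json">` elements, and each one may contain a single object or an array of objects. A minimal, dependency-free sketch of reading them is shown below, assuming the raw HTML is available as a string; the function name `extractJsonLd` is hypothetical, and a regex is used only to keep the example self-contained (an HTML parser such as cheerio is the sturdier choice).

```javascript
// Hypothetical sketch: find JSON-LD script blocks and parse their contents.
const LD_JSON_RE =
  /<script[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;

function extractJsonLd(html) {
  const items = [];
  for (const match of html.matchAll(LD_JSON_RE)) {
    try {
      const data = JSON.parse(match[1]);
      // Flatten arrays so the result is always a list of objects.
      items.push(...(Array.isArray(data) ? data : [data]));
    } catch {
      // Ignore blocks that are not valid JSON.
    }
  }
  return items;
}

const html = `
<head>
  <script type="application/ld+json">
    {"@context": "https://schema.org", "@type": "Organization", "name": "Example Inc."}
  </script>
  <script type="application/ld+json">{ not valid json }</script>
</head>`;

console.log(extractJsonLd(html));
```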
Finds and extracts form details (action, method, input fields).
Extracts a list of text content from elements like <li>.
Extracts the outer HTML of the given selector.
Extracts table data from the given selector.
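Table data is often easier to work with as one object per row, keyed by the header cells. The return shape of `tableData()` is not documented here, so the sketch below simply assumes an array of rows, each an array of cell strings with the header row first; `rowsToObjects` is a hypothetical post-processing helper, not part of ScrapWave.

```javascript
// Hypothetical helper: convert [[header...], [cell...], ...] into objects.
// Assumes the first row holds the column names.
function rowsToObjects(rows) {
  if (rows.length === 0) return [];
  const [header, ...body] = rows;
  return body.map((cells) =>
    // Missing cells in short rows become empty strings.
    Object.fromEntries(header.map((key, i) => [key.trim(), (cells[i] ?? "").trim()]))
  );
}

const rows = [
  ["Name", "Price"],
  ["Widget", " 9.99 "],
  ["Gadget"], // short row
];
// rowsToObjects(rows) returns [{ Name: "Widget", Price: "9.99" }, { Name: "Gadget", Price: "" }]
console.log(rowsToObjects(rows));
```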
Extracts the text content of the given selector.
Extracts the inner HTML of the given selector.
Retrieves the value of a specific attribute from the given selector.
Checks whether any element matching the given selector exists on the page.
Counts the number of elements that match the given selector.
Downloads all images from the page into the specified folder.
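Saving images to disk requires picking a file name for each URL. How ScrapWave names downloaded files is not stated here; the sketch below shows one common approach, taking the last path segment and replacing characters that are invalid in file names. The helper `imageFileName` and its `image-N` fallback are hypothetical.

```javascript
// Hypothetical helper: derive a safe file name from an image URL.
function imageFileName(imageUrl, index) {
  let last = new URL(imageUrl).pathname.split("/").pop() ?? "";
  try {
    last = decodeURIComponent(last); // "my%20photo.jpg" -> "my photo.jpg"
  } catch {
    // Keep the encoded segment if it is not valid percent-encoding.
  }
  // Replace characters that Windows forbids in file names, plus control chars.
  const safe = last.replace(/[<>:"\/\\|?*\x00-\x1f]/g, "_");
  // URLs ending in "/" have no usable segment, so fall back to a numbered name.
  return safe || `image-${index}`;
}

console.log(imageFileName("https://example.com/img/logo.png", 0)); // logo.png
console.log(imageFileName("https://example.com/gallery/", 3)); // image-3
```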
Allows customizing request settings such as timeout, retries, headers, etc.
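The options in the configuration example that follows appear to use got's format, where `retry.limit` is the maximum number of retries, so `limit: 3` allows up to four attempts in total. The sketch below illustrates those semantics with a generic helper; `withRetry` is hypothetical, and its fixed exponential backoff is a simplification of got's own retry logic (which also limits retries to certain methods and status codes).

```javascript
// Hypothetical helper: run an async task, retrying up to `limit` times
// after the first attempt (so at most limit + 1 attempts in total).
async function withRetry(task, { limit = 2, delayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= limit; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < limit) {
        // Exponential backoff: delayMs, 2 * delayMs, 4 * delayMs, ...
        await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Usage: a task that fails twice, then succeeds on the third attempt.
let calls = 0;
const flaky = async () => {
  calls++;
  if (calls < 3) throw new Error(`attempt ${calls} failed`);
  return "ok";
};

withRetry(flaky, { limit: 3, delayMs: 10 }).then((result) => {
  console.log(result, "after", calls, "attempts"); // ok after 3 attempts
});
```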
ScrapWave.setRequestOptions({
timeout: { request: 4000 }, // Set timeout to 4s
retry: { limit: 3 }, // Allow up to 3 retries
});
The examples below use top-level await, so they must run in an ES module; otherwise, wrap them in an async function as in the first example.
Extract all links:
const scraper = await ScrapWave.connect("https://example.com");
console.log(scraper.getLinks("a"));
Extract table data:
const scraper = await ScrapWave.connect("https://example.com");
console.log(scraper.tableData("table"));
Download images:
const scraper = await ScrapWave.connect("https://example.com");
await scraper.downloadImages("downloads");
Extract emails and phone numbers:
const scraper = await ScrapWave.connect("https://example.com");
console.log("Emails:", scraper.extractEmails());
console.log("Phones:", scraper.extractPhones());
Extract text and HTML:
const scraper = await ScrapWave.connect("https://example.com");
console.log("Text:", scraper.text("p"));
console.log("HTML:", scraper.html("div"));
console.log("Outer HTML:", scraper.outerHtml("div"));
Extract metadata:
const scraper = await ScrapWave.connect("https://example.com");
console.log(scraper.getMetadata());
Extract list items:
const scraper = await ScrapWave.connect("https://example.com");
console.log(scraper.getTextList("ul"));
Check for and count elements:
const scraper = await ScrapWave.connect("https://example.com");
console.log("Exists:", scraper.exists("h1"));
console.log("Count:", scraper.count("li"));
Contributions are welcome! Feel free to open an issue or submit a pull request.
MIT License. Feel free to use and modify as needed.