Skip to content

cto-mindset/scrapwave

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

25 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

ScrapWave - Lightweight Web Scraping Library

ScrapWave is a lightweight and powerful web scraping library built on top of got and cheerio. It allows you to fetch, parse, and extract data from web pages with ease.

CodeQL Advanced Run Scrapper Tests npm npm GitHub

πŸš€ Installation

npm install scrapwave

or using Yarn:

yarn add scrapwave

πŸ“– Quick Start

import ScrapWave from "scrapwave";

(async () => {
  const scrapper = await ScrapWave.connect("https://example.com");
  console.log(scrapper.getTitle());
})();

🎯 Features

  • Fetch and parse web pages with ease.
  • Extract metadata, links, images, emails, and phone numbers.
  • Scrape table data and structured JSON-LD content.
  • Supports POST requests with form data.
  • Configurable request options (timeout, retries, headers, etc.).
  • Download images automatically.

πŸ“Œ API Methods

ScrapWave.connect(url: string): Promise<ScrapWave>

Fetches the HTML content of the given URL and returns a ScrapWave instance.

ScrapWave.post(url: string, data: Record<string, string>): Promise<ScrapWave>

Sends a POST request with form data and returns a ScrapWave instance.

scrapper.getTitle(): string

Returns the title of the page.

scrapper.getMetadata(): Metadata

Extracts metadata (title, description, author, Open Graph & Twitter metadata).

scrapper.getLinks(selector: string): string[]

Extracts all links from the given selector.

scrapper.imageSources(selector = "img"): string[]

Extracts image URLs from the page.

scrapper.extractEmails(): string[]

Finds and returns email addresses from the page content.

scrapper.extractPhones(): string[]

Finds and returns phone numbers from the page content.

scrapper.getJsonLD(): JsonLD

Extracts JSON-LD structured data from the page.

scrapper.getForms(): FormDetails[]

Finds and extracts form details (action, method, input fields).

scrapper.getTextList(selector: string): string[]

Extracts a list of text content from elements like <li>.

scrapper.outerHtml(selector: string): string

Extracts the outer HTML of the given selector.

scrapper.tableData(selector: string): TableData

Extracts table data from the given selector.

scrapper.text(selector: string): string

Extracts the text content of the given selector.

scrapper.html(selector: string): string

Extracts the inner HTML of the given selector.

scrapper.attr(selector: string, attribute: string): string | undefined

Retrieves the value of a specific attribute from the given selector.

scrapper.exists(selector: string): boolean

Checks whether a specific selector exists on the page.

scrapper.count(selector: string): number

Counts the number of elements that match the given selector.

scrapper.downloadImages(folder = "images"): Promise<void>

Downloads all images from the page into the specified folder.

scrapper.setRequestOptions(options: RequestOptions): void

Allows customizing request settings such as timeout, retries, headers, etc.

βš™οΈ Customizing Request Options

ScrapWave.setRequestOptions({
  timeout: { request: 4000 }, // Set timeout to 4s
  retry: { limit: 3 }, // Allow up to 3 retries
});

πŸ› οΈ Usage Examples

Extracting Links

const scrapper = await ScrapWave.connect("https://example.com");
console.log(scrapper.getLinks("a"));

Scraping Table Data

const scrapper = await ScrapWave.connect("https://example.com");
console.log(scrapper.tableData("table"));

Downloading Images

const scrapper = await ScrapWave.connect("https://example.com");
await scrapper.downloadImages("downloads");

Extracting Emails & Phone Numbers

const scrapper = await ScrapWave.connect("https://example.com");
console.log("Emails:", scrapper.extractEmails());
console.log("Phones:", scrapper.extractPhones());

Extracting Text & HTML

const scrapper = await ScrapWave.connect("https://example.com");
console.log("Text:", scrapper.text("p"));
console.log("HTML:", scrapper.html("div"));
console.log("Outer HTML:", scrapper.outerHtml("div"));

Extracting Metadata

const scrapper = await ScrapWave.connect("https://example.com");
console.log(scrapper.getMetadata());

Extracting List Items

const scrapper = await ScrapWave.connect("https://example.com");
console.log(scrapper.getTextList("ul"));

Checking for Element Existence & Counting Elements

const scrapper = await ScrapWave.connect("https://example.com");
console.log("Exists:", scrapper.exists("h1"));
console.log("Count:", scrapper.count("li"));

πŸ“Œ Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

πŸ“œ License

MIT License. Feel free to use and modify as needed.

About

ScrapWave - A fast and powerful TypeScript web scraper! πŸš€

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors