flobo3/skill-scrapingbee

ScrapingBee Fetch OSS

A Python script that fetches and renders JavaScript-heavy or bot-protected websites through the ScrapingBee API. It extracts the main content and returns it as clean Markdown, which suits LLM input and data extraction.

Features

  • Bypasses bot protection (Cloudflare, DataDome, etc.)
  • Automatically renders JavaScript
  • Extracts relevant text content and title
  • Fixes common encoding issues
  • Supports custom proxies, wait times, and ad blocking
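The README does not say how encoding issues are fixed. As an illustration only, one common class of problem is mojibake, where UTF-8 bytes were decoded as Windows-1252. The sketch below shows a typical repair; the function name `repair_mojibake` is hypothetical and is not taken from fetch.py.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252.

    Hypothetical helper for illustration; the real script may use a
    different approach or a dedicated library.
    """
    try:
        # Re-encode to the bytes the wrong decoder saw, then decode correctly.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mojibake of this kind, so leave it untouched.
        return text


if __name__ == "__main__":
    # "caf\u00c3\u00a9" is what "café" looks like after the wrong decode.
    print(repair_mojibake("caf\u00c3\u00a9"))
```

Text that is already correct fails the round trip and is returned unchanged, so the function is safe to apply to every extracted string.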

Requirements

  • Python 3.10+
  • uv (recommended) or pip for dependency management
  • A ScrapingBee API key

Installation

Clone the repository:

git clone https://github.com/yourusername/scrapingbee-fetch-oss.git
cd scrapingbee-fetch-oss

Set your ScrapingBee API key as an environment variable:

export SCRAPINGBEE_API_KEY="your_api_key_here"

Alternatively, you can create a .env file in the project directory:

SCRAPINGBEE_API_KEY=your_api_key_here
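For readers wondering how the two options interact, here is a minimal sketch of a key lookup that prefers the environment variable and falls back to a .env file. It assumes simple KEY=VALUE lines; the helper names `read_dotenv` and `get_api_key` are hypothetical, and the actual script may rely on a library such as python-dotenv instead.

```python
import os
from pathlib import Path


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env_file: Path = Path(".env")) -> str:
    """Return the key from the environment, else from the .env file."""
    key = os.environ.get("SCRAPINGBEE_API_KEY")
    if not key:
        key = read_dotenv(env_file).get("SCRAPINGBEE_API_KEY")
    if not key:
        raise RuntimeError("SCRAPINGBEE_API_KEY is not set")
    return key
```

Failing early with a clear error avoids sending a request that the API would reject anyway.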

Usage

Run the script directly with uv:

uv run fetch.py --url "https://example.com"

Optional Arguments

  • --no-render-js: Disable JavaScript rendering (faster, but might miss dynamic content).
  • --country-code <code>: Use a proxy from a specific country, given as an ISO 3166-1 two-letter code (e.g., us, gb, ru).
  • --wait <milliseconds>: Wait time in milliseconds before extracting content (default: 3000).
  • --no-block-ads: Disable ad blocking (ads are blocked by default to save bandwidth and speed up rendering).
  • --premium-proxy: Use a premium proxy (useful for highly protected websites).
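The flags above plausibly translate into query parameters of the ScrapingBee HTML API. The sketch below shows one way that mapping could look, using the documented parameter names `render_js`, `country_code`, `wait`, `block_ads`, and `premium_proxy`. The functions `build_parser`, `build_params`, and `request_url` are hypothetical and not taken from fetch.py, and no request is sent.

```python
import argparse
from urllib.parse import urlencode

API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the flags listed in this README."""
    parser = argparse.ArgumentParser(description="Fetch a page through ScrapingBee")
    parser.add_argument("--url", required=True)
    parser.add_argument("--no-render-js", action="store_true")
    parser.add_argument("--country-code")
    parser.add_argument("--wait", type=int, default=3000)
    parser.add_argument("--no-block-ads", action="store_true")
    parser.add_argument("--premium-proxy", action="store_true")
    return parser


def build_params(args: argparse.Namespace, api_key: str) -> dict[str, str]:
    """Translate parsed flags into API query parameters (assumed mapping)."""
    params = {
        "api_key": api_key,
        "url": args.url,
        # The "--no-*" flags invert options that are on by default.
        "render_js": "false" if args.no_render_js else "true",
        "block_ads": "false" if args.no_block_ads else "true",
        "wait": str(args.wait),
    }
    if args.premium_proxy:
        params["premium_proxy"] = "true"
    if args.country_code:
        params["country_code"] = args.country_code
    return params


def request_url(params: dict[str, str]) -> str:
    """Build the full GET URL; the target URL is percent-encoded."""
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    print(request_url(build_params(cli_args, "your_api_key_here")))
```

Keeping the mapping in a pure function makes it easy to check the resulting parameters without spending API credits.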

Examples

Fetch a website using a US proxy and wait 5 seconds:

uv run fetch.py --url "https://example.com" --country-code us --wait 5000

Fetch a highly protected website using a premium proxy:

uv run fetch.py --url "https://example.com" --premium-proxy

Fetch a simple website without rendering JavaScript:

uv run fetch.py --url "https://example.com" --no-render-js

License

MIT

About

Fetch and render JavaScript-heavy or bot-protected websites using ScrapingBee API for nanobot
