A Python script that fetches and renders JavaScript-heavy or bot-protected websites through the ScrapingBee API. It extracts the main content and returns it as clean Markdown, which suits LLM input and data extraction.
- Bypasses bot protection (Cloudflare, DataDome, etc.)
- Automatically renders JavaScript
- Extracts relevant text content and title
- Fixes common encoding issues
- Supports custom proxies, wait times, and ad blocking
- Python 3.10+
- `uv` (recommended) or `pip` for dependency management
- A ScrapingBee API key
Clone the repository:
git clone https://github.com/yourusername/scrapingbee-fetch-oss.git
cd scrapingbee-fetch-oss
Set your ScrapingBee API key as an environment variable:
export SCRAPINGBEE_API_KEY="your_api_key_here"
Alternatively, you can create a .env file in the project directory:
SCRAPINGBEE_API_KEY=your_api_key_here
You can run the script directly using uv:
uv run fetch.py --url "https://example.com"
Options:
- `--no-render-js`: Disable JavaScript rendering (faster, but might miss dynamic content).
- `--country-code <code>`: Use a proxy from a specific country (e.g., `us`, `uk`, `ru`).
- `--wait <milliseconds>`: Time to wait, in milliseconds, before extracting content (default: 3000).
- `--no-block-ads`: Disable ad blocking (ads are blocked by default to save bandwidth and speed up rendering).
- `--premium-proxy`: Use a premium proxy (useful for highly protected websites).
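To make the options above more concrete, here is a minimal sketch of how such flags could be parsed and turned into query parameters for the ScrapingBee HTML API. It is not taken from `fetch.py`: the function names `build_parser`, `build_params`, and `load_api_key` are hypothetical, and the decision to send `wait` and `block_ads` only when JavaScript rendering is enabled is an assumption of this sketch. Loading the key from a `.env` file is left out to keep the example on the standard library.

```python
import argparse
import os
from urllib.parse import urlencode

# Public ScrapingBee HTML API endpoint.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Fetch a page through ScrapingBee")
    parser.add_argument("--url", required=True)
    # "--no-*" flags flip a default of True to False.
    parser.add_argument("--no-render-js", dest="render_js", action="store_false")
    parser.add_argument("--no-block-ads", dest="block_ads", action="store_false")
    parser.add_argument("--country-code", default=None)
    parser.add_argument("--wait", type=int, default=3000)
    parser.add_argument("--premium-proxy", action="store_true")
    return parser


def load_api_key(env=None) -> str:
    """Read the API key from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    key = env.get("SCRAPINGBEE_API_KEY")
    if not key:
        raise SystemExit("SCRAPINGBEE_API_KEY is not set")
    return key


def build_params(args: argparse.Namespace, api_key: str) -> dict[str, str]:
    """Map parsed flags to ScrapingBee query parameters."""
    params = {
        "api_key": api_key,
        "url": args.url,
        "render_js": str(args.render_js).lower(),
    }
    if args.render_js:
        # Assumption: these two only matter when a browser renders the page.
        params["wait"] = str(args.wait)
        params["block_ads"] = str(args.block_ads).lower()
    if args.country_code:
        params["country_code"] = args.country_code
    if args.premium_proxy:
        params["premium_proxy"] = "true"
    return params


# Build (but do not send) a request URL from an explicit argument list.
args = build_parser().parse_args(
    ["--url", "https://example.com", "--country-code", "us", "--wait", "5000"]
)
params = build_params(args, api_key="your_api_key_here")
print(f"{API_ENDPOINT}?{urlencode(params)}")
```

Keeping the flag-to-parameter mapping in a pure function like this makes it easy to check without spending API credits. The real script may structure this differently.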
Fetch a website using a US proxy and wait 5 seconds:
uv run fetch.py --url "https://example.com" --country-code us --wait 5000
Fetch a highly protected website using a premium proxy:
uv run fetch.py --url "https://example.com" --premium-proxy
Fetch a simple website without rendering JavaScript:
uv run fetch.py --url "https://example.com" --no-render-js
License: MIT