WebPeel — Web data API for AI agents


The web data API for AI agents.
Fetch, search, extract, and understand any webpage — with one API call.

Docs · Dashboard · API Reference · Discord · Status


Get Started

Install

# Node.js / TypeScript
npm install webpeel

# Python
pip install webpeel

# No install — use directly
npx webpeel "https://example.com"

Usage

TypeScript

import { WebPeel } from 'webpeel';

const wp = new WebPeel({ apiKey: process.env.WEBPEEL_API_KEY });
const result = await wp.fetch('https://news.ycombinator.com');
console.log(result.markdown); // Clean, structured content

Python

import os

from webpeel import WebPeel

wp = WebPeel(api_key=os.environ["WEBPEEL_API_KEY"])
result = wp.fetch("https://news.ycombinator.com")
print(result.markdown)  # Clean, structured content

curl

curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
  -H "Authorization: Bearer $WEBPEEL_API_KEY"

Get your free API key → · No credit card required · 500 requests/week free


What It Does

Capability      What you get

🌐 Fetch        Any URL → clean markdown or JSON. Handles JavaScript, bot detection, and dynamic content automatically
🔍 Search       Web search with structured results — titles, URLs, snippets, and optional full-page content
📊 Extract      Pull structured data using JSON Schema. Products, pricing, contacts, tables — any pattern
🕷️ Crawl        Map and scrape entire websites with one API call. Follows links, respects robots.txt
🤖 MCP          18 tools natively available in Claude, Cursor, VS Code, Windsurf, and any MCP-compatible agent
📸 Screenshot   Full-page or viewport screenshots in PNG/JPEG
🎬 YouTube      Video transcripts with timestamps — no YouTube API key required
👁️ Monitor      Watch pages for changes and receive webhook notifications
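The Monitor capability delivers change notifications to a webhook you host. A minimal receiver might look like the sketch below; the payload fields (`url`, `changed_at`, `diff`) are assumptions for illustration, not WebPeel's documented schema — check the docs for the real shape.

```python
import json

def handle_webhook(raw_body: bytes) -> str:
    """Parse a (hypothetical) Monitor notification and route on the diff field."""
    event = json.loads(raw_body)
    if event.get("diff"):
        return f"page changed: {event['url']}"
    return "no change"

# A sample payload in the assumed shape:
sample = json.dumps({
    "url": "https://example.com/pricing",
    "changed_at": "2026-02-01T12:00:00Z",
    "diff": "+ New plan added",
}).encode()
print(handle_webhook(sample))
```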

Anti-Bot Bypass Stack

WebPeel uses a 4-layer escalation chain to bypass bot protection — all built in-house, no paid proxy services required:

1. PeelTLS      — Chrome TLS fingerprint spoofing (in-process Go binary)  ~85% of sites
2. CF Worker    — Cloudflare edge network proxy (different IP reputation)  +5%
3. Google Cache — Cached page copy if available                            +2%
4. Search       — Extract from search engine snippets                      last resort
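The escalation pattern itself is simple: try each layer in order and stop at the first success. Here is a hedged sketch of that control flow — function and layer names are illustrative stubs, not WebPeel's actual internals:

```python
def fetch_with_escalation(url, layers):
    """Try each (name, fetch_fn) layer in order; return the first success."""
    failures = []
    for name, fetch_fn in layers:
        try:
            return name, fetch_fn(url)
        except Exception as exc:  # a layer failing triggers escalation to the next
            failures.append((name, str(exc)))
    raise RuntimeError(f"all layers failed for {url}: {failures}")

# Stub layers standing in for PeelTLS, the CF Worker proxy, and Google Cache:
def blocked(url):
    raise ConnectionError("403: bot detected")

def cached_copy(url):
    return "<html>cached copy</html>"

layer, body = fetch_with_escalation(
    "https://example.com",
    [("peeltls", blocked), ("cf-worker", blocked), ("google-cache", cached_copy)],
)
```

Because each layer only runs when the previous one fails, the common case (PeelTLS succeeds) pays no extra cost.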

For e-commerce sites, WebPeel uses official APIs before attempting HTML scraping:

  • Best Buy — Free Products API (50K queries/day). Set BESTBUY_API_KEY env var.
  • Walmart — Frontend API (may be blocked; falls through gracefully)
  • Reddit, GitHub, HN, Wikipedia, YouTube, ArXiv — Official APIs, always fast
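API-first routing reduces to a host lookup before any HTML fetch. A minimal sketch, assuming a hand-written mapping (the table below is invented for the example, not WebPeel's real routing table):

```python
from urllib.parse import urlparse

# Hosts with official APIs we'd prefer over scraping (illustrative subset):
OFFICIAL_APIS = {
    "news.ycombinator.com": "hn",
    "en.wikipedia.org": "wikipedia",
    "www.reddit.com": "reddit",
}

def pick_strategy(url: str):
    """Return ('official-api', name) for known hosts, else ('html-scrape', url)."""
    host = urlparse(url).hostname or ""
    if host in OFFICIAL_APIS:
        return "official-api", OFFICIAL_APIS[host]
    return "html-scrape", url
```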

Self-hosted CF Worker (100K requests/day free):

cd worker && npx wrangler deploy
# Then set WEBPEEL_CF_WORKER_URL and WEBPEEL_CF_WORKER_TOKEN env vars

Benchmarks

Independent testing across 500 URLs including e-commerce, news, SaaS, and social platforms.

Metric                          WebPeel   Firecrawl   Crawl4AI    Jina Reader
Success rate (protected sites)  94%       71%         58%         49%
Median response time            380ms     890ms       1,240ms     520ms
Content quality score¹          0.91      0.74        0.69        0.72
Price per 1,000 requests        $0.80     $5.33       self-host   $1.00

¹ Content quality = signal-to-noise ratio (relevant content vs boilerplate), scored 0–1.

Methodology: Tested Feb 2026. Protected sites = Cloudflare/bot-protected pages. Quality scored by GPT-4o on content relevance and completeness. Full methodology →
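As a rough intuition for the footnote's definition (the actual scores were assigned by GPT-4o per the methodology, not computed this way), signal-to-noise can be read as the share of extracted output that is relevant content rather than boilerplate:

```python
def quality_score(relevant_chars: int, total_chars: int) -> float:
    """Fraction of extracted output that is relevant content (0-1)."""
    if total_chars == 0:
        return 0.0
    return round(relevant_chars / total_chars, 2)
```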


Pricing

Plan        Price    Requests     Features
Free        $0/mo    500/week     Fetch, search, extract, crawl
Pro         $9/mo    1,250/week   Everything + protected site access
Max         $29/mo   6,250/week   Everything + priority queue
Enterprise  Custom   Unlimited    SLA, dedicated infra, custom domains

All plans include: full API access, TypeScript + Python SDKs, MCP server, CLI.
See full pricing →


SDK

TypeScript / Node.js

import { WebPeel } from 'webpeel';
import fs from 'node:fs';

const wp = new WebPeel({ apiKey: process.env.WEBPEEL_API_KEY });

// Fetch a page
const page = await wp.fetch('https://stripe.com/pricing', {
  format: 'markdown',  // 'markdown' | 'html' | 'text' | 'json'
});

// Search the web
const results = await wp.search('best vector databases 2025', {
  limit: 5,
  fetchContent: true,  // Optionally fetch full content for each result
});

// Extract structured data
const pricing = await wp.extract('https://stripe.com/pricing', {
  schema: {
    type: 'object',
    properties: {
      plans: {
        type: 'array',
        items: { type: 'object', properties: {
          name: { type: 'string' },
          price: { type: 'string' },
          features: { type: 'array', items: { type: 'string' } }
        }}
      }
    }
  }
});

// Crawl a site
const crawl = await wp.crawl('https://docs.example.com', {
  maxPages: 50,
  maxDepth: 3,
  outputFormat: 'markdown',
});
for await (const page of crawl) {
  console.log(page.url, page.markdown);
}

// Screenshot
const shot = await wp.screenshot('https://webpeel.dev', { fullPage: true });
fs.writeFileSync('screenshot.png', shot.image, 'base64');

Full TypeScript reference →

Python

from webpeel import WebPeel
import os

wp = WebPeel(api_key=os.environ["WEBPEEL_API_KEY"])

# Fetch a page
page = wp.fetch("https://stripe.com/pricing", format="markdown")
print(page.markdown)

# Search
results = wp.search("best vector databases 2025", limit=5)
for r in results:
    print(r.title, r.url)

# Extract structured data
pricing = wp.extract("https://stripe.com/pricing", schema={
    "type": "object",
    "properties": {
        "plans": {
            "type": "array",
            "items": { "type": "object", "properties": {
                "name": { "type": "string" },
                "price": { "type": "string" }
            }}
        }
    }
})

# Async client
from webpeel import AsyncWebPeel
import asyncio

async def main():
    wp = AsyncWebPeel(api_key=os.environ["WEBPEEL_API_KEY"])
    results = await asyncio.gather(
        wp.fetch("https://site1.com"),
        wp.fetch("https://site2.com"),
        wp.fetch("https://site3.com"),
    )
    for result in results:
        print(result.markdown[:100])

asyncio.run(main())

Full Python reference →

MCP — For AI Agents

Give Claude, Cursor, or any MCP-compatible agent the ability to browse the web.

Claude Desktop (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; Windows: %APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["-y", "webpeel", "mcp"],
      "env": {
        "WEBPEEL_API_KEY": "wp_your_key_here"
      }
    }
  }
}

Cursor / VS Code (.cursor/mcp.json or .vscode/mcp.json):

{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["-y", "webpeel", "mcp"],
      "env": {
        "WEBPEEL_API_KEY": "wp_your_key_here"
      }
    }
  }
}

Available MCP tools: fetch, search, extract, crawl, screenshot, youtube_transcript, monitor_start, monitor_stop, monitor_list, batch_fetch, map_site, diff, summarize, qa, pdf, reddit, twitter, github — 18 tools total.


MCP setup guide →

CLI

# Install globally
npm install -g webpeel

# Fetch a page (outputs clean markdown)
webpeel "https://news.ycombinator.com"

# Search the web
webpeel search "typescript orm comparison 2025"

# Extract structured data
webpeel extract "https://stripe.com/pricing" --schema pricing-schema.json

# Crawl a site, save to folder
webpeel crawl "https://docs.example.com" --output ./docs-dump --max-pages 100

# Screenshot
webpeel screenshot "https://webpeel.dev" --full-page --output screenshot.png

# YouTube transcript
webpeel youtube "https://youtube.com/watch?v=dQw4w9WgXcQ"

# Ask a question about a page
webpeel qa "https://openai.com/pricing" --question "How much does GPT-4o cost per million tokens?"

# Output as JSON
webpeel "https://example.com" --json

API Reference

Base URL: https://api.webpeel.dev/v1

# Fetch
GET /fetch?url=<url>&format=markdown

# Search
GET /search?q=<query>&limit=10

# Extract
POST /extract
{ "url": "...", "schema": { ... } }

# Crawl
POST /crawl
{ "url": "...", "maxPages": 50, "maxDepth": 3 }

# Screenshot
GET /screenshot?url=<url>&fullPage=true

# YouTube transcript
GET /youtube?url=<youtube_url>

All endpoints require Authorization: Bearer wp_YOUR_KEY.

Full API reference →

