Playground: Try it online
Quickcrawl is a powerful Go-based web scraping service that brings intelligence to AI agents. Whether you're scraping a single page, crawling an entire website, or mapping out link structures β Quickcrawl handles the heavy lifting with a sophisticated multi-layered architecture combining HTTP fetching, browser automation, and LLM-powered structured extraction.
| Feature | Description |
|---|---|
| π Web Scraping | Convert any URL to Markdown, HTML, Plain Text, or Links |
| π Async Crawling | BFS website crawler with depth/page limits and rate limiting |
| πΊοΈ URL Mapping | Discover all URLs on a site instantly without scraping content |
| π§ JavaScript Rendering | Auto-detect SPAs and render via LightPanda or Chrome |
| π LLM Extraction | Send a JSON schema, get validated structured data back |
| π Web Search | SearXNG-powered search for AI agent integration |
| π€ MCP Server | Built-in stdio transport for seamless AI agent integration |
| π¦ Multi-format Output | Markdown, HTML, RawHTML, PlainText, Links, JSON |
86.4% scrape success rate across 1,000 diverse URLs from the Firecrawl Scrape Content Dataset v1
| Metric | Value |
|---|---|
| Success Rate | 86.4% (864/1000) |
| Avg Scrape Latency | 3,776.8ms |
| P50 Latency | 2,242.6ms |
| P90 Latency | 7,590.8ms |
| P95 Latency | 13,000.8ms |
| P99 Latency | 21,188.8ms |
| Metric | Value |
|---|---|
| Phrase Match Rate | 47.69% |
| Clean Content Rate | 90.39% |
| URLs with Ground Truth | 769 |
| URLs with Matched Phrases | 412 |
Build RAG pipelines with clean LLM-ready markdown, give AI agents real-time web access, monitor content changes, extract structured data, convert HTML to clean markdown, or archive web pages at scale.
Quickcrawl MCP server provides AI agents with web scraping capabilities:
| Tool | Description |
|---|---|
scrape |
Scrape a single URL |
crawl |
Start async website crawl |
check_crawl_status |
Check crawl job status |
cancel_crawl |
Cancel running crawl |
map |
Discover URLs on a site |
site_map |
Discover URLs without scraping content (sitemap-aware) |
search |
Search SearXNG |
Add Quickcrawl to your OpenCode configuration:
{
"mcp": {
"quickcrawl": {
"type": "local",
"command": ["npx", "-y", "@mabudalam/quickcrawl-mcp"],
"enabled": true
}
}
}Renderer selection in MCP (per-request override of the server default):
"renderMode": "browser"β force the configured Chrome browser (chromedp) for the request"renderMode": "http"β use the plain HTTP fetcher only"renderMode": "auto"β HTTP first, escalate to browser on anti-bot / SPA signals- Omit
renderModeto inherit the server-widerender_modefromquickcrawl.toml(default:auto) - When
[renderer.chrome].ws_urlis unset inquickcrawl.tomland the effective mode needs a browser, MCP auto-launches a local LightPanda and uses its CDP endpoint. The launched process is killed on MCP shutdown.
Example MCP tool arguments:
{
"url": "https://www.notion.so/",
"formats": ["markdown"],
"renderMode": "browser"
}QuickCrawl ships with SKILL.md files that teach AI coding agents how to use it. Install with one command:
npx skills add MabudAlam/QuickCrawlThis discovers both skills in skills/quickcrawl-cli/ and skills/quickcrawl-mcp/ and installs them to your agent (Claude Code, OpenCode, Cursor, etc.). After installation, the agent knows how to scrape, crawl, map, and search without any manual instructions.
The Quickcrawl CLI provides standalone command-line access to all features. No server or Python needed.
# Quick install (recommended)
curl -fsSL https://raw.githubusercontent.com/MabudAlam/quickcrawl/main/install.sh | sh
# Or from source
go install github.com/MabudAlam/quickcrawl/cli
# Download from GitHub releases
curl -L https://github.com/MabudAlam/QuickCrawl/releases/latest/download/quickcrawl_darwin_arm64.tar.gz | tar -xz
./quickcrawl --help# Scrape a single URL
quickcrawl scrape https://example.com
quickcrawl scrape https://example.com --formats html,markdown
# Crawl a website
quickcrawl crawl https://example.com --max-pages 10 --max-depth 3
# Discover URLs (without scraping content)
quickcrawl map https://example.com --max-depth 2
# Search SearXNG
quickcrawl search "golang web scraping"
quickcrawl search "python" --scrape --formats markdownSee cli/cmd/ for all subcommands:
scrape.goβ Scrape single URLscrawl.goβ Crawl websitesmap.goβ Discover URLssearch.goβ Web search
Python SDK for Quickcrawl β scrape, crawl, and map websites from Python code.
# From PyPI (coming soon)
pip install quickcrawl
# From GitHub
pip install git+https://github.com/MabudAlam/quickcrawl.git@python-sdk#subdirectory=python
# Or clone and install
git clone https://github.com/MabudAlam/quickcrawl
cd quickcrawl/python
pip install -e .Developing locally? See CONTRIBUTING.md for clone instructions,
quickcrawl.tomland.envconfiguration, override precedence, and build commands for all three binaries.
from quickcrawl import QuickCrawlClient
# CLI mode (zero config, auto-downloads binary)
with QuickCrawlClient() as client:
result = client.scrape("https://example.com")
print(result["markdown"])
# HTTP mode (connect to deployed server)
client = QuickCrawlClient(api_url="https://your-server.com", api_key="...")
result = client.scrape("https://example.com")See python/examples/:
01_scrape.pyβ Scrape a single URL02_crawl.pyβ Crawl entire website03_map.pyβ Discover URLs without scraping04_formats.pyβ Multiple output formats05_cloud.pyβ Connect to deployed server06_search.pyβ Web search with scrapingperplexity.pyβ Perplexity-style AI research agent with Google ADK
QuickCrawl ships with SKILL.md files for AI coding agents:
| Skill | For | Install |
|---|---|---|
skills/quickcrawl-mcp/SKILL.md |
Claude Code, OpenCode, Cursor (MCP) | Copy to agent skills dir |
skills/quickcrawl-cli/SKILL.md |
Shell-based agents | go install .../cli |
QuickCrawl MCP tools: scrape, crawl, check_crawl_status, map, search
graph TD
Client["π₯οΈ Client (HTTP / MCP)"]
Server["π Quickcrawl Server"]
Client --> Server
Server --> Router["π‘ Gin Router"]
Router --> Handlers["π§ API Handlers"]
Handlers --> Renderer["π¨ Renderer Layer"]
Renderer --> HTTPFetcher["π HTTP Fetcher"]
Renderer --> Browser["π Browser (CDP)"]
Browser --> LightPanda["πΌ LightPanda"]
Browser --> Chrome["π΅ Chrome DevTools"]
Browser --> Chrome["π Chrome"]
Renderer --> Extractor["π Extractor"]
Extractor --> Markdown["π Markdown"]
Extractor --> HTML["π HTML"]
Extractor --> PlainText["π Plain Text"]
Extractor --> Links["π Links"]
Handlers --> Crawler["π Crawler"]
Crawler --> Robots["π€ Robots.txt"]
Crawler --> Sitemap["πΊοΈ Sitemap"]
Crawler --> RateLimit["β‘ Rate Limiter"]
Handlers --> Search["π SearXNG Search"]
Search --> LLM["π§ LLM Extraction"]
LLM --> JSONSchema["π JSON Schema Output"]
Request
β
*core.Scraper (single render path; chromedp + shared HTTPFetcher)
β
βββ renderMode=http/auto (HTTP) β renderer.HTTPFetcher (plain HTTP GET, no JS)
β
βββ renderMode=browser β chromedp RemoteAllocator β persistent Chrome
(JS rendered, anti-bot stealth, SPA readiness poll)
Both cli and mcp entry points auto-launch a local LightPanda when no
Chrome WS URL is configured (HTTP server does not β it requires a user-supplied
WS URL and falls back to HTTP-only when none is configured).
When a client calls POST /v1/scrape, Quickcrawl executes the following pipeline:
HTTP POST /v1/scrape
β
βΌ
handlers.Scrape() [internal/api/handlers/handler.go:64]
β β’ Parse + JSON-decode into core.ScrapeRequest
β β’ Validate URL (http/https required, non-empty)
β β’ Default formats to ["markdown"] if empty
β β’ Optional robots.txt check (config: crawler.respect_robots_txt)
β
βΌ
core.Scraper.Scrape() [internal/core/scraper.go:69]
β β’ resolveRenderMode() β request override, defaults to inheriting server render_mode
β β’ resolveWaitMs() β request override, defaults to 0
β β’ resolveFormats() β stringβtypes.OutputFormat conversion
β
βΌ
core.Renderer.FetchOrchestrator() [internal/core/renderer.go:213]
β
βββ (renderMode=auto/http) ββ HTTP path βββββββββββββ
β β
β renderer.HTTPFetcher.Fetch()
β [internal/renderer/http.go]
β β’ HTTP GET with stealth headers
β β’ Returns FetchResult{HTML, StatusCode}
β
βββ (renderMode=browser) ββ CDP path ββββββββββββββββ
βΌ
core.Renderer.fetchWithCDPBrowser() [internal/core/renderer.go:334]
β β’ Acquire per-host concurrency slot
β β’ Create isolated browser context (chromedp.NewContext)
β β’ Apply page_timeout_ms to chromedp.Run
β β’ Action sequence:
β - enableNetworkTracking(networkBundle)
β - stealthInjectionAction() (when crawler.stealth.enabled)
β - navigateIgnoringHTTPStatus()
β - dismissCookieBannersFastAction() (when waitMs == 0)
β - WaitForSPAReady() (polls for content readiness,
β network-idle, or selector hit)
β - autoScrollAction() (when waitMs == 0 + lazy markers)
β - OuterHTML of <head> + <body>
β β’ Anti-bot challenge detection (status 4xx/5xx)
β β’ Returns FetchResult{HTML, FinalURL, StatusCode, ContentType}
β
βΌ
core.Extractor.Extract() [internal/core/extractor.go]
β β’ ExtractMetadata β title, description, OG tags, canonical, language
β β’ preprocessHTML β strip head, cleanNoise, applyNoisePatterns
β (and IncludeTags / ExcludeTags / CSSSelector if set)
β β’ postprocessHTML β sanitize, dedupe, normalize whitespace
β β’ HTMLToMarkdown β primary conversion (fullClean β structural β plaintext)
β β’ HTMLToPlaintext
β β’ ExtractLinks, ExtractImageURLs
β
βΌ Returns: core.ScrapeData{Markdown, HTML, PlainText, Links, ImageLinks, Metadata}
β
βΌ
(Optional) LLM Structured Extraction [internal/core/llm.go]
β β’ Triggered when formats contains "json" and [extraction.llm] is configured
β β’ buildLLMInput β callOpenAI(chat/completions) β validateDataAgainstSchema
β β’ Populates data.JSON
β
βΌ
handlers.Scrape()
β β’ If statusCode >= 400 and body < 200 chars β surface as failure
β β’ Else return success with data + warning
β
βΌ
c.JSON(http.StatusOK, APIResponse{ScrapeData})
HTTP POST /v1/crawl
β
βΌ
handlers.StartCrawl() [internal/api/handlers/handler.go:166]
β β’ Parse + JSON-decode into types.CrawlRequest
β β’ Validate URL, maxDepth (0-10), maxPages (1-1000)
β β’ Default maxDepth/maxPages from config
β β’ Reject formats=["json"] with 400 (use /v1/scrape for LLM extraction)
β β’ Generate job ID, store in AppState.CrawlJobs
β
βΌ
crawler.RunCrawl() [internal/crawler/crawl.go]
β β’ BFS from seed URL, respecting same-origin
β β’ robots.txt check per page (if enabled)
β β’ Per-host rate limiter (crawler.requests_per_second)
β β’ Per-host + global concurrency slots
β β’ For each page:
β - core.Scraper.Scrape() (same pipeline as above)
β - Stealth jitter added to inter-request sleep
β - Update CrawlState via stateCh
β
βΌ Returns: types.CrawlState{Total, Completed, Data[], Status}
β
βΌ
c.JSON(http.StatusOK, CrawlStartResponse{ID})
GET /v1/crawl/:id β handlers.GetCrawlStatus() β returns CrawlState
DELETE /v1/crawl/:id β handlers.CancelCrawl() β 204 No Content
| Method | Path | Description |
|---|---|---|
POST |
/v1/scrape |
Scrape a single URL with one or more output formats |
Scrape a single URL. This is the canonical endpoint for fetching and extracting content from one page β supports HTTP, browser (JS) rendering, content filters, and LLM-based structured extraction.
Request body (core.ScrapeRequest):
| Field | Type | Required | Description |
|---|---|---|---|
url |
string | yes | Absolute http:// or https:// URL to scrape |
formats |
string[] | no | Output formats. Any subset of markdown, html, rawHtml, plainText, links, imageLinks, json. Defaults to ["markdown"] |
renderMode |
string | no | Per-request override. One of "auto", "browser", "http". Omit to inherit the server-wide render_mode from quickcrawl.toml (default: auto). |
waitFor |
int | no | Milliseconds to wait after navigation for late content / XHRs. Range: 0-120000. Default: 0 |
headers |
object | no | Custom HTTP headers sent on the fetch |
includeTags |
string[] | no | CSS selectors to keep (e.g. ["article", "h1"]) β applied during preprocessHTML |
excludeTags |
string[] | no | CSS selectors to drop (e.g. ["nav", "footer", ".ad"]) |
cssSelector |
string | no | Extract content matching this CSS selector only |
jsonSchema |
object | no | JSON Schema used by formats:["json"] to constrain LLM extraction |
extract |
object | no | LLM extraction overrides: { schema, prompt, responseFormat } |
llmExtractionPrompt |
string | no | Per-request LLM system prompt override |
llmResponseFormat |
string | no | Per-request LLM response_format name override |
ttl |
int | no | Cache TTL in seconds. 0 = bypass cache, >0 = accept cached if younger than TTL. Must be >= 0 |
Minimal request:
{
"url": "https://example.com"
}Full request with filters and JS rendering:
{
"url": "https://example.com/article",
"formats": ["markdown", "html", "links"],
"renderMode": "browser",
"waitFor": 2000,
"headers": { "Cookie": "session=abc" },
"includeTags": ["article", "h1", "h2", "p"],
"excludeTags": ["nav", "footer", ".advertisement"]
}Response (200 OK on success):
{
"success": true,
"data": {
"markdown": "# Example Domain\n\nThis domain is for use in documentation examples...",
"html": "<h1>Example Domain</h1><p>This domain is for use in...</p>",
"plainText": "Example Domain This domain is for use in documentation examples...",
"links": ["https://www.iana.org/domains/example"],
"imageLinks": [],
"metadata": {
"title": "Example Domain",
"description": null,
"ogpTitle": null,
"ogpDescription": null,
"ogpImage": null,
"canonicalUrl": null,
"sourceURL": "https://example.com",
"language": "en",
"statusCode": 200,
"renderedMode": "http",
"timeTaken": 281
}
},
"warning": null
}renderedMode is "http" when fetched via the HTTP fetcher, or "browser"
when fetched via chromedp. See Metadata.renderedMode.
LLM-extraction response (when formats includes "json"):
{
"success": true,
"data": {
"markdown": "...",
"json": {
"title": "Example Domain",
"purpose": "documentation example"
},
"metadata": { "...": "..." }
}
}data.json is populated only when [extraction.llm] is configured in the
server TOML. See the LLM Extraction section below.
Error responses:
| Status | Code | Cause |
|---|---|---|
400 |
invalid_request |
Missing url, non-http(s) scheme, malformed JSON, or headers/includeTags/etc. of the wrong type |
400 |
forbidden |
crawler.respect_robots_txt=true and the page is disallowed |
500 |
internal_error |
Scraper not initialized |
200 with success:false |
http |
Target returned HTTP 4xx/5xx with a small body (surfaced as a soft failure rather than an HTTP error so callers can still inspect the metadata) |
200 with success:false |
renderer_error |
Browser path requested but no Chrome WS URL is configured (set [renderer.chrome].ws_url) |
| Method | Path | Description |
|---|---|---|
POST |
/v1/crawl |
Start an async BFS crawl of a website |
GET |
/v1/crawl/:id |
Check crawl status and retrieve results |
DELETE |
/v1/crawl/:id |
Cancel a running crawl job |
Start a BFS crawl from a seed URL. The job runs asynchronously β POST returns
a job ID immediately, and you poll GET /v1/crawl/:id for progress and results.
Request body (types.CrawlRequest):
| Field | Type | Required | Description |
|---|---|---|---|
url |
string | yes | Starting URL |
maxDepth |
int | no | Maximum link depth to follow. 0-100. Defaults to crawler.default_max_depth (TOML) |
maxPages |
int | no | Maximum pages to scrape. 1-100. Defaults to crawler.default_max_pages (TOML) |
formats |
string[] | no | Output formats per page. Any subset of markdown, html, rawHtml, plainText, links, imageLinks. Note: "json" is rejected with 400 β use /v1/scrape for LLM extraction |
renderMode |
string | no | Per-request override. One of "auto", "browser", "http". Omit to inherit the server-wide render_mode from quickcrawl.toml (default: auto). |
waitFor |
int | no | Milliseconds to wait after each navigation |
Start crawl request:
{
"url": "https://example.com",
"maxDepth": 2,
"maxPages": 50,
"formats": ["markdown", "links"],
"renderMode": "http"
}Start response (200 OK):
{
"success": true,
"id": "crawl-1748899200000000000"
}Check status β GET /v1/crawl/:id. No body required.
Status response while running:
{
"id": "crawl-1748899200000000000",
"success": true,
"status": "scraping",
"total": 47,
"completed": 12,
"data": []
}Status response when complete:
{
"id": "crawl-1748899200000000000",
"success": true,
"status": "completed",
"total": 47,
"completed": 47,
"data": [
{
"markdown": "# Example Domain\n\n...",
"html": null,
"plainText": null,
"links": ["https://www.iana.org/domains/example"],
"imageLinks": [],
"metadata": {
"sourceURL": "https://example.com",
"statusCode": 200,
"renderedMode": "http",
"timeTaken": 281
}
}
]
}status is one of pending, scraping, completed, failed.
When the crawl fails, error contains a human-readable message and
success is false.
Cancel β DELETE /v1/crawl/:id. No body. Returns 204 No Content on
success, 404 Not Found if the job ID is unknown.
Error responses for POST /v1/crawl:
| Status | Code | Cause |
|---|---|---|
400 |
invalid_request |
Missing url, non-http(s) scheme, malformed JSON, formats contains "json", maxDepth or maxPages out of range, negative values, invalid renderMode, or invalid waitFor |
500 |
internal_error |
Scraper not initialized |
| Method | Path | Description |
|---|---|---|
POST |
/v1/map |
Discover all URLs on a site instantly |
Request body (types.MapRequest):
| Field | Type | Required | Description |
|---|---|---|---|
url |
string | yes | Absolute http:// or https:// URL to start from |
maxDepth |
int | no | Maximum link depth to follow. 0-100. Default: 2 |
useSitemap |
bool | no | Seed discovery with sitemap.xml and robots.txt sitemaps. Default: true |
timeout |
int | no | Operation timeout in milliseconds. 1-600000. Default: 30000 |
Request example:
{
"url": "https://www.mabud.dev/",
"maxDepth": 2,
"useSitemap": true,
"timeout": 30000
}Response:
{
"success": true,
"data": {
"links": [
"https://www.mabud.dev/blog",
"https://www.mabud.dev/projects",
"https://www.mabud.dev/resume"
]
}
}| Method | Path | Description |
|---|---|---|
POST |
/v1/search |
Search SearXNG and optionally scrape results in parallel |
By default /v1/search returns only search-result metadata (title, URL, snippet). Set "scrape": true to also fetch and extract content (markdown/html/etc.) from each result URL β 10 workers in parallel.
Request body (types.SearchRequest):
| Field | Type | Required | Description |
|---|---|---|---|
query |
string | yes | Search query string |
region |
string | no | SearXNG language code. Default: auto |
timeRange |
string | no | SearXNG time_range filter. One of day, week, month, year. Omit for no filter |
categories |
string | no | Comma-separated SearXNG categories. Default: general |
page |
int | no | 1-based page number. 1-1000. Default: 1 |
use_bm25 |
bool | no | Re-rank results using BM25F scoring. Default: false |
scrape |
bool | no | Fetch and extract content from each result URL. Default: false |
formats |
string[] | no | Output formats when scrape=true. Any subset of markdown, html, rawHtml, plainText, links, imageLinks. Default: ["markdown"] |
renderMode |
string | no | Per-request render mode override for scraping each result. One of auto, browser, http. Omit to inherit server default |
Request example:
{
"query": "golang web scraping",
"timeRange": "month",
"page": 1,
"scrape": true,
"formats": ["markdown"]
}| Method | Path | Description |
|---|---|---|
GET |
/health |
Health check with browser availability and active job count |
Quickcrawl supports JSON-Schema-based structured extraction for /v1/scrape.
Add "json" to the formats array and supply a jsonSchema (or extract.schema)
describing the shape you want. The page's markdown is sent to the LLM along
with the schema, and the response is returned in data.json.
Request:
{
"url": "https://news.example.com/article",
"formats": ["markdown", "json"],
"jsonSchema": {
"type": "object",
"properties": {
"title": { "type": "string" },
"author": { "type": "string" },
"published": { "type": "string" }
},
"required": ["title", "author"]
},
"extract": {
"prompt": "Extract article title, author, and publish date",
"responseFormat": "article"
}
}Response (200 OK):
{
"success": true,
"data": {
"markdown": "# Headline\n\nBy Jane Doe. Published 2024-01-15...",
"json": {
"title": "Headline",
"author": "Jane Doe",
"published": "2024-01-15"
},
"metadata": { "...": "..." }
}
}Field reference:
| Field | Type | Description |
|---|---|---|
jsonSchema |
object | Top-level shortcut for extract.schema β JSON Schema for the data you want extracted |
extract.schema |
object | Same as jsonSchema (nested form) |
extract.prompt |
string | Per-request system prompt override (otherwise [extraction.llm].extraction_prompt from the server TOML is used) |
extract.responseFormat |
string | OpenAI response_format.name for the structured output. Defaults to "extracted_data" |
llmExtractionPrompt |
string | Top-level shortcut for extract.prompt |
llmResponseFormat |
string | Top-level shortcut for extract.responseFormat |
Server-side configuration (quickcrawl.toml):
[extraction.llm]
api_key = "" # or set EXTRACTION__LLM__API_KEY in the env
model = "gpt-4o-mini"
base_url = "" # override for non-OpenAI endpoints
max_tokens = 8192
extraction_prompt = "You are a data extraction assistant..."
response_format = "extracted_data"The LLM is only invoked when formats includes "json". If formats:["json"]
is requested but [extraction.llm] is not configured, the scrape returns
{"success": false, "errorCode": "extraction_error", "error": "json extraction requested but no LLM configured. Set [extraction.llm] in server config."}.
Config file: quickcrawl.toml
[server]
host = "0.0.0.0"
port = 3000
rate_limit_rps = 10
[renderer]
page_timeout_ms = 30000
pool_size = 4
[renderer.chrome]
ws_url = ""
[crawler]
max_concurrency = 40
requests_per_second = 40.0
respect_robots_txt = true
default_max_depth = 2
default_max_pages = 100
[extraction.llm]
model = "gpt-4o-mini"
api_key = ""
base_url = ""
max_tokens = 8192
extraction_prompt = "You are a data extraction assistant..."
response_format = "extracted_data"Or via environment variables:
SERVER__PORT=3000
RENDERER__CHROME__WS_URL=ws://127.0.0.1:9222/devtools/browser/...
CRAWLER__MAX_CONCURRENCY=40
EXTRACTION__LLM__API_KEY=your-keyquickcrawl/
βββ cmd/
β βββ server/ # HTTP API server
β βββ mcp/ # MCP server for AI agents
βββ cli/ # Standalone CLI binary
β βββ main.go # Entry point
β βββ cmd/ # Cobra subcommands
βββ internal/
β βββ api/ # HTTP handlers, routes, middleware
β βββ crawler/ # BFS crawler, robots.txt, sitemap
β βββ extractor/ # HTML cleaning, markdown conversion, link extraction
β βββ renderer/ # HTTP, browser fetching via CDP
β βββ search/ # SearXNG integration
β βββ mcp/ # MCP tool implementation
β βββ types/ # Type definitions
βββ playground/ # Web UI playground
βββ npm/ # NPM wrapper package
βββ python/ # Python SDK
β βββ examples/ # Python usage examples
β βββ README.md # Python SDK documentation
βββ bench/ # Benchmarks
βββ scripts/ # Release scripts
βββ workflows/ # CI/CD workflows
| Tech | Use Case |
|---|---|
| Go 1.21+ | Core backend, CLI, server |
| Gin | HTTP framework |
| Cobra | CLI framework |
| goquery | HTML parsing and DOM manipulation |
| lightpanda | Headless browser automation (CDP over WebSocket) |
| Chrome DevTools | Browser automation via CDP WebSocket |
| MCP SDK | Model Context Protocol server |
| slog | Structured logging |
Quickcrawl includes a web-based playground for testing:
playground/
βββ app/
β βββ playground/
β βββ page.tsx # Main playground UI
βββ components/
β βββ response-viewer.tsx # Response display components
βββ lib/
βββ api-client.ts # API client functions
Access at http://localhost:3000/playground when the server is running.
Docker images are published to GitHub Container Registry:
# Server
docker build -f infra/Dockerfile.server -t quickcrawl .
docker run -p 3000:3000 quickcrawl
# Playground
docker build -f infra/Dockerfile.playground -t quickcrawl-playground .
docker run -p 3000:3000 quickcrawl-playground- Click the button above
- Connect your GitHub repository
- Configure environment variables
- Deploy
- Add Page Interaction
- Hooks & Auth
- Improve Search
- Better SPA Handling
- Auto mode improvements for JS rendering
- Add support for https://github.com/h4ckf0r0day/obscura headless
- Focus on scalablity
- Cache System
- Improve the SDK perfomance