Fetch web pages and extract structured content via Plasmate using plain shell scripts.
No runtime dependencies beyond plasmate and jq.
A minimal template showing how to use Plasmate from shell scripts. Fetch web pages and get back a structured Semantic Object Model (SOM) instead of raw HTML — then slice it with jq.
Install Plasmate:
curl -fsSL https://plasmate.app/install.sh | shInstall jq (used for JSON parsing):
# macOS
brew install jq
# Ubuntu/Debian
apt-get install jq
# Or download from https://jqlang.github.io/jq/download/| Script | Description |
|---|---|
fetch-page.sh |
Fetch a single URL and print the semantic content |
batch-fetch.sh |
Fetch multiple URLs and save results as JSON |
extract-links.sh |
Extract all links from a page using the SOM |
# Clone this template
gh repo create my-scraper --template dbhurley/quickstart-shell --clone
cd my-scraper
# Make scripts executable
chmod +x *.sh
# Fetch a page
./fetch-page.sh https://news.ycombinator.com
# Extract links
./extract-links.sh https://github.com/trending
# Extract links and save as JSON
./extract-links.sh https://github.com/trending --json
# Batch fetch (saves to results.json)
./batch-fetch.sh https://example.com https://example.org https://github.comPlasmate fetches web pages and returns a Semantic Object Model — structured JSON organized by semantic regions (nav, main, sidebar) and elements (headings, links, text).
# Raw SOM
plasmate fetch https://example.com
# Extract the title
plasmate fetch https://example.com | jq -r '.title'
# List all headings
plasmate fetch https://example.com | jq -r '.regions[].elements[] | select(.role == "heading") | .text'
# Extract all links with their region context
plasmate fetch https://example.com | jq '[
.regions[] | . as $r |
.elements[] |
select(.role == "link" and .href) |
{region: $r.role, text: .text, href: .href}
]'{
"title": "Example Domain",
"lang": "en",
"regions": [
{
"role": "main",
"id": "content",
"elements": [
{"role": "heading", "text": "Example Domain", "level": 1},
{"role": "text", "text": "This domain is for use in illustrative examples..."},
{"role": "link", "text": "More information...", "href": "https://www.iana.org/domains/example"}
]
}
]
}# Count elements by role
plasmate fetch <url> | jq '[.regions[].elements[].role] | group_by(.) | map({(.[0]): length}) | add'
# Extract all text content
plasmate fetch <url> | jq -r '.regions[].elements[] | select(.role == "text") | .text'
# Find elements containing a keyword
plasmate fetch <url> | jq -r '.regions[].elements[] | select(.text | test("keyword"; "i")) | .text'
# Get structured data (JSON-LD, OpenGraph)
plasmate fetch <url> | jq '.structured_data'
# Get interactive elements (for click/type automation)
plasmate fetch <url> | jq '.interactive'MIT
| Engine | plasmate - The browser engine for agents |
| MCP | plasmate-mcp - Claude Code, Cursor, Windsurf |
| Extension | plasmate-extension - Chrome cookie export |
| SDKs | Python / Node.js / Go / Rust |
| Frameworks | LangChain / CrewAI / AutoGen / Smolagents |
| Tools | Scrapy / Audit / A11y / GitHub Action |
| Resources | Awesome Plasmate / Notebooks / Benchmarks |
| Docs | docs.plasmate.app |
