plasmate-labs/quickstart-shell

Plasmate Quickstart — Shell

Fetch web pages and extract structured content via Plasmate using plain shell scripts.
No runtime dependencies beyond plasmate and jq.

This is a minimal template for using Plasmate from shell scripts: fetch a web page, get back a structured Semantic Object Model (SOM) instead of raw HTML, then slice it with jq.

Prerequisites

Install Plasmate:

curl -fsSL https://plasmate.app/install.sh | sh

Install jq (used for JSON parsing):

# macOS
brew install jq

# Ubuntu/Debian
apt-get install jq

# Or download from https://jqlang.github.io/jq/download/
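
Since every script here depends on both tools, a script can check for them up front instead of failing halfway through a pipeline. The following is a minimal sketch; require_cmds is a hypothetical helper name, not something shipped with Plasmate or this template.

```shell
# require_cmds CMD...: report every missing command on stderr and
# return non-zero if any of them is absent. (Hypothetical helper.)
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing dependency: %s\n' "$cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script could then start with: require_cmds plasmate jq || exit 1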

What's Included

Script            Description
fetch-page.sh     Fetch a single URL and print the semantic content
batch-fetch.sh    Fetch multiple URLs and save results as JSON
extract-links.sh  Extract all links from a page using the SOM

Quick Start

# Clone this template
gh repo create my-scraper --template plasmate-labs/quickstart-shell --clone
cd my-scraper

# Make scripts executable
chmod +x *.sh

# Fetch a page
./fetch-page.sh https://news.ycombinator.com

# Extract links
./extract-links.sh https://github.com/trending

# Extract links and save as JSON
./extract-links.sh https://github.com/trending --json

# Batch fetch (saves to results.json)
./batch-fetch.sh https://example.com https://example.org https://github.com
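
For readers who want to see the shape of a batch loop, here is a rough sketch of how several fetches could be collected into one JSON array. It is not the contents of batch-fetch.sh. It assumes that plasmate fetch exits non-zero when a fetch fails, and both the batch_fetch name and the PLASMATE_CMD override variable are hypothetical (the override exists only so the loop can be exercised without network access).

```shell
# batch_fetch URL...: fetch each URL and print one JSON array of
# {url, ok, som} objects. A failed fetch is recorded, not fatal.
# Assumption: the fetch command exits non-zero on failure.
batch_fetch() {
  local url som
  for url in "$@"; do
    # PLASMATE_CMD is a hypothetical override; defaults to plasmate.
    if som="$(${PLASMATE_CMD:-plasmate} fetch "$url" 2>/dev/null)"; then
      # Wrap the SOM read from stdin together with its source URL.
      printf '%s' "$som" | jq --arg url "$url" '{url: $url, ok: true, som: .}'
    else
      jq -n --arg url "$url" '{url: $url, ok: false, som: null}'
    fi
  done | jq -s '.'   # slurp the per-URL objects into a single array
}
```

Usage would look like: batch_fetch https://example.com https://example.org > results.json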

How It Works

Plasmate fetches web pages and returns a Semantic Object Model — structured JSON organized by semantic regions (nav, main, sidebar) and elements (headings, links, text).

# Raw SOM
plasmate fetch https://example.com

# Extract the title
plasmate fetch https://example.com | jq -r '.title'

# List all headings
plasmate fetch https://example.com | jq -r '.regions[].elements[] | select(.role == "heading") | .text'

# Extract all links with their region context
plasmate fetch https://example.com | jq '[
  .regions[] | . as $r |
  .elements[] |
  select(.role == "link" and .href) |
  {region: $r.role, text: .text, href: .href}
]'
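
When the links feed a shell loop rather than another JSON tool, tab-separated lines are often easier to consume than a JSON array. The sketch below wraps the same filter in a function that reads a SOM document on stdin. The som_links name is hypothetical, and the field names (regions, elements, role, text, href) are taken from the examples in this README.

```shell
# som_links: read a SOM document on stdin and print one link per line
# as "region<TAB>text<TAB>href". (Hypothetical helper name.)
som_links() {
  jq -r '
    .regions[]? | .role as $region |
    .elements[]? |
    select(.role == "link" and .href) |
    [$region, (.text // ""), .href] | @tsv
  '
}
```

For example: plasmate fetch https://example.com | som_links | cut -f3 | sort -u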

SOM Structure

{
  "title": "Example Domain",
  "lang": "en",
  "regions": [
    {
      "role": "main",
      "id": "content",
      "elements": [
        {"role": "heading", "text": "Example Domain", "level": 1},
        {"role": "text",    "text": "This domain is for use in illustrative examples..."},
        {"role": "link",    "text": "More information...", "href": "https://www.iana.org/domains/example"}
      ]
    }
  ]
}
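
The jq filters in this README all assume that regions is an array and that each region carries an elements array. A quick shape check before further processing can turn a confusing jq error into a clear failure. This is a sketch only: som_check is a hypothetical name, and it tests just the two properties visible in the example above, not a full schema.

```shell
# som_check: read a SOM document on stdin; succeed only if .regions is
# an array and every region has an .elements array. (Hypothetical helper.)
som_check() {
  jq -e '
    if (.regions | type) != "array" then false
    else all(.regions[]; (.elements | type) == "array")
    end
  ' >/dev/null 2>&1
}
```

Because jq -e maps a false result to a non-zero exit status, the function works directly in conditionals: plasmate fetch https://example.com | som_check || echo "unexpected SOM shape" >&2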

Useful jq Recipes

# Set the page to inspect (replace with your own URL)
url="https://example.com"

# Count elements by role
plasmate fetch "$url" | jq '[.regions[].elements[].role] | group_by(.) | map({(.[0]): length}) | add'

# Extract all text content
plasmate fetch "$url" | jq -r '.regions[].elements[] | select(.role == "text") | .text'

# Find elements containing a keyword (elements without text are skipped)
plasmate fetch "$url" | jq -r '.regions[].elements[] | select((.text // "") | test("keyword"; "i")) | .text'

# Get structured data (JSON-LD, OpenGraph)
plasmate fetch "$url" | jq '.structured_data'

# Get interactive elements (for click/type automation)
plasmate fetch "$url" | jq '.interactive'
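
The recipes above compose well into small reusable functions. As one more sketch, the function below prints a page outline by indenting each heading according to its level. The som_outline name is hypothetical, and the level field is assumed to be present on headings as in the SOM example in this README (a missing level is treated as 1).

```shell
# som_outline: read a SOM document on stdin and print its headings,
# indented two spaces per level below 1. (Hypothetical helper name.)
som_outline() {
  jq -r '
    .regions[]? | .elements[]? |
    select(.role == "heading") |
    ([range(1; (.level // 1))] | map("  ") | join("")) + (.text // "")
  '
}
```

For example: plasmate fetch https://example.com | som_outline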

License

MIT


Part of the Plasmate Ecosystem

Engine      plasmate - The browser engine for agents
MCP         plasmate-mcp - Claude Code, Cursor, Windsurf
Extension   plasmate-extension - Chrome cookie export
SDKs        Python / Node.js / Go / Rust
Frameworks  LangChain / CrewAI / AutoGen / Smolagents
Tools       Scrapy / Audit / A11y / GitHub Action
Resources   Awesome Plasmate / Notebooks / Benchmarks
Docs        docs.plasmate.app
