Self-hosted web search for AI agents β zero API key, full privacy
Works in OpenClaw Β· Claude Code Β· Antigravity Β· Any CLI
Your AI agent wants to search the web, but:
- Brave Search API: $3/1000 queries, rate limited
- Google Custom Search: $5/1000, daily caps
- Bing API: paid, complex setup
- Built-in web search: sends queries to third-party servers
You just want your local agent to search Google without paying or leaking queries.
ask-search wraps SearxNG β a self-hosted meta search engine that aggregates Google, Bing, DuckDuckGo, Brave and 70+ more sources. One command, all results, zero cost.
ask-search "Claude Code vs Cursor 2026"[1] Claude Code Is Eating Cursor's Lunch
https://techcrunch.com/2026/...
After Anthropic launched Claude Code with agent mode...
[google,brave]
[2] Why I switched from Cursor to Claude Code
https://reddit.com/r/LocalLLaMA/...
...
| Environment | Integration | Status |
|---|---|---|
| OpenClaw | CLI Skill (SKILL.md) |
β |
| Claude Code | CLI command | β |
| Antigravity | MCP Server | β |
| Any shell | ask-search CLI |
β |
git clone https://github.com/ythx-101/ask-search
cd ask-search
bash install.sh
ask-search "hello world"Step 1: Deploy SearxNG
# Docker (recommended)
docker run -d --name searxng \
-p 127.0.0.1:8080:8080 \
-e SEARXNG_SECRET_KEY=your-secret-key \
searxng/searxng
# Or Docker Compose β see searxng/docker-compose.yml in this repoStep 2: Enable JSON output
Edit SearxNG settings.yml:
search:
formats:
- html
- jsonStep 3: Install ask-search
bash install.shStep 4: Use it
ask-search "your query"ask-search "query" # top 10 results
ask-search "query" --num 5 # limit results
ask-search "AI news" --categories news # news only
ask-search "query" --lang zh-CN # language filter
ask-search "query" --urls-only # URLs only (pipe to web_fetch)
ask-search "query" --json # raw JSON
ask-search "query" -e google,brave # specific engines| Variable | Default | Description |
|---|---|---|
SEARXNG_URL |
http://localhost:8080 |
SearxNG endpoint |
export SEARXNG_URL="http://localhost:8080"
ask-search "query"{
"mcpServers": {
"ask-search": {
"command": "python3",
"args": ["/path/to/ask-search/mcp/server.py"],
"env": {
"SEARXNG_URL": "http://localhost:8080"
}
}
}
}Requires: pip install mcp
Copy SKILL.md to your OpenClaw skills directory, or:
# OpenClaw skill loader
skill-add https://github.com/ythx-101/ask-searchThen in OpenClaw:
ask-search "latest news about X"
# 1. Search
ask-search "React Server Components performance 2026" --num 10
# 2. Got a promising URL? Deep-dive:
# Pass the URL to web_fetch / curl for full content
# 3. News mode
ask-search "GPT-5 release" --categories news --lang enask-search/
βββ scripts/
β βββ core.py # Main logic, CLI entry point
βββ mcp/
β βββ server.py # MCP server for AG/CC integration
βββ install.sh # Installer
βββ SKILL.md # OpenClaw skill descriptor
βββ README.md
- Must enable
jsoninsearch.formatsinsettings.yml - Bot detection may block requests from some IPs β add your server IP to
pass_ipinlimiter.toml - Bind to
127.0.0.1for security unless you need remote access
ask-search returns URLs + snippets from search engine indexes. When your agent needs the full page content (deep-dive via curl / web_fetch), some sites will block depending on your server's network environment:
| Site | Search (SearxNG) | Deep-dive (curl/fetch) | Why |
|---|---|---|---|
| Most sites | β | β | No aggressive anti-bot |
| β | β VPS IP blocked | Reddit blocks datacenter IPs | |
| Zhihu (η₯δΉ) | β | β Login wall + fingerprint | Requires browser JS + login |
| Medium | β | Partial content only |
Key insight: Search always works because SearxNG queries search engines (Google, Brave, etc.), not the target sites directly. The search engines have already indexed the content. The problem only appears when your agent tries to fetch the full page.
If you have a machine on a residential network (home server, laptop, etc.), create an SSH SOCKS tunnel:
# On your VPS/server:
ssh -f -N -D 127.0.0.1:1082 user@your-home-machine
# Then fetch through the proxy:
curl -x socks5h://127.0.0.1:1082 "https://reddit.com/r/example/comments/xxx.json"For Reddit specifically, append .json to any post URL for structured data:
# Returns full post + all comments as JSON
curl -x socks5h://127.0.0.1:1082 \
"https://www.reddit.com/r/LocalLLaMA/comments/xxxxx/post_title.json"To persist the tunnel as a systemd service:
# /etc/systemd/system/socks-proxy.service
[Unit]
Description=SSH SOCKS Proxy for web scraping
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/ssh -N -D 127.0.0.1:1082 -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes user@your-home-machine
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.targetSites like Zhihu require full browser rendering. Use Playwright with the SOCKS proxy:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
browser = p.chromium.launch(
headless=True,
proxy={"server": "socks5://127.0.0.1:1082"}
)
page = browser.new_page()
page.goto("https://example.com/article")
content = page.inner_text("article")Note: Some sites (Zhihu, etc.) detect headless browsers even through proxies. For these, you may need a real browser session with login cookies, or delegate the fetch to an agent running on a local machine (e.g., Claude Code on a Mac).
When direct access fails, try cached versions:
# Archive.org (works surprisingly often for Reddit)
curl "https://web.archive.org/web/2026/https://reddit.com/r/example/comments/xxx"
# Google Cache (may redirect β not always reliable)
curl "https://webcache.googleusercontent.com/search?q=cache:example.com/page"If you run agents on multiple machines (e.g., OpenClaw on VPS + Claude Code on local Mac):
VPS agent: ask-search "query" β gets URLs + snippets
β
Local agent: web_fetch(url) β full content (residential IP, no blocks)
β
VPS agent: receives full text, analyzes, responds
This is the most robust approach β search on your server, deep-dive from a local machine where anti-bot measures don't apply.
| Problem | Fix |
|---|---|
| Reddit blocks your IP | SSH SOCKS proxy + .json API |
| Site needs JS rendering | Playwright + proxy |
| Site needs login (Zhihu) | Delegate to local agent or use logged-in browser |
| Everything blocked | Fall back to search snippets + archive caches |
Issues and PRs welcome.
Inspired by Perplexica β the idea of using self-hosted SearxNG as a private search backend comes from there. ask-search strips it down to a single CLI command for agent use.