
Primr

Turn a company URL into a cited, analyst-grade intelligence brief.

Primr extracts primary-source data from company websites using a multi-tier approach that adapts to different site architectures, then synthesizes external research into structured briefs that can be consumed by humans or autonomous agents.

Runs as a CLI, an MCP server, an OpenClaw integration, and a Claude Skill.

primr "Acme Corp" https://acme.example

30 minutes later: competitive positioning, technology stack, strategic initiatives, and external validation—all cited.

Why This Exists

Company research is tedious. You visit the website, click around, Google the company, read articles, synthesize it all, write it up. Repeat for every prospect, every deal, every meeting.

Primr does that entire workflow autonomously.

What Makes It Different

  • Adaptive scraping: 8 retrieval methods, from browser rendering to TLS fingerprinting to screenshot+vision extraction, with per-host optimization. Primr starts with full browser rendering (which works on 95%+ of modern sites) and falls back through increasingly specialized methods.
  • Autonomous external research: Gemini Deep Research plans queries, follows leads, cross-validates sources, and synthesizes findings into a structured brief.
  • Cost controls built in: --dry-run estimates, usage tracking, and governance hooks for budget limits.
  • Agent-native interfaces: CLI, MCP server, OpenClaw integration, and Claude Skills—all first-class.

Manual research takes hours. Primr typically runs in ~30 minutes and costs ~$1–2 in API usage (varies by depth and site behavior).

Modes

Mode     What it does                                Time      Cost
scrape   Crawls site, extracts insights              ~5 min    ~$0.10
deep     Gemini Deep Research on external sources    ~10 min   ~$1.00
full     Both combined into comprehensive brief      ~30 min   ~$1.50

Costs are primarily Gemini API usage. Web search is free (DuckDuckGo). Use --dry-run for accurate estimates based on your usage history.
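The idea behind history-based estimates can be sketched in a few lines: average what past runs of a mode actually cost, and fall back to the documented defaults when no history exists. Function and field names here are hypothetical, not Primr's actual API:

```python
from statistics import mean

# Hypothetical usage-history records: (mode, actual_cost_usd)
HISTORY = [
    ("scrape", 0.09), ("scrape", 0.11),
    ("deep", 0.95),
]

# Fallback estimates when a mode has no history (from the table above)
DEFAULTS = {"scrape": 0.10, "deep": 1.00, "full": 1.50}

def estimate_cost(mode: str) -> float:
    """Average past runs of this mode; fall back to the documented default."""
    past = [cost for m, cost in HISTORY if m == mode]
    return round(mean(past), 2) if past else DEFAULTS[mode]
```

With the sample history above, `estimate_cost("full")` falls back to the $1.50 default because no `full` runs have been recorded yet.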

Quick Start

git clone https://github.com/blisspixel/primr.git
cd primr
python setup_env.py              # Installs deps, creates .env
# Add your API keys to .env (see docs/API_KEYS.md)
primr doctor                     # Verify everything works
primr "Acme Corp" https://acme.example  # Run your first research

Requires Python 3.11+ and a Gemini API key. That's it — web search uses DuckDuckGo (no key needed).

# More usage
primr "Company" https://company.com --mode scrape        # Site corpus only
primr "Company" https://company.com --mode deep          # External research only
primr "Company" https://company.com --dry-run            # Cost estimate first
primr "Company" https://company.com --cloud-vendor aws azure  # Multi-vendor AI strategy

What a run looks like

▸ PHASE 1 · Data Collection
  Website scraping + web search + AI analysis

✓ 251 links → 50 selected
Scraping 23/50 /about  [15s elapsed, ~2m left]
✓ 48/50 pages scraped (6m 10s)
+ 3 external sources validated
✓ Data Collection
  Sections generated: 18

▸ PHASE 2 · Deep Research
  Comprehensive report with sequential elaboration (50+ pages)

  Searching sources (1m 33s)
  Analyzing findings (3m 48s)
  Generating report (6m 43s)
  Writing: Executive Summary (1/21)...
  Writing: Products and Services (2/21)...
  ...
  Writing: Strategic Positioning Hypothesis (21/21)...

✓ Deep Research
  Chapters: 21

▸ PHASE 3 · AI Strategy Roadmap (AWS) Analysis
  Generating AI strategy roadmap recommendations (aws)

✓ AI Strategy Roadmap (AWS) Analysis

▸ PHASE 4 · AI Strategy Roadmap (AZURE) Analysis
  Generating AI strategy roadmap recommendations (azure)

✓ AI Strategy Roadmap (AZURE) Analysis

✓ Complete in 85m

✓ Report ready
  output/Acme_Corp_Strategic_Overview_02-11-2026.docx

✓ AI Strategy Roadmap (AWS)
  output/Acme_Corp_AI_Strategy_AWS_02-11-2026.docx

✓ AI Strategy Roadmap (AZURE)
  output/Acme_Corp_AI_Strategy_AZURE_02-11-2026.docx

Mode: Complete (Two-Step)
Chapters: 21
Citations: 34
Duration: 85m
Est. Cost: $1.92
Actual Cost: ~$0.62
AI Strategy: Yes

What the output looks like

From the executive summary of a sample report:

Cirrus Fleet Technologies is a mid-market logistics optimization vendor ($180-220M ARR, estimated) that sells route planning and fleet analytics software to regional shipping companies. The company occupies a defensible but narrowing niche: optimizing last-mile delivery for carriers still running legacy dispatch systems.

Key insights:

  • Cirrus's customer concentration is high. Cross-referencing case studies, press releases, and conference presentations, roughly 40% of referenced deployments involve just 3 carrier networks. Loss of any one would be material. [Confidence: Inferred]
  • The company has no disclosed AI strategy, but 4 of their last 7 engineering hires have ML/optimization backgrounds. Combined with a patent filing for "autonomous route replanning under disruption," this suggests an unannounced product line. [Confidence: Inferred]
  • Pricing has shifted from perpetual licenses to consumption-based billing (per-shipment), visible in public procurement portal RFP responses. [Confidence: Reported]

Reports include 20+ structured chapters, SWOT analysis, competitive landscape, discovery questions, and inline confidence levels on every non-obvious claim. Full sample: docs/examples/sample-brief.md
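A claim with an inline confidence level, as shown in the excerpt above, might be represented like this (a sketch with hypothetical names; the `SPECULATIVE` level is an assumption, since the sample only shows Reported and Inferred):

```python
from dataclasses import dataclass
from enum import Enum

class Confidence(Enum):
    REPORTED = "Reported"        # stated directly by a source
    INFERRED = "Inferred"        # derived by cross-referencing sources
    SPECULATIVE = "Speculative"  # hypothesis with weak evidence (assumed level)

@dataclass
class Claim:
    text: str
    confidence: Confidence
    citations: list[str]

    def render(self) -> str:
        """Render the claim with its inline confidence tag."""
        return f"{self.text} [Confidence: {self.confidence.value}]"

claim = Claim(
    text="Customer concentration is high",
    confidence=Confidence.INFERRED,
    citations=["case studies", "press releases"],
)
```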

Batch Research

Have a spreadsheet of companies? Primr can enrich it with website URLs and run research across the list.

Two-step workflow (recommended):

# Step 1: Enrich — auto-detect columns, look up websites, filter by industry, save CSV
primr --batch companies.xlsx --industry Utilities --enrich

# Step 2: Review the enriched CSV, then run research
primr --batch companies_utilities_enriched.csv --mode scrape

Options:

--enrich          # Enrich only — look up websites, save CSV, don't research
--industry NAME   # Filter rows by industry column value
--limit N         # Process only the first N companies (useful for testing)
--skip-confirm    # Skip the confirmation prompt (for unattended runs)
--mode MODE       # scrape (~$0.10/co), deep (~$1.00/co), full (~$1.50/co)

Defensive behavior:

  • Shows cost estimate and asks for confirmation before starting (use --skip-confirm to bypass)
  • Resume: re-run the same command to skip companies that already have reports from today
  • Cooldown between companies (10s for scrape, 60s for deep/full) to avoid API quota issues
  • Progressive retry with backoff on rate-limit errors (immediate → 2 min → 5 min)
  • Pauses and asks after 3 consecutive failures — option to wait 10 minutes or stop
  • Deduplicates companies by name (case-insensitive)
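The retry and dedupe behavior above can be sketched as follows (a simplified stand-in, not Primr's internals; here a RuntimeError plays the role of a rate-limit error):

```python
import time

# Backoff schedule from the list above: immediate retry, then 2 min, then 5 min
BACKOFF_SECONDS = [0, 120, 300]

def run_with_retry(task, sleep=time.sleep):
    """Retry a task with progressive backoff on rate-limit errors."""
    last_error = None
    for delay in BACKOFF_SECONDS:
        sleep(delay)
        try:
            return task()
        except RuntimeError as exc:
            last_error = exc
    raise RuntimeError("rate-limited after all retries") from last_error

def dedupe_companies(names):
    """Drop duplicate company names, comparing case-insensitively."""
    seen, unique = set(), []
    for name in names:
        key = name.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique
```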

Accepts Excel (.xlsx) or CSV files. Smart column detection uses an LLM to find company name, website, and industry columns automatically.

Under the Hood

8-Tier Retrieval Engine (browser-first for modern JS-heavy sites)

  • Browser tiers: Playwright → expanded rendering → DrissionPage stealth → DrissionPage (driverless CDP)
  • Vision tier: Screenshot + LLM extraction for image-heavy or non-standard layouts
  • HTTP tiers: curl_cffi (TLS fingerprinting) → httpx → requests
  • Content-type routing: automatic PDF detection and LLM-powered extraction
  • Automatic fallback, per-host optimization, circuit breakers
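The fallback-with-circuit-breaker loop can be sketched like this (a simplified stand-in; the tier names mirror the list above, but the fetch functions and threshold are illustrative, not Primr's internals):

```python
FAILURE_THRESHOLD = 3  # consecutive failures that trip the breaker

class TieredFetcher:
    def __init__(self, tiers):
        self.tiers = tiers      # ordered list of (name, fetch_fn)
        self.failures = {}      # (host, tier_name) -> consecutive failures

    def fetch(self, host, url):
        for name, fn in self.tiers:
            key = (host, name)
            if self.failures.get(key, 0) >= FAILURE_THRESHOLD:
                continue        # breaker open: skip this tier for this host
            try:
                result = fn(url)
                self.failures[key] = 0  # success resets the breaker
                return name, result
            except Exception:
                self.failures[key] = self.failures.get(key, 0) + 1
        raise RuntimeError(f"all tiers failed for {host}")

# Stand-in fetchers: the first tier always fails, the second succeeds
def always_blocked(url):
    raise ValueError("blocked by anti-bot")

def plain_http(url):
    return "<html>ok</html>"

fetcher = TieredFetcher([("playwright", always_blocked), ("httpx", plain_http)])
```

After enough consecutive failures on one host, the breaker stops retrying that tier for that host, so later fetches skip straight to what works.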

Gemini Deep Research

  • Autonomous multi-step search and synthesis
  • Plans its own research strategy, follows leads, validates across sources
  • Not a wrapper around chat completions—actual agentic research

Agentic Architecture

  • Hypothesis tracking with confidence levels across sessions
  • Subagents for scraping, analysis, writing, and QA
  • Hook system for governance (cost limits, quality gates)
  • Research memory that persists and evolves
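A cost-governance hook of the kind described above could look like this minimal sketch (hook and class names are hypothetical, not Primr's actual hook API):

```python
class BudgetExceeded(Exception):
    pass

class CostGovernor:
    """Pre-call hook that refuses work once a run's budget would be exceeded."""

    def __init__(self, budget_usd: float):
        self.budget = budget_usd
        self.spent = 0.0

    def before_api_call(self, estimated_cost: float) -> None:
        # Invoked before each paid API call; blocks over-budget work
        if self.spent + estimated_cost > self.budget:
            raise BudgetExceeded(
                f"would spend ${self.spent + estimated_cost:.2f} "
                f"of ${self.budget:.2f} budget"
            )

    def after_api_call(self, actual_cost: float) -> None:
        # Record what the call actually cost
        self.spent += actual_cost
```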

Configuration

# Required in .env
GEMINI_API_KEY=       # https://aistudio.google.com/apikey

# Optional — only needed if you want to use Google Custom Search instead of DuckDuckGo
# SEARCH_PROVIDER=google
# SEARCH_API_KEY=     # Google Custom Search API
# SEARCH_ENGINE_ID=   # Programmable Search Engine ID

Web search uses DuckDuckGo by default — no search API key needed. Google Custom Search is available as an optional alternative for users with existing whole-web CSEs.

Full setup guide

Agent Integration

Primr is built for the agentic era. Three ways to plug it in:

MCP Server — Claude Desktop, Cursor, and any MCP-compatible client:

primr-mcp --stdio              # stdio transport
primr-mcp --http --port 8000   # HTTP with JWT auth

OpenClaw — Drop-in integration with skills and workflows:

# openclaw/openclaw.json already configured
# Skills: primr-research, primr-strategy, primr-qa
# Sandboxed Docker execution included

Claude Skills — Anthropic's Agent Skills format:

skills/
├── company-research/SKILL.md    # Full pipeline with memory
├── hypothesis-tracking/SKILL.md # Confidence management
├── qa-iteration/SKILL.md        # Section refinement
└── scrape-strategy/SKILL.md     # Tier selection heuristics

Skills include hypothesis persistence, cost governance hooks, and QA gates. Agents can pick up where they left off across sessions.

Cloud Deployment — Serverless on AWS, Azure, or GCP

Scale-to-zero ephemeral containers, event-driven queues, production observability. See deployment guide.

MCP docs · OpenClaw config

Development

python -m pytest tests/ -x --tb=short   # Run tests
ruff check src/                          # Lint
mypy src/primr --ignore-missing-imports  # Type check

4,400+ tests including property-based testing (Hypothesis), full ruff and mypy compliance, OpenTelemetry tracing, and typed error hierarchy with automatic retry classification. CI runs lint, type check, and tests on every push via GitHub Actions.

Documentation

Doc                  What's in it
ARCHITECTURE.md      System design, data flow, scraping tiers
API.md               MCP server, programmatic usage
CONFIG.md            Full configuration reference
API_KEYS.md          API key setup
CLOUD_DEPLOYMENT.md  Serverless deployment
SECURITY_OPS.md      Security operations guide
CONTRIBUTING.md      Contribution guidelines
SECURITY.md          Vulnerability reporting
ROADMAP.md           What's planned

About This Project

Primr is a nights-and-weekends project by a solo developer. I think AI-assisted research workflows are going to be transformative over the next few years, and this is my way of building deeply in the space — learning by shipping something real.

It's not backed by a company or a team. It's a passion project, not a commercial product.

Disclaimer

Primr is a research tool. You are responsible for:

  • Web content: Primr retrieves publicly available web content, similar to a browser or search engine crawler. It does not bypass authentication, access paywalled content, or exploit vulnerabilities. However, some websites restrict automated access in their terms of service — it is your responsibility to check before running Primr against any site.
  • Accuracy: AI-generated content may contain errors, hallucinations, or outdated information. Verify findings before acting on them.
  • Costs: API calls to Gemini and other services incur real charges. Use --dry-run to estimate costs before running.
  • Use case: This tool is intended for legitimate research purposes such as due diligence and meeting preparation. Do not use it to violate any website's terms of service or any applicable law.

This software is provided as-is by a solo developer. The author is not liable for how you use this software, the accuracy of its outputs, or any consequences of its use.

License

MIT
