Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.
Updated Jun 27, 2026 (Rust)
Use LLMs to robustly extract web data
Fully automated, hands-free extraction and understanding of web content, powered by machine learning agents.
Low-Cost Cross-Domain Web Structured Information Extraction using specialized LoRA adapters.
Replayable Browser Agent
A distributed focused web crawler based on Scala and Akka
Automatic extraction of local event information from a webpage using machine learning
A powerful and lightweight web scraping library with LLM extraction capabilities. This library combines web scraping with AI-powered content extraction using either OpenAI or OpenRouter APIs.
Predicting product recommendation scores from the data available on the client's website
Free local web search/extraction router for AI agents. Go CLI + MCP, BYOK/free-first routing, keyless DDGS/Scrapling fallback, setup writers and client guides.
Self-hosted web scraping and Markdown extraction for AI agents
Programming assignments for Web Information Extraction and Retrieval, FRI UL, 2021. PA1: a standalone web crawler for .gov.si websites; PA2: approaches to structured web data extraction; PA3: data processing, indexing, and retrieval.
MarkGrab plugin for Claude Code — web content extraction to LLM-ready markdown
Structured web-extraction tool plugin with schema, provenance, and drift awareness.
This project is a command-line tool that extracts text from web pages and PDF files, including scanned documents. It supports various extraction methods. This tool is ideal for data scraping, NLP preprocessing, and content analysis.
Pinterest data extraction toolkit
Giving agents eyes to see and interact with the internet. Agent Eye is an MCP server that wraps Playwright and exposes browser tools for local AI agents.
Local-first search tool layer for AI agents, built with FastAPI, SearXNG, and Trafilatura.