GitHub - yapishu/icewalk: like firecrawl.dev but free

IceWalk

A concurrent web crawler that converts pages to Markdown for use with LLMs. It uses Selenium to render JavaScript-driven pages and supports depth-limited crawling.

Features

Concurrent crawling using ThreadPoolExecutor
Selenium support for JavaScript-rendered content
Depth-limited crawling (configurable)
Extracts metadata (title, description, language)
Converts HTML to Markdown
Respects a same-domain policy (follows only links on the starting URL's domain)
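
As a rough illustration of how concurrent, depth-limited, same-domain crawling can fit together, here is a minimal sketch. It is not the code in crawl.py: the in-memory FAKE_SITE, the fetch_links helper, and the crawl signature are all hypothetical, and the rule that the start page counts as depth 0 is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "site" (URL -> hrefs on that page) so the sketch
# runs without network access.
FAKE_SITE = {
    "https://example.com/": ["/a", "/b", "https://other.org/x"],
    "https://example.com/a": ["/c"],
    "https://example.com/b": ["/a"],
    "https://example.com/c": ["/d"],
    "https://example.com/d": [],
}


def fetch_links(url):
    """Stand-in for a real fetch: return absolute links found on `url`."""
    return [urljoin(url, href) for href in FAKE_SITE.get(url, [])]


def crawl(start_url, max_depth=3, workers=4, fetch=fetch_links):
    """Breadth-first crawl, one depth level at a time; -1 means unlimited.

    Assumption: the start page is depth 0, and pages up to and including
    `max_depth` are fetched.
    """
    domain = urlparse(start_url).netloc
    seen = {start_url}
    frontier = [start_url]
    crawled = []
    depth = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and (max_depth == -1 or depth <= max_depth):
            # Fetch every page at the current depth concurrently.
            results = list(pool.map(fetch, frontier))
            crawled.extend(frontier)
            next_frontier = []
            for links in results:
                for link in links:
                    # Same-domain policy: skip links that leave the start domain.
                    if urlparse(link).netloc == domain and link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
            depth += 1
    return crawled


print(crawl("https://example.com/", max_depth=1))
```

Processing one depth level per loop iteration keeps the depth bookkeeping simple, and the shared `seen` set is only touched from the main thread, so no locking is needed in this sketch.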

Usage

python3 crawl.py <url> [--max-depth <depth>] [--timeout <seconds>]

Options:

<url>: Starting URL for the crawler
--max-depth: Maximum depth for crawling (default: 3, use -1 for unlimited depth)
--timeout: Timeout for each request in seconds (default: 30)
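
The options above map naturally onto Python's argparse. The sketch below is an assumption about how such a command line could be declared, not a copy of crawl.py; build_parser is a hypothetical helper, and treating the timeout as an integer is a guess.

```python
import argparse


def build_parser():
    """Declare the documented arguments (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="crawl.py")
    parser.add_argument("url", help="Starting URL for the crawler")
    parser.add_argument(
        "--max-depth",
        type=int,
        default=3,
        help="Maximum crawl depth (default: 3, -1 for unlimited)",
    )
    parser.add_argument(
        "--timeout",
        type=int,  # assumption: whole seconds
        default=30,
        help="Timeout for each request in seconds (default: 30)",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is
# runnable anywhere.
args = build_parser().parse_args(["https://example.com", "--max-depth", "-1"])
print(args.url, args.max_depth, args.timeout)
```

Because none of the declared options look like negative numbers, argparse accepts `-1` as the value of `--max-depth` rather than mistaking it for a flag.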

Example:

python3 crawl.py https://example.com --max-depth 5 --timeout 45

Output

The crawler generates a single Markdown file named after the domain (e.g., example.com.md). Each crawled page is represented as a section in the Markdown file, including:

Page title
Source URL
Language
Description (if available)
Main content in Markdown format
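
To make the output format concrete, here is a minimal sketch of deriving the file name from the domain and rendering one page section. The exact section layout, the MetadataParser class, and the function names are hypothetical. The sketch uses the standard library's html.parser rather than beautifulsoup4 so it runs without dependencies, and it assumes the page body has already been converted to Markdown.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class MetadataParser(HTMLParser):
    """Collect <title>, <html lang="...">, and <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.language = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.language = attrs.get("lang") or ""
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def output_filename(start_url):
    """Name the output file after the domain, e.g. example.com.md."""
    return urlparse(start_url).netloc + ".md"


def render_section(url, html, markdown_body):
    """Render one crawled page as a Markdown section (layout is an assumption)."""
    meta = MetadataParser()
    meta.feed(html)
    lines = [
        f"# {meta.title or url}",
        "",
        f"Source: {url}",
        f"Language: {meta.language or 'unknown'}",
    ]
    if meta.description:  # the description is optional
        lines.append(f"Description: {meta.description}")
    lines += ["", markdown_body.strip(), ""]
    return "\n".join(lines)


page = (
    '<html lang="en"><head><title>Home</title>'
    '<meta name="description" content="Demo page"></head><body></body></html>'
)
print(output_filename("https://example.com/docs"))
print(render_section("https://example.com/", page, "Hello **world**"))
```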

Requirements

Python 3.x
Required Python packages: requests, beautifulsoup4, html2text, selenium, webdriver_manager

Install dependencies:

pip install -r requirements.txt

Note: Make sure you have Chrome installed, as the crawler uses ChromeDriver for Selenium.
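
For orientation, this is roughly how Selenium 4 and webdriver_manager are commonly combined to fetch a JavaScript-rendered page with headless Chrome. It is a sketch under assumptions, not the project's implementation: fetch_rendered_html is a hypothetical name, and the Chrome flags are illustrative choices.

```python
def fetch_rendered_html(url, timeout=30):
    """Load `url` in headless Chrome and return the page source after load."""
    # Imported lazily so this module still imports where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    options.add_argument("--headless=new")  # assumes a recent Chrome build
    options.add_argument("--no-sandbox")    # often needed inside containers

    # webdriver_manager downloads a ChromeDriver matching the installed Chrome.
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=options)
    try:
        driver.set_page_load_timeout(timeout)
        driver.get(url)
        return driver.page_source
    finally:
        # Always release the browser process, even if the load fails.
        driver.quit()
```

Calling `fetch_rendered_html("https://example.com")` requires Chrome and the packages listed above to be installed.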

Repository files

.gitignore
Dockerfile
LICENSE
app.py
crawl.py
docker-compose.yml
readme.md
requirements.txt
