Scrappus 👾

A flexible, Dockerized web scraper that supports both static and dynamic websites using Selenium (Chrome or Firefox). Extraction is rule-based via a JSON file, and results can be written in multiple output formats such as JSON and CSV. Scrappus is written in Python. If you want to learn about web scraping, you can check this article. You need to know the basics of HTML5 tags to define the rules.json file, which specifies what data to extract from the target.

📦 Features

Supports static and JavaScript-rendered (dynamic) websites
Rule-based scraping via JSON file
Output to JSON or CSV
Dockerized for easy setup
Supports Chrome and Firefox headless modes

🚀 Getting Started

1. Clone the repository

git clone https://github.com/kur0bai/scrappus.git
cd scrappus

2. Create your rules

Fill the rules.json file with the rules needed to extract the data you want, for example:

{
	"title": "h3",
	"description": "meta[name='description']",
	"modules": "div[class='container']",
	"links": "a[href]"
}
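
To make the rules format more concrete, here is a minimal sketch of how rules like the ones above could be matched against a page. It is not the Scrappus implementation (which, per the description, drives Selenium). It assumes each rule value is a CSS-style selector and supports only the three shapes used in the sample: `tag`, `tag[attr]` and `tag[attr='value']`. The names `parse_selector`, `RuleMatcher` and `match_rules` are hypothetical and use only the Python standard library.

```python
import re
from html.parser import HTMLParser

# Handles only the three selector shapes used in the sample rules:
# "tag", "tag[attr]" and "tag[attr='value']". Real CSS is much richer.
SELECTOR_RE = re.compile(r"""^(\w+)(?:\[([\w-]+)(?:=['"]([^'"]*)['"])?\])?$""")


def parse_selector(selector):
    """Split a simple selector into (tag, attribute, value); parts may be None."""
    match = SELECTOR_RE.match(selector.strip())
    if match is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    return match.groups()


class RuleMatcher(HTMLParser):
    """Hypothetical helper: records the attributes of every element a rule matches."""

    def __init__(self, rules):
        super().__init__()
        self.rules = {name: parse_selector(sel) for name, sel in rules.items()}
        self.matches = {name: [] for name in rules}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        for name, (want_tag, attr, value) in self.rules.items():
            if tag != want_tag:
                continue  # wrong element type
            if attr is not None and attr not in attributes:
                continue  # required attribute is missing
            if value is not None and attributes[attr] != value:
                continue  # attribute present but with a different value
            self.matches[name].append(attributes)


def match_rules(html, rules):
    """Return {rule name: [attribute dicts of matching elements]}."""
    matcher = RuleMatcher(rules)
    matcher.feed(html)
    matcher.close()
    return matcher.matches


if __name__ == "__main__":
    rules = {
        "title": "h3",
        "description": "meta[name='description']",
        "modules": "div[class='container']",
        "links": "a[href]",
    }
    page = (
        '<head><meta name="description" content="Demo page"></head>'
        '<body><h3>Hello</h3><div class="container">'
        '<a href="/a">A</a><a>no href</a></div></body>'
    )
    for name, found in match_rules(page, rules).items():
        print(name, found)
```

Note that `a[href]` skips anchors without an `href` attribute, and `div[class='container']` matches only when the whole `class` value is exactly `container`.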

3. Run with Docker or the traditional way

Docker:

Create an image: docker build -t scrappus .
Execute it: docker run --rm scrappus "https://example.com" "rules.json" --output output.json --dynamic
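
The `docker run` command above passes a URL, a rules file, an `--output` path and a `--dynamic` flag. As an illustration only, an argument parser with that shape could look like the sketch below. The actual parsing in main.py may differ; `build_parser` and the help strings are assumptions, and only the arguments visible in the command are included.

```python
import argparse


def build_parser():
    """Build a parser mirroring the arguments shown in the docker run example."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Rule-based scraper (illustrative sketch, not the real CLI).",
    )
    # Positional arguments, in the order used in the example command.
    parser.add_argument("url", help="page to scrape")
    parser.add_argument("rules", help="path to the JSON rules file")
    # Optional arguments seen in the example command.
    parser.add_argument("--output", help="file to write the results to")
    parser.add_argument(
        "--dynamic",
        action="store_true",
        help="treat the target as a JavaScript-rendered page",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With `action="store_true"`, `--dynamic` takes no value and is false when omitted, which fits the way the flag appears at the end of the example command.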

Traditional:

Install the requirements (a Python virtual environment is recommended): pip install -r requirements.txt
Run the script with python3 main.py to see the available commands.

Easy, right? 🍥 You should see the results in the output file you specified.
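
For readers curious how results might end up in a JSON or CSV output file, here is a small standard-library sketch. It assumes the format is chosen from the file extension and that results are a list of flat dictionaries; neither is confirmed by this README, and `write_results` is a hypothetical name.

```python
import csv
import json
from pathlib import Path


def write_results(rows, output_path):
    """Write a list of flat dicts as JSON or CSV, chosen by file extension.

    Picking the format from the extension is an assumption made for this sketch.
    """
    path = Path(output_path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        path.write_text(
            json.dumps(rows, indent=2, ensure_ascii=False), encoding="utf-8"
        )
    elif suffix == ".csv":
        # Use the union of all keys so rows with missing fields still fit.
        fieldnames = sorted({key for row in rows for key in row})
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)  # missing fields are written as empty cells
    else:
        raise ValueError(f"unsupported output format: {suffix!r}")
    return path


if __name__ == "__main__":
    sample = [{"title": "Hello", "links": "/a"}, {"title": "World"}]
    print(write_results(sample, "output.json"))
    print(write_results(sample, "output.csv"))
```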

Important: 📌

This tool is open to modification by anyone who finds it useful. However, I am not responsible if it is used for malicious purposes, which is not its intent. Use it at your own risk. I hope it helps with your projects.
