cari-teknisi/scrape-web
Social Media Scraping Application

A comprehensive Laravel-based web application for scraping social media platforms using Python as an external scraping engine. This application provides a robust, scalable, and user-friendly interface for extracting data from popular social media platforms.

🚀 Features

Supported Platforms

  • Twitter/X - User profiles, tweets, hashtags
  • Instagram - User profiles, posts, hashtags
  • Facebook - Page posts, group posts
  • LinkedIn - User profiles, company posts
  • TikTok - User profiles, videos, hashtags
  • YouTube - Channel videos, search results, comments

Core Features

  • Multi-Platform Support - Scrape data from 6 major social media platforms
  • Anti-Bot Measures - User agent rotation, proxy support, random delays
  • Modular Architecture - Clean separation between Laravel backend and Python scraping engine
  • Real-time Logging - Comprehensive logging system for monitoring scraping operations
  • Database Storage - MySQL database for storing scraping results and metadata
  • Export Options - Export results in JSON, CSV, or Excel formats
  • Rate Limiting - Built-in rate limiting to respect platform policies
  • Error Handling - Robust error handling and retry mechanisms

πŸ—οΈ Architecture

Laravel Backend (PHP)
├── Controllers
│   ├── SocialMediaController.php
│   └── WebScrapingController.php
├── Services
│   ├── SocialMediaScrapingService.php
│   └── PythonScrapingService.php
└── Routes & Views

Python Scraping Engine
├── scrapers/
│   ├── social_media_scrapers.py
│   ├── base_scraper.py
│   ├── requests_scraper.py
│   ├── selenium_scraper.py
│   ├── playwright_scraper.py
│   └── scrapy_spider.py
├── utils/
│   ├── logger.py
│   └── anti_bot.py
├── database/
│   └── mysql_handler.py
└── main.py
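
The division of labor in the Python engine is easiest to see through `base_scraper.py`: each platform scraper would share a common interface that `main.py` can call uniformly. The sketch below is an assumption about that structure (the class and method names are ours, not necessarily the repository's actual code):

```python
from abc import ABC, abstractmethod

class BaseScraper(ABC):
    """Hypothetical common base class for the platform scrapers."""

    def __init__(self, platform: str, use_anti_bot: bool = True):
        self.platform = platform
        self.use_anti_bot = use_anti_bot

    @abstractmethod
    def scrape_profile(self, username: str) -> dict:
        """Return profile data for the given username."""

    @abstractmethod
    def scrape_content(self, target: str, max_items: int = 100) -> list:
        """Return up to max_items content entries for the target."""

class TwitterScraper(BaseScraper):
    """Minimal stand-in implementation, for illustration only."""

    def scrape_profile(self, username: str) -> dict:
        return {"platform": self.platform, "username": username}

    def scrape_content(self, target: str, max_items: int = 100) -> list:
        return [{"target": target, "index": i} for i in range(min(max_items, 3))]

scraper = TwitterScraper("twitter")
print(scraper.scrape_profile("example_user"))
```

With this shape, `selenium_scraper.py` and `playwright_scraper.py` could override only the fetching layer while the interface stays identical.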

📋 Requirements

System Requirements

  • PHP 8.2+
  • Python 3.8+
  • MySQL 8.0+
  • Composer
  • Node.js & NPM

Python Dependencies

  • requests
  • beautifulsoup4
  • selenium
  • playwright
  • scrapy
  • pandas
  • fake-useragent
  • loguru
  • pymysql

πŸ› οΈ Installation

1. Clone the Repository

git clone <repository-url>
cd scrape-web

2. Install PHP Dependencies

composer install

3. Install Node.js Dependencies

npm install

4. Set Up the Environment

cp .env.example .env
php artisan key:generate

5. Configure Database

Update your .env file with database credentials:

DB_CONNECTION=mysql
DB_HOST=127.0.0.1
DB_PORT=3306
DB_DATABASE=laravel
DB_USERNAME=laravel
DB_PASSWORD=laravel

PYTHON_PATH=python3

6. Set Up the Python Environment

cd python_scraper
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

7. Install Browser Drivers

# For Selenium
pip install webdriver-manager

# For Playwright
playwright install

8. Run Database Migrations

php artisan migrate

9. Start the Application

# Start Laravel development server
php artisan serve

# In another terminal, start the queue worker (optional)
php artisan queue:work

🎯 Usage

Web Interface

  1. Access the Application

    • Navigate to http://localhost:8000
    • You'll be redirected to the social media scraping dashboard
  2. Scrape User Profiles

    • Go to "Profile Scraping"
    • Select platform (Twitter, Instagram, etc.)
    • Enter username
    • Click "Execute Now" or "Schedule"
  3. Scrape Content

    • Go to "Content Scraping"
    • Select platform and content type
    • Enter target (username, hashtag, etc.)
    • Set maximum items to scrape
    • Execute or schedule the task

API Usage

Scrape Social Media Profile

curl -X POST http://localhost:8000/social-media/execute-profile \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "twitter",
    "username": "elonmusk"
  }'

Scrape Social Media Content

curl -X POST http://localhost:8000/social-media/execute-content \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "twitter",
    "content_type": "tweets",
    "target": "elonmusk",
    "max_items": 20
  }'
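
The same content endpoint can be called from Python. A small sketch, where the endpoint and field names come from the curl example above but the helper function is ours:

```python
import json

def build_content_request(platform, content_type, target, max_items=20):
    """Build the JSON body for POST /social-media/execute-content."""
    return {
        "platform": platform,
        "content_type": content_type,
        "target": target,
        "max_items": max_items,
    }

body = build_content_request("twitter", "tweets", "elonmusk")
print(json.dumps(body))

# To actually send it (requires the dev server from step 9 to be running):
# import requests
# requests.post("http://localhost:8000/social-media/execute-content", json=body)
```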

Python Script Usage

Scrape Profile

cd python_scraper
python main.py --action scrape_profile --platform twitter --username elonmusk

Scrape Content

python main.py --action scrape_content --platform twitter --content_type tweets --target elonmusk --max_items 20
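
The flags in these commands suggest a CLI along the following lines. This parser is a sketch reconstructed from the invocations above, not the repository's actual `main.py`:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of a CLI matching the main.py invocations shown above."""
    parser = argparse.ArgumentParser(description="Social media scraping engine")
    parser.add_argument("--action", required=True,
                        choices=["scrape_profile", "scrape_content"])
    parser.add_argument("--platform", required=True)
    parser.add_argument("--username")        # used by scrape_profile
    parser.add_argument("--content_type")    # used by scrape_content
    parser.add_argument("--target")          # used by scrape_content
    parser.add_argument("--max_items", type=int, default=100)
    return parser

args = build_parser().parse_args(
    ["--action", "scrape_content", "--platform", "twitter",
     "--content_type", "tweets", "--target", "elonmusk",
     "--max_items", "20"])
print(args.action, args.platform, args.max_items)
```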

📊 Database Schema

Tables

  • scraping_tasks - Task metadata and status
  • scraping_results - Scraped data results
  • scraping_errors - Error logs
  • scraping_logs - Detailed operation logs

🔧 Configuration

Environment Variables

# Python Configuration
PYTHON_PATH=python3

# Database Configuration
DB_HOST=localhost
DB_PORT=3306
DB_DATABASE=laravel
DB_USERNAME=laravel
DB_PASSWORD=laravel

# Scraping Configuration
SCRAPING_ANTI_BOT=true
SCRAPING_USE_PROXIES=false
SCRAPING_MAX_ITEMS=100
SCRAPING_MAX_EXECUTION_TIME=300

# Platform-specific settings
SCRAPING_ALLOWED_PLATFORMS=twitter,instagram,facebook,linkedin,tiktok,youtube

Platform-Specific Settings

Each platform has specific rate limits and requirements:

  • Twitter: 300 requests per 15 minutes
  • Instagram: 200 requests per hour
  • Facebook: 200 requests per hour
  • LinkedIn: 100 requests per day
  • TikTok: 1000 requests per hour
  • YouTube: 10,000 requests per day
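
One simple way to honor these limits is to space requests evenly across each platform's window. The numbers below are taken from the table above; the dict layout and helper are an illustrative sketch, not the application's actual configuration format:

```python
# Per-platform limits from the table above: (max requests, window in seconds)
RATE_LIMITS = {
    "twitter":   (300, 15 * 60),
    "instagram": (200, 60 * 60),
    "facebook":  (200, 60 * 60),
    "linkedin":  (100, 24 * 60 * 60),
    "tiktok":    (1000, 60 * 60),
    "youtube":   (10_000, 24 * 60 * 60),
}

def min_delay_seconds(platform: str) -> float:
    """Smallest even spacing between requests that stays under the limit."""
    max_requests, window = RATE_LIMITS[platform]
    return window / max_requests

print(min_delay_seconds("twitter"))  # 3.0 seconds between Twitter requests
```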

πŸ›‘οΈ Anti-Bot Measures

The application includes several anti-bot measures:

  • User Agent Rotation - Random user agents for each request
  • Proxy Support - Optional proxy rotation
  • Random Delays - Configurable delays between requests
  • Session Rotation - Automatic session rotation
  • Rate Limiting - Built-in rate limiting per platform
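
The first three measures can be sketched in a few lines. The dependency list includes fake-useragent for fresh user-agent strings; this sketch uses a small static pool instead so it stays self-contained:

```python
import random
import time

# Static pool for illustration; the engine's fake-useragent dependency
# can supply fresh strings at runtime.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

def random_headers() -> dict:
    """Pick a different user agent for each request."""
    return {"User-Agent": random.choice(USER_AGENTS)}

def polite_sleep(min_s: float = 1.0, max_s: float = 3.0) -> float:
    """Sleep a random interval between requests; returns the delay used."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

print(random_headers())
```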

πŸ“ Logging

The application provides comprehensive logging:

  • Console Logs - Real-time operation logs
  • File Logs - Persistent log files with rotation
  • Database Logs - Structured logging in database
  • Error Tracking - Detailed error logging and tracking
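
The console and rotating-file pieces can be reproduced with the standard library alone. The engine itself lists loguru as a dependency; this stdlib-only sketch is an equivalent for illustration, with names of our choosing:

```python
import logging
from logging.handlers import RotatingFileHandler

def make_logger(path: str = "scraper.log") -> logging.Logger:
    """Console + rotating file logger (the repo itself uses loguru)."""
    logger = logging.getLogger("scraper")
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    # Rotate after ~1 MB, keeping 3 old files.
    file_handler = RotatingFileHandler(path, maxBytes=1_000_000, backupCount=3)
    file_handler.setFormatter(fmt)
    console = logging.StreamHandler()
    console.setFormatter(fmt)
    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger

log = make_logger("scraper.log")
log.info("scraping started")
```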

🚨 Rate Limiting & Ethics

Rate Limiting

  • Respects platform-specific rate limits
  • Configurable delays between requests
  • Automatic retry with exponential backoff
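
Retry with exponential backoff follows a standard pattern: double the wait after every failed attempt, then re-raise once attempts are exhausted. A generic sketch (the function names are ours):

```python
import time

def retry_with_backoff(func, max_attempts: int = 4, base_delay: float = 1.0):
    """Call func, retrying on exception with delays of 1s, 2s, 4s, ..."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Demo: fails twice, then succeeds on the third call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("rate limited")
    return "ok"

print(retry_with_backoff(flaky, base_delay=0.01))  # ok
```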

Ethical Considerations

  • Only scrape publicly available data
  • Respect robots.txt files
  • Implement reasonable delays
  • Monitor for rate limiting responses
  • Use the application responsibly

πŸ” Monitoring & Analytics

Dashboard Features

  • Real-time scraping statistics
  • Platform-specific metrics
  • Success/failure rates
  • Recent tasks overview
  • Environment health checks

Export Options

  • JSON - Structured data export
  • CSV - Spreadsheet-friendly format
  • Excel - Advanced formatting options
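
JSON and CSV export need nothing beyond the standard library; Excel export additionally requires pandas (which is in the dependency list) plus an engine such as openpyxl. A stdlib-only sketch of the first two formats, with illustrative field names:

```python
import csv
import json

# Example result rows; field names are illustrative.
rows = [
    {"platform": "twitter", "username": "example", "followers": 123},
    {"platform": "instagram", "username": "example", "followers": 456},
]

def export_json(rows, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)

def export_csv(rows, path):
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

export_json(rows, "results.json")
export_csv(rows, "results.csv")
```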

πŸ› Troubleshooting

Common Issues

  1. Python Not Found

    # Set correct Python path in .env
    PYTHON_PATH=/usr/bin/python3
  2. Dependencies Missing

    cd python_scraper
    pip install -r requirements.txt
  3. Browser Drivers

    # Install Playwright browsers
    playwright install
    
    # Install ChromeDriver
    pip install webdriver-manager
  4. Database Connection

    # Check database configuration
    php artisan config:cache
    php artisan migrate:status

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

⚠️ Disclaimer

This application is for educational and research purposes only. Users are responsible for:

  • Complying with platform Terms of Service
  • Respecting rate limits and robots.txt
  • Using the application ethically and legally
  • Obtaining necessary permissions for data collection

🆘 Support

For support and questions:

  • Check the troubleshooting section
  • Review the logs for error details
  • Ensure all dependencies are installed
  • Verify environment configuration

Note: This application is designed for legitimate data collection and research purposes. Always respect platform policies and use responsibly.
