A comprehensive Laravel-based web application for scraping social media platforms using Python as an external scraping engine. This application provides a robust, scalable, and user-friendly interface for extracting data from popular social media platforms.
## Supported Platforms

- **Twitter/X** - User profiles, tweets, hashtags
- **Instagram** - User profiles, posts, hashtags
- **Facebook** - Page posts, group posts
- **LinkedIn** - User profiles, company posts
- **TikTok** - User profiles, videos, hashtags
- **YouTube** - Channel videos, search results, comments
## Features

- **Multi-Platform Support** - Scrape data from 6 major social media platforms
- **Anti-Bot Measures** - User agent rotation, proxy support, random delays
- **Modular Architecture** - Clean separation between the Laravel backend and the Python scraping engine
- **Real-time Logging** - Comprehensive logging system for monitoring scraping operations
- **Database Storage** - MySQL database for storing scraping results and metadata
- **Export Options** - Export results in JSON, CSV, or Excel formats
- **Rate Limiting** - Built-in rate limiting to respect platform policies
- **Error Handling** - Robust error handling and retry mechanisms
## Architecture

```text
Laravel Backend (PHP)
├── Controllers
│   ├── SocialMediaController.php
│   └── WebScrapingController.php
├── Services
│   ├── SocialMediaScrapingService.php
│   └── PythonScrapingService.php
└── Routes & Views

Python Scraping Engine
├── scrapers/
│   ├── social_media_scrapers.py
│   ├── base_scraper.py
│   ├── requests_scraper.py
│   ├── selenium_scraper.py
│   ├── playwright_scraper.py
│   └── scrapy_spider.py
├── utils/
│   ├── logger.py
│   └── anti_bot.py
├── database/
│   └── mysql_handler.py
└── main.py
```
## Requirements

- PHP 8.2+
- Python 3.8+
- MySQL 8.0+
- Composer
- Node.js & NPM
### Python Dependencies

- requests
- beautifulsoup4
- selenium
- playwright
- scrapy
- pandas
- fake-useragent
- loguru
- pymysql
## Installation

1. **Clone the repository**

   ```bash
   git clone <repository-url>
   cd scrape-web
   ```

2. **Install PHP and Node dependencies**

   ```bash
   composer install
   npm install
   ```

3. **Configure the environment**

   ```bash
   cp .env.example .env
   php artisan key:generate
   ```

   Update your `.env` file with database credentials:

   ```env
   DB_CONNECTION=mysql
   DB_HOST=127.0.0.1
   DB_PORT=3306
   DB_DATABASE=laravel
   DB_USERNAME=laravel
   DB_PASSWORD=laravel
   PYTHON_PATH=python3
   ```

4. **Set up the Python scraping engine**

   ```bash
   cd python_scraper
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt

   # For Selenium
   pip install webdriver-manager

   # For Playwright
   playwright install
   ```

5. **Run database migrations**

   ```bash
   php artisan migrate
   ```

6. **Start the application**

   ```bash
   # Start Laravel development server
   php artisan serve

   # In another terminal, start the queue worker (optional)
   php artisan queue:work
   ```
## Usage

1. **Access the Application**
   - Navigate to `http://localhost:8000`
   - You'll be redirected to the social media scraping dashboard
2. **Scrape User Profiles**
   - Go to "Profile Scraping"
   - Select a platform (Twitter, Instagram, etc.)
   - Enter a username
   - Click "Execute Now" or "Schedule"
3. **Scrape Content**
   - Go to "Content Scraping"
   - Select a platform and content type
   - Enter a target (username, hashtag, etc.)
   - Set the maximum number of items to scrape
   - Execute or schedule the task
## API Examples

### Scrape a profile

```bash
curl -X POST http://localhost:8000/social-media/execute-profile \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "twitter",
    "username": "elonmusk"
  }'
```

### Scrape content

```bash
curl -X POST http://localhost:8000/social-media/execute-content \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "twitter",
    "content_type": "tweets",
    "target": "elonmusk",
    "max_items": 20
  }'
```

### Command-line usage

```bash
cd python_scraper
python main.py --action scrape_profile --platform twitter --username elonmusk
python main.py --action scrape_content --platform twitter --content_type tweets --target elonmusk --max_items 20
```

## Database Tables

- `scraping_tasks` - Task metadata and status
- `scraping_results` - Scraped data results
- `scraping_errors` - Error logs
- `scraping_logs` - Detailed operation logs
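The `main.py` command-line interface shown above could be wired up with `argparse` roughly like this. This is a hypothetical sketch of the CLI contract, not the repository's actual implementation; the flag names are taken from the examples above, and the JSON-on-stdout convention is an assumption about how the Laravel service consumes results:

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI flags used in the usage examples above."""
    parser = argparse.ArgumentParser(description="Social media scraping engine")
    parser.add_argument("--action", required=True,
                        choices=["scrape_profile", "scrape_content"])
    parser.add_argument("--platform", required=True)
    parser.add_argument("--username")       # for scrape_profile
    parser.add_argument("--content_type")   # for scrape_content
    parser.add_argument("--target")         # username, hashtag, etc.
    parser.add_argument("--max_items", type=int, default=100)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Printing a JSON payload to stdout lets the PHP side capture and
    # decode the result of the subprocess call.
    print(json.dumps(vars(args)))
```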
## Configuration

Key environment variables in `.env`:

```env
# Python Configuration
PYTHON_PATH=python3

# Database Configuration
DB_HOST=localhost
DB_PORT=3306
DB_DATABASE=laravel
DB_USERNAME=laravel
DB_PASSWORD=laravel

# Scraping Configuration
SCRAPING_ANTI_BOT=true
SCRAPING_USE_PROXIES=false
SCRAPING_MAX_ITEMS=100
SCRAPING_MAX_EXECUTION_TIME=300

# Platform-specific settings
SCRAPING_ALLOWED_PLATFORMS=twitter,instagram,facebook,linkedin,tiktok,youtube
```

## Rate Limits

Each platform has specific rate limits and requirements:
- Twitter: 300 requests per 15 minutes
- Instagram: 200 requests per hour
- Facebook: 200 requests per hour
- LinkedIn: 100 requests per day
- TikTok: 1000 requests per hour
- YouTube: 10,000 requests per day
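A per-platform limit like the ones above can be enforced with a sliding-window limiter. The following is an illustrative sketch under assumed semantics (N requests per rolling window), not necessarily how this project's rate limiting is implemented:

```python
import time
from collections import defaultdict, deque

# Requests allowed per window (in seconds), mirroring the limits listed above.
RATE_LIMITS = {
    "twitter": (300, 15 * 60),
    "instagram": (200, 60 * 60),
    "linkedin": (100, 24 * 60 * 60),
}


class SlidingWindowLimiter:
    def __init__(self, limits=RATE_LIMITS):
        self.limits = limits
        self.history = defaultdict(deque)  # platform -> recent request times

    def allow(self, platform, now=None):
        """Return True and record the request if it fits in the window."""
        now = time.monotonic() if now is None else now
        max_requests, window = self.limits[platform]
        timestamps = self.history[platform]
        # Drop timestamps that have aged out of the rolling window.
        while timestamps and now - timestamps[0] >= window:
            timestamps.popleft()
        if len(timestamps) < max_requests:
            timestamps.append(now)
            return True
        return False
```

A request loop would call `allow(platform)` before each fetch and sleep (or queue the task) when it returns `False`.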
## Anti-Bot Measures

The application includes several anti-bot measures:

- **User Agent Rotation** - Random user agents for each request
- **Proxy Support** - Optional proxy rotation
- **Random Delays** - Configurable delays between requests
- **Session Rotation** - Automatic session rotation
- **Rate Limiting** - Built-in rate limiting per platform
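User-agent rotation and random delays, for example, can be sketched like this. The project lists `fake-useragent` as a dependency for supplying agents; a small hardcoded pool is used here only to keep the example self-contained, and this is not the actual `anti_bot.py` code:

```python
import random
import time

# A small static pool; in the real engine, fake-useragent can supply these.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]


def random_headers():
    """Pick a fresh user agent for each outgoing request."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def polite_delay(min_s=1.0, max_s=3.0):
    """Sleep for a random interval between requests; returns the delay used."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay
```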
## Logging

The application provides comprehensive logging:

- **Console Logs** - Real-time operation logs
- **File Logs** - Persistent log files with rotation
- **Database Logs** - Structured logging in the database
- **Error Tracking** - Detailed error logging and tracking
## Rate Limiting Compliance

- Respects platform-specific rate limits
- Configurable delays between requests
- Automatic retry with exponential backoff
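The retry behavior can be sketched as a small helper. This is illustrative only; the actual retry logic lives inside the Python engine, and the attempt count and jitter here are assumptions:

```python
import random
import time


def retry_with_backoff(func, max_attempts=4, base_delay=1.0,
                       exceptions=(Exception,)):
    """Call `func`, retrying failures with exponentially growing delays
    (base_delay * 1, 2, 4, ...) plus a little jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```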
## Best Practices

- Only scrape publicly available data
- Respect robots.txt files
- Implement reasonable delays
- Monitor for rate-limiting responses
- Use the application responsibly
## Monitoring & Dashboard

- Real-time scraping statistics
- Platform-specific metrics
- Success/failure rates
- Recent tasks overview
- Environment health checks
## Export Formats

- **JSON** - Structured data export
- **CSV** - Spreadsheet-friendly format
- **Excel** - Advanced formatting options
## Troubleshooting

1. **Python Not Found**

   ```bash
   # Set the correct Python path in .env
   PYTHON_PATH=/usr/bin/python3
   ```

2. **Dependencies Missing**

   ```bash
   cd python_scraper
   pip install -r requirements.txt
   ```

3. **Browser Drivers**

   ```bash
   # Install Playwright browsers
   playwright install

   # Install ChromeDriver
   pip install webdriver-manager
   ```

4. **Database Connection**

   ```bash
   # Check database configuration
   php artisan config:cache
   php artisan migrate:status
   ```
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Disclaimer

This application is for educational and research purposes only. Users are responsible for:

- Complying with platform Terms of Service
- Respecting rate limits and robots.txt
- Using the application ethically and legally
- Obtaining necessary permissions for data collection
## Support

For support and questions:

- Check the troubleshooting section
- Review the logs for error details
- Ensure all dependencies are installed
- Verify the environment configuration
**Note**: This application is designed for legitimate data collection and research purposes. Always respect platform policies and use it responsibly.