A Python toolkit for downloading edX course videos as MP4 files with their real names.
- Capture video URLs - Automatically navigates through course pages and captures HLS stream URLs
- Download as MP4 - Parallel downloads with real video/module names
- Session management - Save browser login state for authenticated access
- Course scraping - Extract course content, structure, and text
- Python 3.10+
- Playwright
- yt-dlp
# Clone the repository
git clone https://github.com/YOUR_USERNAME/edx-scraper.git
cd edx-scraper
# Install dependencies
pip install -r requirements.txt
# Install Playwright browser
playwright install chromium
python login.py
A browser will open. Log into edX, then press Enter to save your session.
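As a rough illustration of what saving the session might involve, the sketch below uses Playwright's `storage_state` to write cookies and local storage to `edx_storage_state.json`. It is not the actual contents of `login.py`: the function names `save_session` and `has_saved_session` and the `login_url` parameter are assumptions made for this example.

```python
import json
from pathlib import Path

STATE_FILE = Path("edx_storage_state.json")


def save_session(login_url: str, state_path: Path = STATE_FILE) -> None:
    """Open a visible browser, wait for a manual login, then save the session."""
    # Imported lazily so has_saved_session() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto(login_url)  # login_url is supplied by the caller (assumption)
        input("Log into edX in the browser, then press Enter here...")
        # Writes cookies and per-origin local storage as JSON.
        context.storage_state(path=str(state_path))
        browser.close()


def has_saved_session(state_path: Path = STATE_FILE) -> bool:
    """Return True if the file parses as JSON and has a 'cookies' entry."""
    try:
        data = json.loads(state_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False
    return isinstance(data, dict) and "cookies" in data
```

A later script could presumably reuse the saved login by passing the same file to `browser.new_context(storage_state="edx_storage_state.json")`.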
python capture_videos.py --auto --out output
This will:
- Open a browser and navigate through all course modules
- Capture video stream URLs with their real names
- Save them to output/captured_videos.json
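The capture step above can be sketched roughly as follows: keep only the request URLs that look like HLS playlists (`.m3u8`), pair each with the page title it was seen on, and write the result to `captured_videos.json`. The function names and the `{"name": ..., "url": ...}` record layout are assumptions for illustration; the real file format may differ.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def is_hls_url(url: str) -> bool:
    """Return True if the URL path points at an HLS playlist (.m3u8)."""
    return urlparse(url).path.lower().endswith(".m3u8")


def collect_videos(seen: list[tuple[str, str]]) -> list[dict]:
    """Keep HLS playlist URLs, dropping duplicates, paired with their page title."""
    videos, used = [], set()
    for title, url in seen:
        if is_hls_url(url) and url not in used:
            used.add(url)
            videos.append({"name": title, "url": url})
    return videos


def save_videos(videos: list[dict], out_dir: str = "output") -> Path:
    """Write the captured records to <out_dir>/captured_videos.json."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "captured_videos.json"
    path.write_text(json.dumps(videos, indent=2), encoding="utf-8")
    return path
```

Checking the parsed path rather than the raw string means a playlist URL with a query string (for example a signed token) is still recognized.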
python download_videos.py --urls output/captured_videos.json --out output/videos --workers 5
Videos will be saved with real names like:
001_Module_1_Introduction.mp4
002_Week_2_Classification.mp4
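One way such names could be produced is to prefix a zero-padded index and collapse anything that is not a letter or digit into underscores. The helper below is a minimal sketch under that assumption; `video_filename` is a hypothetical name and the toolkit's own naming rules may differ.

```python
import re


def video_filename(index: int, title: str, ext: str = "mp4") -> str:
    """Build a numbered, filesystem-safe file name from a module title."""
    # Collapse each run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    # Fall back to a generic name if nothing usable is left.
    return f"{index:03d}_{slug or 'video'}.{ext}"


print(video_filename(1, "Module 1: Introduction"))
print(video_filename(2, "Week 2 - Classification"))
```

Note that this simple pattern also strips non-ASCII letters, so titles in other scripts would need a more permissive rule.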
- Double-click RUN_CAPTURE.bat
- Wait for capture to complete
- Double-click RUN_DOWNLOAD.bat
edx-scraper/
├── login.py # Save browser session
├── capture_videos.py # Capture video URLs from course
├── download_videos.py # Download videos as MP4
├── scrape.py # Scrape course text content
├── edx_scraper/ # Core library
│ ├── __init__.py
│ └── scraper.py
├── RUN_CAPTURE.bat # Windows batch for capture
├── RUN_DOWNLOAD.bat # Windows batch for download
├── requirements.txt
└── README.md
--auto Auto-navigate through modules
--out Output directory (default: output)
--course-url Course URL to scrape
--max-pages Maximum pages to visit (default: 100)
--urls URL file (json or txt)
--out Output directory for videos
--workers Parallel downloads (default: 5)
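To illustrate how `--urls` and `--workers` might fit together, the sketch below reads URLs from either a JSON or a text file and hands them to a thread pool capped at the requested number of workers. All function names are hypothetical, the JSON layout (a list of strings or of objects with a `"url"` key) is an assumption, and `run_ytdlp` simply shells out to the `yt-dlp` command with its `-o` output option; it is not taken from `download_videos.py`.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def load_urls(path: str) -> list[str]:
    """Read URLs from a .json file or from a text file with one URL per line."""
    text = Path(path).read_text(encoding="utf-8")
    if path.lower().endswith(".json"):
        items = json.loads(text)
        # Accept plain strings or {"url": ...} records (assumed layout).
        return [i["url"] if isinstance(i, dict) else i for i in items]
    return [line.strip() for line in text.splitlines() if line.strip()]


def run_ytdlp(url: str, out_path: str = "output/videos/video.mp4") -> bool:
    """Download one URL with the yt-dlp CLI; True if it exited with code 0."""
    return subprocess.run(["yt-dlp", "-o", out_path, url]).returncode == 0


def download_all(
    urls: list[str],
    download_one: Callable[[str], bool] = run_ytdlp,
    workers: int = 5,
) -> list[bool]:
    """Run download_one over urls with at most `workers` running at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(download_one, urls))
```

Threads are a reasonable fit here because each worker mostly waits on the network or on a child process. Passing `download_one` as a parameter also lets the pool logic be exercised without touching the network.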
- Your login session (edx_storage_state.json) is gitignored; never commit this file
- Video URLs expire after some time, so download soon after capturing
- Some courses may have DRM protection that prevents downloading
MIT License - For educational purposes only.