edX Video Scraper & Downloader

A Python toolkit that downloads edX course videos as MP4 files named after their course modules.

Features

Capture video URLs - Automatically navigates through course pages and captures HLS stream URLs
Download as MP4 - Parallel downloads with real video/module names
Session management - Save browser login state for authenticated access
Course scraping - Extract course content, structure, and text

Requirements

Python 3.10+
Playwright
yt-dlp

Installation

# Clone the repository
git clone https://github.com/YOUR_USERNAME/edx-scraper.git
cd edx-scraper

# Install dependencies
pip install -r requirements.txt

# Install Playwright browser
playwright install chromium

Quick Start

Step 1: Login & Save Session

python login.py

A browser window opens. Log in to edX, then press Enter in the terminal to save your session to edx_storage_state.json.
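As a rough illustration of what this step involves, here is a minimal sketch using Playwright's synchronous API. It is not the actual contents of login.py: the function name `save_session` and the default login URL are assumptions, and only the file name edx_storage_state.json comes from this README.

```python
from pathlib import Path

# File name taken from the Notes section; everything else here is assumed.
STATE_FILE = Path("edx_storage_state.json")


def save_session(login_url: str = "https://authn.edx.org/login",  # assumed URL
                 state_file: Path = STATE_FILE) -> Path:
    """Open a visible browser, wait for a manual login, then save the session."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto(login_url)
        input("Log in to edX in the browser, then press Enter here... ")
        # Writes the context's cookies and local storage to a JSON file.
        context.storage_state(path=str(state_file))
        browser.close()
    return state_file
```

A later script could presumably reuse the session by passing the same file to `browser.new_context(storage_state="edx_storage_state.json")`.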

Step 2: Capture Video URLs

python capture_videos.py --auto --out output

This will:

Open a browser and navigate through all course modules
Capture video stream URLs with their real names
Save to output/captured_videos.json
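The capture logic can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in capture_videos.py: the helper names and the `{"name", "url"}` entry layout are hypothetical, and HLS streams are recognized here simply by a `.m3u8` path. In a Playwright script, `record_capture` might be called from a `page.on("response", ...)` handler.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def is_hls_url(url: str) -> bool:
    """Return True when the URL path points at an HLS playlist (.m3u8)."""
    return urlparse(url).path.lower().endswith(".m3u8")


def record_capture(captured: list[dict], name: str, url: str) -> None:
    """Append a {name, url} entry, skipping non-HLS and duplicate URLs."""
    if is_hls_url(url) and all(item["url"] != url for item in captured):
        captured.append({"name": name, "url": url})


def save_captures(captured: list[dict], out_dir: str = "output") -> Path:
    """Write the captured entries to <out_dir>/captured_videos.json."""
    out_path = Path(out_dir) / "captured_videos.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(captured, indent=2), encoding="utf-8")
    return out_path
```

Checking the parsed path rather than the raw string keeps URLs with query-string tokens (for example `master.m3u8?token=...`) from being missed.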

Step 3: Download Videos

python download_videos.py --urls output/captured_videos.json --out output/videos --workers 5

Videos will be saved with real names like:

001_Module_1_Introduction.mp4
002_Week_2_Classification.mp4
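File names of this shape could be produced by a helper along these lines. The function `video_filename` is hypothetical and only illustrates one way to turn a sequence number and a module title into a safe file name; the project's actual naming rules may differ.

```python
import re


def video_filename(index: int, title: str, ext: str = "mp4") -> str:
    """Build a name like 001_Module_1_Introduction.mp4 from an index and title."""
    # Collapse each run of characters other than letters, digits,
    # underscores, and hyphens into a single underscore.
    slug = re.sub(r"[^\w-]+", "_", title).strip("_")
    # Fall back to a generic name if nothing usable remains.
    return f"{index:03d}_{slug or 'video'}.{ext}"


print(video_filename(1, "Module 1: Introduction"))  # 001_Module_1_Introduction.mp4
```

The zero-padded prefix keeps files in course order when sorted by name.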

Alternative: Use Batch Files (Windows)

1. Double-click RUN_CAPTURE.bat
2. Wait for the capture to complete
3. Double-click RUN_DOWNLOAD.bat

Project Structure

edx-scraper/
├── login.py              # Save browser session
├── capture_videos.py     # Capture video URLs from course
├── download_videos.py    # Download videos as MP4
├── scrape.py             # Scrape course text content
├── edx_scraper/          # Core library
│   ├── __init__.py
│   └── scraper.py
├── RUN_CAPTURE.bat       # Windows batch for capture
├── RUN_DOWNLOAD.bat      # Windows batch for download
├── requirements.txt
└── README.md

Command Options

capture_videos.py

--auto          Auto-navigate through modules
--out           Output directory (default: output)
--course-url    Course URL to scrape
--max-pages     Maximum pages to visit (default: 100)
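For reference, the options above map onto an `argparse` definition such as the following sketch. The flag names and defaults are taken from the list above; the function name `build_capture_parser` and the help strings are assumptions, and the real script may define its options differently.

```python
import argparse


def build_capture_parser() -> argparse.ArgumentParser:
    """Define the capture_videos.py options listed in this README."""
    parser = argparse.ArgumentParser(prog="capture_videos.py")
    parser.add_argument("--auto", action="store_true",
                        help="Auto-navigate through modules")
    parser.add_argument("--out", default="output",
                        help="Output directory")
    parser.add_argument("--course-url",
                        help="Course URL to scrape")
    parser.add_argument("--max-pages", type=int, default=100,
                        help="Maximum pages to visit")
    return parser


args = build_capture_parser().parse_args(["--auto", "--max-pages", "20"])
print(args.auto, args.out, args.max_pages)  # True output 20
```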

download_videos.py

--urls          File of captured URLs (JSON or TXT)
--out           Output directory for videos
--workers       Parallel downloads (default: 5)
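The sketch below shows one way these options could fit together: read the URL file, then run yt-dlp once per video from a thread pool sized by `--workers`. It is an assumption-laden illustration, not the code in download_videos.py. The helper names, the `{"name", "url"}` JSON layout, and the one-URL-per-line TXT layout are hypothetical; `-o` and `--remux-video` are existing yt-dlp options.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_entries(urls_file: str) -> list[dict]:
    """Read a .json list of {name, url} entries or a .txt with one URL per line."""
    path = Path(urls_file)
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        return json.loads(text)  # assumed layout: [{"name": ..., "url": ...}]
    urls = [line.strip() for line in text.splitlines() if line.strip()]
    # A plain URL list carries no titles, so generate placeholder names.
    return [{"name": f"video_{i:03d}", "url": u} for i, u in enumerate(urls, 1)]


def ytdlp_command(entry: dict, out_dir: str) -> list[str]:
    """Build the yt-dlp invocation for one entry, remuxing the result to MP4."""
    target = str(Path(out_dir) / f"{entry['name']}.%(ext)s")
    return ["yt-dlp", "--remux-video", "mp4", "-o", target, entry["url"]]


def download_all(entries: list[dict], out_dir: str, workers: int = 5,
                 run=subprocess.run) -> list[int]:
    """Download entries in parallel and return each process's exit code."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    def job(entry: dict) -> int:
        return run(ytdlp_command(entry, out_dir)).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(job, entries))
```

Threads are sufficient here because each worker only waits on an external yt-dlp process. The `run` parameter exists so the pool logic can be exercised without network access.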

Notes

Your login session (edx_storage_state.json) is listed in .gitignore. Never commit this file, because it contains your authentication cookies
Video URLs expire after some time, so download soon after capturing
Some courses may have DRM protection that prevents downloading

License

MIT License. For educational purposes only.
