FelixClements/imdb-top-lists

🎬 IMDb Top Lists → Sonarr & Radarr

Automated daily generation of popular IMDb TV shows (Sonarr) and movies (Radarr) as JSON import lists.



✨ What This Does

This repository automatically generates and updates JSON lists of popular content from IMDb:

| Content Type | Output Files | Use With |
|--------------|--------------|----------|
| 📺 TV Shows | top_tvshows_5.json, top_tvshows_10.json, top_tvshows_25.json | Sonarr |
| 🎥 Movies | top_movies_5.json, top_movies_10.json, top_movies_25.json | Radarr |

Key Features:

  • 🔄 Daily automation - GitHub Actions updates the lists every day at 03:00 UTC
  • 🎬 Bollywood excluded - The movies list automatically filters out Indian cinema
  • 🔍 TVDB resolution - TV shows include TVDB IDs for Sonarr compatibility
  • 🚀 Playwright-powered - Bypasses IMDb's anti-bot protection
  • 📦 Zero maintenance - Works automatically once set up

📋 Output Format

TV Shows (Sonarr)

[
  {"title": "One Piece", "tvdbId": 392276},
  {"title": "The Pitt", "tvdbId": 448176},
  {"title": "Invincible", "tvdbId": 368207}
]

Movies (Radarr)

[
  {"title": "Project Hail Mary", "imdbId": "tt12042730"},
  {"title": "The Super Mario Galaxy Movie", "imdbId": "tt28650488"}
]
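To check a generated file before pointing Sonarr or Radarr at it, the two formats above can be validated with a few lines of Python. This is a minimal sketch, not part of the repository: `validate_entries` is a hypothetical helper, and it only checks the fields shown in the examples above (`title` plus `tvdbId` or `imdbId`).

```python
import json


def validate_entries(entries, id_key, id_type):
    """Return a list of problems found in a parsed import list (hypothetical helper)."""
    if not isinstance(entries, list):
        return ["top-level value must be a JSON array"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: not an object")
            continue
        title = entry.get("title")
        if not isinstance(title, str) or not title:
            problems.append(f"entry {index}: missing or empty title")
        value = entry.get(id_key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, id_type):
            problems.append(f"entry {index}: {id_key} must be {id_type.__name__}")
    return problems


# Sample data copied from the format examples above.
shows = json.loads('[{"title": "One Piece", "tvdbId": 392276}]')
movies = json.loads('[{"title": "Project Hail Mary", "imdbId": "tt12042730"}]')

print(validate_entries(shows, "tvdbId", int))    # TV lists carry an integer tvdbId
print(validate_entries(movies, "imdbId", str))   # movie lists carry a string imdbId
```

An empty result means every entry has the expected shape; each problem string names the offending entry by index.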

🚀 Quick Start

Import to Sonarr

  1. Open Sonarr → Settings → Import Lists
  2. Click + → Custom
  3. Set:
    • List Name: IMDb Top TV Shows
    • List URL: https://raw.githubusercontent.com/FelixClements/imdb-top-lists/main/top_tvshows_25.json
  4. Save and enable

Import to Radarr

  1. Open Radarr → Settings → Import Lists
  2. Click + → Custom
  3. Set:
    • List Name: IMDb Top Movies
    • List URL: https://raw.githubusercontent.com/FelixClements/imdb-top-lists/main/top_movies_25.json
  4. Save and enable

🛠️ Local Usage

Prerequisites

  • Python 3.9+
  • pip

Installation

# Clone the repository
git clone https://github.com/FelixClements/imdb-top-lists.git
cd imdb-top-lists

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium --with-deps

# Run the scrapers
python generate_list.py      # TV shows → top_tvshows_25.json
python generate_movies.py    # Movies → top_movies_25.json

Custom Options

# Generate top 50 TV shows
python generate_list.py -n 50 -o top_tvshows_50.json

# Generate top 100 movies
python generate_movies.py -n 100 -o top_movies_100.json

⚙️ Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| -n, --number | 25 | Number of titles to fetch |
| -o, --output | top_tvshows_25.json / top_movies_25.json | Output filename |
| --user-agent | Chrome 140 | Custom User-Agent header |
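For readers who want to see how these options fit together, here is a rough `argparse` sketch of a command line with the same flags. It is an illustration under assumptions, not the scripts' actual code: `build_parser` is a hypothetical name, and the real default User-Agent string is not given here, so the sketch leaves it as `None`.

```python
import argparse


def build_parser(default_output):
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Generate an IMDb top list")
    parser.add_argument("-n", "--number", type=int, default=25,
                        help="Number of titles to fetch")
    parser.add_argument("-o", "--output", default=default_output,
                        help="Output filename")
    # Assumption: None stands in for the built-in Chrome 140 User-Agent string.
    parser.add_argument("--user-agent", default=None,
                        help="Custom User-Agent header")
    return parser


# Equivalent of: python generate_list.py -n 50 -o top_tvshows_50.json
args = build_parser("top_tvshows_25.json").parse_args(
    ["-n", "50", "-o", "top_tvshows_50.json"]
)
print(args.number, args.output, args.user_agent)
```

Passing the default output name into the builder lets one parser definition serve both the TV and the movie script, which differ only in that default.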

🎬 Bollywood Filtering

The movie list automatically excludes Indian cinema using IMDb URL parameters:

  • countries=!in - Excludes India as country of origin
  • languages=!hi - Excludes Hindi language films

This keeps the list focused on Western and international releases; Bollywood content is already served by dedicated platforms.
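The two parameters above can be attached to a search URL with the standard library. The sketch below is illustrative only: the base URL and the `build_movies_url` helper are assumptions, and the scraper may build its URL differently or add further parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: IMDb's title search page is used as the base; the repository's
# actual URL is not shown in this README.
SEARCH_URL = "https://www.imdb.com/search/title/"


def build_movies_url(base_url=SEARCH_URL):
    """Append the exclusion filters described above to a search URL."""
    params = {
        "countries": "!in",   # exclude India as country of origin
        "languages": "!hi",   # exclude Hindi-language films
    }
    # safe="!" keeps the negation marker readable instead of encoding it as %21.
    return f"{base_url}?{urlencode(params, safe='!')}"


url = build_movies_url()
print(url)
print(parse_qs(urlsplit(url).query))
```

The leading `!` is what turns each parameter into an exclusion, so it is worth confirming that it survives URL encoding, which is what the `parse_qs` round trip checks.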


🔄 GitHub Actions Workflow

The workflow runs automatically every day at 03:00 UTC:

Steps:
1. Checkout repository
2. Setup Python 3.11
3. Install dependencies
4. Install Playwright browsers
5. Generate TV show lists (5, 10, 25)
6. Generate movie lists (5, 10, 25)
7. Commit changes (if any)

Manual trigger: Go to Actions → Update IMDb Lists → Run workflow
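Steps 5 and 6 produce three files per content type. One way to do that from a single scrape is to slice one ranked list into the three sizes, as sketched below. This is an assumption about the approach: the workflow may instead call each script once per size with `-n`, and `write_sized_lists` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def write_sized_lists(items, prefix, sizes=(5, 10, 25), out_dir="."):
    """Write the first N items of a ranked list to <prefix>_<N>.json for each size."""
    written = []
    for size in sizes:
        path = Path(out_dir) / f"{prefix}_{size}.json"
        payload = json.dumps(items[:size], indent=2, ensure_ascii=False)
        path.write_text(payload + "\n", encoding="utf-8")
        written.append(path)
    return written


# Placeholder entries stand in for scraped results.
ranked = [{"title": f"Show {i}", "tvdbId": 1000 + i} for i in range(1, 26)]

with tempfile.TemporaryDirectory() as tmp:
    paths = write_sized_lists(ranked, "top_tvshows", out_dir=tmp)
    names = [p.name for p in paths]
    counts = [len(json.loads(p.read_text(encoding="utf-8"))) for p in paths]

print(names)
print(counts)
```

Slicing a single result keeps the three files consistent with each other, since the top 5 is always a prefix of the top 10 and the top 25.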


📊 How It Works

Architecture

┌─────────────────┐
│  GitHub Actions │
│  (Daily 03:00)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Playwright     │ ◄── Bypasses WAF protection
│ Chromium Browser│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  IMDb Scraper   │ ◄── Scrapes popular TV/movies
└────────┬────────┘
         │
         ├──► TV Shows ──► TVMaze API ──► TVDB ID
         │
         └──► Movies (Bollywood filtered)
         │
         ▼
   JSON Files (committed to repo)
         │
         ▼
   Sonarr / Radarr import
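The TV branch of the diagram resolves each show to a TVDB ID through TVMaze. The sketch below shows what that mapping could look like using TVMaze's public lookup endpoint, which accepts an IMDb ID and returns a show object with an `externals.thetvdb` field. Whether the scraper uses this exact endpoint is an assumption, the helper names are hypothetical, and the sample payload is trimmed and uses a placeholder IMDb ID.

```python
from urllib.parse import urlencode

TVMAZE_LOOKUP = "https://api.tvmaze.com/lookup/shows"


def tvmaze_lookup_url(imdb_id):
    """Build the TVMaze lookup URL for an IMDb ID such as 'tt0000000'."""
    return f"{TVMAZE_LOOKUP}?{urlencode({'imdb': imdb_id})}"


def extract_tvdb_id(show):
    """Pull the TVDB ID out of a TVMaze show object, or return None if absent."""
    externals = (show or {}).get("externals") or {}
    tvdb_id = externals.get("thetvdb")
    return tvdb_id if isinstance(tvdb_id, int) else None


def to_sonarr_entry(title, show):
    """Return a Sonarr list entry, or None when no TVDB ID could be resolved."""
    tvdb_id = extract_tvdb_id(show)
    if tvdb_id is None:
        return None
    return {"title": title, "tvdbId": tvdb_id}


# Trimmed sample of a lookup response; the imdb value is a placeholder.
sample = {"name": "One Piece", "externals": {"thetvdb": 392276, "imdb": "tt0000000"}}

print(tvmaze_lookup_url("tt0000000"))
print(to_sonarr_entry("One Piece", sample))
print(to_sonarr_entry("Unknown", {"externals": {"thetvdb": None}}))
```

Returning `None` for unresolved shows lets the caller drop them, since Sonarr cannot import an entry without a `tvdbId`.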

📈 Files Generated

| File | Content | Source |
|------|---------|--------|
| top_tvshows_5.json | Top 5 TV shows | IMDb Popular TV |
| top_tvshows_10.json | Top 10 TV shows | IMDb Popular TV |
| top_tvshows_25.json | Top 25 TV shows | IMDb Popular TV |
| top_movies_5.json | Top 5 movies | IMDb Popular Movies |
| top_movies_10.json | Top 10 movies | IMDb Popular Movies |
| top_movies_25.json | Top 25 movies | IMDb Popular Movies |

🔧 Troubleshooting

Playwright Browser Issues

# Reinstall Playwright browsers
playwright install chromium --with-deps

No Items Scraped

The scraper includes retry logic (3 attempts) and increased timeouts for reliability. If issues persist:

  1. Check internet connection
  2. Verify IMDb URL is accessible
  3. Check GitHub Actions logs for details
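To make the retry behaviour mentioned above concrete, here is a minimal sketch of a three-attempt retry wrapper that treats an empty result as a failure. The scraper's real implementation may differ: `with_retries`, the linear backoff, and the delay value are all assumptions made for illustration.

```python
import time


def with_retries(fetch, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times until it returns a non-empty result."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            items = fetch()
            if items:
                return items
            last_error = RuntimeError("no items scraped")
        except Exception as exc:  # a sketch: real code would catch narrower errors
            last_error = exc
        if attempt < attempts:
            # Linear backoff: wait longer after each failed attempt.
            sleep(delay_seconds * attempt)
    raise RuntimeError(f"giving up after {attempts} attempts") from last_error


# Simulated scraper that fails twice, then succeeds on the third attempt.
calls = {"count": 0}


def flaky_scrape():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("page did not load")
    return [{"title": "One Piece", "tvdbId": 392276}]


# sleep is replaced with a no-op so the example runs instantly.
result = with_retries(flaky_scrape, sleep=lambda seconds: None)
print(calls["count"], result)
```

Injecting `sleep` as a parameter keeps the wrapper testable without real delays, and chaining the final error with `from last_error` preserves the underlying cause in the logs.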

🤝 Contributing

Contributions welcome! Feel free to:

  • Report bugs via Issues
  • Suggest features
  • Submit pull requests

πŸ“ License

MIT License - see LICENSE for details.


📜 Changelog

2026-04-06

  • Breaking: Renamed output files to top_tvshows_X.json and top_movies_X.json
  • Added movie scraper for Radarr with Bollywood exclusion
  • Fixed WAF blocking by switching to Playwright browser automation
  • Improved timeouts for GitHub Actions reliability
  • Added detailed logging for debugging

Historical

  • Initial release: TV show scraper for Sonarr
