Netflix Content Analytics Platform

A data analytics case study using Python, Pandas, Matplotlib, and Seaborn to analyze Netflix Movies and TV Shows and uncover trends in content growth, genre dominance, global production, ratings, and creator activity.

Project Overview

This project analyzes the Netflix Movies and TV Shows dataset to understand how the platform's catalog has evolved over time. The analysis explores content growth, Movies vs TV Shows distribution, genre popularity, country contribution, ratings, and director activity.

The goal is to transform raw entertainment metadata into clear insights that communicate Netflix's catalog strategy and global content patterns.

Problem Statement

Streaming platforms generate large content catalogs, but raw title metadata does not immediately explain platform strategy. This project answers:

How has Netflix expanded its content library over time?
Are Movies or TV Shows more dominant in the catalog?
Which genres appear most frequently?
Which countries contribute the most Netflix content?
Which directors have the highest number of titles?
What does the ratings distribution suggest about audience targeting?

Key Takeaways

Netflix content growth accelerated sharply after 2015, with the strongest expansion between 2016 and 2019.
Movies make up a significantly larger share of the catalog than TV Shows.
International Movies, Dramas, and Comedies are among the most represented genres.
The United States contributes the highest number of titles, followed by India and the United Kingdom.
Several directors appear multiple times, suggesting recurring creator partnerships.
Ratings trends provide insight into how Netflix balances mainstream, family, teen, and mature content.

Business Impact

This analysis translates Netflix catalog metadata into decision-ready insights that can support:

Content Strategy: Identifying when Netflix expanded most aggressively and which content types dominate the catalog.
Market Expansion: Highlighting top contributing countries and where global production is most concentrated.
Audience Targeting: Using ratings and genre patterns to understand how the catalog is positioned for viewer segments.

Tools & Skills

Python
Pandas
Matplotlib
Seaborn
Jupyter Notebook
Data Cleaning
Exploratory Data Analysis
Data Visualization
Insight Communication

Dataset

Dataset: Netflix Movies & TV Shows Dataset
Source: Kaggle - Netflix Shows Dataset
Approximate Size: 8,800+ titles

Key fields used in the analysis:

show_id
type
title
director
cast
country
date_added
release_year
rating
duration
listed_in
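
As a minimal sketch of how these fields might be loaded and checked, the snippet below reads a CSV with Pandas and fails early if an expected column is absent. The helper name `load_titles` and the two sample rows are hypothetical placeholders, not taken from the notebook or the real dataset.

```python
import io

import pandas as pd

# Column names as listed in the "Key fields" section above.
EXPECTED_COLUMNS = [
    "show_id", "type", "title", "director", "cast", "country",
    "date_added", "release_year", "rating", "duration", "listed_in",
]


def load_titles(source):
    """Read the catalog CSV and raise if an expected column is missing."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


# Made-up placeholder rows (NOT real catalog entries), included only so the
# sketch runs without the Kaggle file.
SAMPLE_CSV = """show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in
s1,Movie,Example Film,Director A,Actor X,United States,"September 25, 2021",2020,PG-13,90 min,"Dramas, International Movies"
s2,TV Show,Example Series,,Actor Y,India,"September 24, 2021",2021,TV-MA,2 Seasons,"TV Dramas"
"""

titles = load_titles(io.StringIO(SAMPLE_CSV))
print(titles[["type", "title", "country"]])
```

With the repository checked out, the same helper would presumably be called as `load_titles("Data/netflix_titles.csv")`.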

Data Preparation & Methods

The dataset was cleaned and processed using Pandas before analysis. Main steps included:

Removed or handled missing and inconsistent values
Converted date fields for time-based analysis
Aggregated titles by year to analyze growth trends
Grouped genres and countries to identify dominant categories
Counted Movies vs TV Shows distribution
Analyzed director frequency and rating distribution
Generated visualizations using Matplotlib and Seaborn
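
The cleaning steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's actual code: the function name `clean_titles`, the "Unknown" placeholder policy, the derived `year_added` column, and the assumed date format ("September 25, 2021") are all choices made for this example, and the rows are invented.

```python
import pandas as pd


def clean_titles(df):
    """Return a cleaned copy: de-duplicated, missing text labeled, dates parsed."""
    out = df.drop_duplicates(subset="show_id").copy()
    # Assumed policy: keep rows and label missing text fields instead of dropping them.
    for col in ["director", "country", "rating"]:
        out[col] = out[col].fillna("Unknown")
    # date_added is text; strip stray whitespace, then convert values that do
    # not match the assumed format to NaT instead of raising.
    out["date_added"] = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    out["year_added"] = out["date_added"].dt.year
    return out


# Invented rows for illustration only (one duplicate, some missing values).
raw = pd.DataFrame({
    "show_id": ["s1", "s2", "s2", "s3"],
    "director": ["Director A", None, None, "Director B"],
    "country": ["United States", "India", "India", None],
    "rating": ["PG-13", "TV-MA", "TV-MA", "TV-14"],
    "date_added": ["September 25, 2021", " August 1, 2019", " August 1, 2019", None],
})
cleaned = clean_titles(raw)
print(cleaned)
```

Labeling missing values rather than dropping rows keeps the title counts intact for the later growth and distribution steps; dropping rows would be an equally reasonable choice for director-level analysis.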

Visual Insights

Content Growth Over Time

Netflix's catalog expanded rapidly after 2015 and peaked around 2019, suggesting a major platform expansion period.
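
A growth trend like this could be derived by counting titles per year. The sketch below assumes growth is measured by the year in `date_added` (rather than `release_year`) and that dates look like "January 5, 2018"; the function name and sample rows are hypothetical.

```python
import pandas as pd


def titles_added_per_year(df):
    """Count titles per year of date_added, split by type, plus a Total column."""
    years = pd.to_datetime(
        df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    ).dt.year
    counts = (
        df.assign(year_added=years)
        .dropna(subset=["year_added"])      # skip rows without a usable date
        .astype({"year_added": int})
        .groupby(["year_added", "type"])
        .size()
        .unstack(fill_value=0)              # one column per content type
    )
    counts["Total"] = counts.sum(axis=1)
    return counts


# Invented rows for illustration only.
sample = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"],
    "date_added": ["January 5, 2018", "March 9, 2019", "July 1, 2019", "May 20, 2019", None],
})
growth = titles_added_per_year(sample)
print(growth)
print("Peak year:", growth["Total"].idxmax())
```

Calling `growth["Total"].plot()` on the result would give a line chart of the yearly totals.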

Movies vs TV Shows Distribution

Movies clearly outnumber TV Shows and make up the majority of titles in the catalog.
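
One way to quantify this split is to compute each type's count and percentage share. The helper name `type_share` and the sample rows below are illustrative assumptions, not results from the dataset.

```python
import pandas as pd


def type_share(df):
    """Return title counts and percentage share for each content type."""
    counts = df["type"].value_counts()
    share = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"titles": counts, "share_pct": share})


# Invented rows for illustration only.
sample = pd.DataFrame({"type": ["Movie", "Movie", "TV Show", "Movie"]})
split = type_share(sample)
print(split)
```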

Top Genres on Netflix

International Movies, Dramas, and Comedies appear most frequently, highlighting Netflix's focus on broad and globally appealing categories.
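
Because a single title can carry several comma-separated labels in `listed_in` (and in `country`), counting genres usually means splitting each cell first. The sketch below shows one possible approach; `count_multivalued` is a hypothetical helper name and the rows are invented.

```python
import pandas as pd


def count_multivalued(df, column, top_n=10):
    """Count individual labels in a comma-separated column such as listed_in."""
    labels = (
        df[column]
        .dropna()
        .str.split(",")
        .explode()          # one row per label
        .str.strip()
    )
    labels = labels[labels != ""]  # ignore empty pieces from stray commas
    return labels.value_counts().head(top_n)


# Invented rows for illustration only.
sample = pd.DataFrame({
    "listed_in": ["Dramas, International Movies", "Comedies, Dramas", "International Movies", None],
    "country": ["United States, India", "India", "United Kingdom", "United States"],
})
genres = count_multivalued(sample, "listed_in")
countries = count_multivalued(sample, "country")
print(genres)
print(countries)
```

Counting this way means a title tagged with three genres contributes to three totals, so the counts sum to more than the number of titles.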

Top Countries Producing Netflix Content

The United States contributes the highest number of titles, followed by India and the United Kingdom.

Top Directors on Netflix

Several directors contribute multiple titles, showing recurring creator presence within the Netflix catalog.
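
Director frequency could be computed along these lines. The sketch assumes co-directed titles list names separated by commas and credits each listed director once per title; the function name, the `min_titles` threshold, and the rows are hypothetical.

```python
import pandas as pd


def recurring_directors(df, min_titles=2):
    """Return directors credited on at least min_titles titles, most frequent first."""
    credits = (
        df.dropna(subset=["director"])       # titles with no listed director are skipped
        .assign(director=lambda d: d["director"].str.split(","))
        .explode("director", ignore_index=True)
    )
    credits["director"] = credits["director"].str.strip()
    counts = credits.groupby("director")["show_id"].nunique()
    return counts[counts >= min_titles].sort_values(ascending=False)


# Invented rows for illustration only.
sample = pd.DataFrame({
    "show_id": ["s1", "s2", "s3", "s4", "s5"],
    "director": ["Director A", "Director A, Director B", None, "Director B", "Director C"],
})
repeat = recurring_directors(sample)
print(repeat)
```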

Ratings Distribution

The ratings distribution shows how catalog titles are spread across audience groups and maturity levels, from family and teen content to mature titles.
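
To relate individual rating labels to audience segments, the labels can be mapped to broader bands before counting. The mapping below is an assumption made for this sketch (the source does not define its groupings), and the rows are invented.

```python
import pandas as pd

# Assumed grouping of rating labels into audience bands; adjust as needed.
AUDIENCE_BANDS = {
    "TV-Y": "Kids", "TV-Y7": "Kids", "TV-Y7-FV": "Kids", "G": "Kids", "TV-G": "Kids",
    "PG": "Family", "TV-PG": "Family",
    "PG-13": "Teen", "TV-14": "Teen",
    "R": "Mature", "TV-MA": "Mature", "NC-17": "Mature",
}


def audience_distribution(df):
    """Count titles per audience band; unmapped or missing ratings go to a catch-all."""
    bands = df["rating"].map(AUDIENCE_BANDS).fillna("Other / Unrated")
    return bands.value_counts()


# Invented rows for illustration only.
sample = pd.DataFrame({"rating": ["TV-MA", "TV-14", "PG", "TV-MA", "NR", None, "TV-Y"]})
dist = audience_distribution(sample)
print(dist)
```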

Data Analytics Pipeline

Dataset -> Data Cleaning -> Exploratory Data Analysis -> Visualization -> Insights
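
The visualization stage of this pipeline might look something like the sketch below, which turns a Series of counts into a saved bar chart using Matplotlib. The function name `save_bar_chart`, the temporary output path, and the placeholder counts are assumptions for illustration; the notebook's actual plotting code may differ and may use Seaborn instead.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def save_bar_chart(counts, title, path):
    """Draw a horizontal bar chart from a Series of counts and save it as a PNG."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ordered = counts.sort_values()  # largest bar ends up at the top
    ax.barh(ordered.index.astype(str), ordered.values)
    ax.set_title(title)
    ax.set_xlabel("Number of titles")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return path


# Placeholder counts, not real results.
demo_counts = pd.Series({"Category A": 5, "Category B": 3, "Category C": 1})
out_path = save_bar_chart(
    demo_counts, "Example chart", os.path.join(tempfile.mkdtemp(), "example.png")
)
print("Saved:", out_path)
```

In the repository layout, the output path would more likely point into the `visuals/` folder, for example `visuals/top_genres.png`.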

Architecture

flowchart LR
    A["Netflix CSV Dataset"] --> B["Data Cleaning / Preprocessing"]
    B --> C["Exploratory Data Analysis"]
    C --> D["Matplotlib & Seaborn Visualizations"]
    D --> E["Business-Ready Insights"]

    E --> F["Content Growth"]
    E --> G["Genre Distribution"]
    E --> H["Country Contribution"]
    E --> I["Director Activity"]
    E --> J["Ratings Distribution"]

Project Website

This repository also includes a responsive project portfolio webpage:

index.html
style.css

The webpage presents the project as a recruiter-friendly case study with overview, key findings, visual analysis, workflow, architecture, and run instructions.

Quick Start

Prerequisites

Python 3.10+
Jupyter Notebook

Install the required libraries:

pip install -r requirements.txt

If you prefer installing manually:

pip install pandas matplotlib seaborn notebook

Clone the Repository

git clone https://github.com/bindhusaahithi/Netflix-Analysis.git
cd Netflix-Analysis

Run the Notebook

jupyter notebook Notebook/Netflix_Analysis.ipynb

Run all cells to reproduce the analysis and visualizations.

Project Structure

Netflix-Analysis/
├── Data/
│   └── netflix_titles.csv
├── Notebook/
│   └── Netflix_Analysis.ipynb
├── visuals/
│   ├── content_growth.png
│   ├── movies_vs_tvshows.png
│   ├── top_countries.png
│   ├── top_directors.png
│   ├── top_genres.png
│   └── top_ratings.png
├── index.html
├── requirements.txt
├── style.css
└── README.md

Future Improvements

Build an interactive dashboard using Streamlit or Power BI
Add deeper analysis by release year and country-genre combinations
Perform text analysis on Netflix descriptions
Train a recommendation or classification model using content metadata
Compare Netflix content patterns with other streaming platforms

Final Conclusion

This analysis shows that Netflix's catalog expanded aggressively in the late 2010s, with Movies remaining the dominant content type. The dataset also reflects strong international breadth, with major contributions from the United States, India, and the United Kingdom. Overall, the project demonstrates how exploratory data analysis can reveal meaningful patterns in content strategy, audience targeting, and global production trends.

Author

Bindhu Saahithi
Master's in Data Science

GitHub: bindhusaahithi
