The goal of this task is to clean and prepare a raw dataset that may contain:
- Missing values
- Duplicate entries
- Inconsistent formats
so that the dataset is ready for analysis or modeling.
The Netflix dataset was chosen for this task.
- Filled `director` column with "Unknown" where data was missing.
- Filled `cast` column with "Not Available" where data was missing.
- Filled `country` column with "Unknown" where data was missing.
- Checked for duplicate entries.
- Removed duplicate rows using Pandas `.drop_duplicates()`.
- Standardized country names (e.g., converted to title case like United States, India).
- Stripped extra spaces.
- Converted `date_added` column to datetime format for consistency.
- Changed all column headers to lowercase with underscores instead of spaces.
Example: `Date Added` → `date_added`
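
The cleaning steps listed above can be sketched in Pandas roughly as follows. This is a minimal illustration, not the exact script used for this task: the function name `clean_netflix` and the small sample frame are hypothetical, and the sketch renames the headers first (an ordering choice made here) so that later steps can refer to the lowercase column names.

```python
import pandas as pd


def clean_netflix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that applies the cleaning steps to a copy of df."""
    out = df.copy()

    # Lowercase headers and replace spaces with underscores.
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )

    # Fill missing values with the placeholders named in the summary.
    fills = {"director": "Unknown", "cast": "Not Available", "country": "Unknown"}
    out = out.fillna({c: v for c, v in fills.items() if c in out.columns})

    # Drop exact duplicate rows and renumber the index.
    out = out.drop_duplicates().reset_index(drop=True)

    # Strip extra spaces and title-case country names.
    out["country"] = out["country"].str.strip().str.title()

    # Parse dates; values that cannot be parsed become NaT (assumed behavior
    # for this sketch, chosen via errors="coerce").
    out["date_added"] = pd.to_datetime(out["date_added"].str.strip(), errors="coerce")
    return out


# Tiny made-up sample, only to exercise the function.
raw = pd.DataFrame({
    "Title": ["A", "A", "B"],
    "Director": ["X", "X", None],
    "Cast": [None, None, "Y"],
    "Country": [" united states ", " united states ", None],
    "Date Added": ["September 25, 2021", "September 25, 2021", " January 1, 2020"],
})
cleaned = clean_netflix(raw)
print(cleaned)
```

Note that `str.title()` also lowercases the rest of each word, so an abbreviation such as "USA" would become "Usa"; values like that may need a manual mapping instead.
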
- Shape: 200 rows × 12 columns
- All missing values handled.
- No duplicate rows remain.
- Data is consistent and ready for analysis.
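
The results above can be checked with a few Pandas calls. The sketch below is one possible way to do it; the helper name `summarize_cleaning` and the sample frame are hypothetical, and in practice the frame would come from reading `Netflix_Cleaned.csv`.

```python
import pandas as pd


def summarize_cleaning(df: pd.DataFrame) -> dict:
    """Hypothetical helper: report shape, missing cells, and duplicate rows."""
    return {
        "shape": df.shape,
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# For the real file, something like:
#   df = pd.read_csv("Netflix_Cleaned.csv")
# Here a tiny made-up frame stands in for it.
sample = pd.DataFrame({
    "title": ["A", "B"],
    "director": ["X", "Unknown"],
    "country": ["United States", "India"],
})
report = summarize_cleaning(sample)
print(report)
```

A cleaned dataset should report zero for both `missing_values` and `duplicate_rows`.
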
- `Netflix_Cleaned.csv` → Cleaned dataset (ready for analysis)
- `README.md` → This summary file