Drop unimportant columns and remove duplicate rows
- Reduce the number of specializations: group every specialization that occurs fewer than 10 times into a single category called "other"
- Identify outliers with a scatter plot
- Fill null values with the mean of their column
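The cleaning steps above (dropping columns and duplicates, grouping rare specializations, mean-filling nulls) could be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the toy data, the column names (`specialization`, `fee`, `notes`), and the helper `clean` are all hypothetical, and "below 10" is assumed to mean "appears in fewer than 10 rows". The demo call uses a threshold of 3 only because the toy data is tiny.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration only.
df = pd.DataFrame({
    "specialization": ["cardiology"] * 3 + ["oncology", "urology", "urology"],
    "fee": [100.0, np.nan, 300.0, 200.0, 400.0, 400.0],
    "notes": ["a", "b", "c", "d", "e", "f"],
})


def clean(df, drop_cols, category_col, min_count=10):
    """Drop columns and duplicate rows, group rare categories, mean-fill nulls."""
    # Drop the columns judged unimportant, then remove exact duplicate rows.
    out = df.drop(columns=drop_cols).drop_duplicates().copy()

    # Replace every category seen fewer than min_count times with "other".
    counts = out[category_col].value_counts()
    rare = counts[counts < min_count].index
    out[category_col] = out[category_col].where(
        ~out[category_col].isin(rare), "other"
    )

    # Fill nulls in numeric columns with that column's mean.
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].mean())
    return out


# min_count=3 here only so the effect is visible on six rows.
cleaned = clean(df, drop_cols=["notes"], category_col="specialization", min_count=3)
print(cleaned)
```

Mean imputation only makes sense for numeric columns, so the sketch leaves nulls in text columns untouched.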
Performed using:
- histogram
- scatterplot
- count plot
- pie chart
- heat map: represents the correlation between variables
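As a rough sketch, the five plot types listed above can be drawn with pandas and matplotlib alone. The data here is randomly generated, and the column names (`age`, `fee`, `specialization`) and the helper `plot_overview` are hypothetical. The count plot is drawn as a bar chart of value counts and the heat map with `imshow`; seaborn's `countplot` and `heatmap` are common alternatives if that library is available.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in data; column names are assumptions for illustration.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "fee": rng.normal(250, 50, n),
    "specialization": rng.choice(["cardiology", "oncology", "other"], n),
})


def plot_overview(df, x, y, category):
    """Draw a histogram, scatter plot, count plot, pie chart, and correlation heat map."""
    fig, axes = plt.subplots(2, 3, figsize=(15, 8))
    axes = axes.ravel()

    # Histogram: distribution of one numeric column.
    axes[0].hist(df[x], bins=20)
    axes[0].set_title(f"Histogram of {x}")

    # Scatter plot: relationship between two numeric columns (also useful for spotting outliers).
    axes[1].scatter(df[x], df[y], s=10)
    axes[1].set_title(f"{y} vs {x}")

    # Count plot: number of rows per category, drawn as a bar chart.
    counts = df[category].value_counts()
    axes[2].bar(list(counts.index), counts.to_numpy())
    axes[2].set_title(f"Count of {category}")

    # Pie chart: share of each category.
    axes[3].pie(counts.to_numpy(), labels=list(counts.index), autopct="%1.0f%%")
    axes[3].set_title(f"Share of {category}")

    # Heat map: pairwise correlation between numeric columns.
    corr = df.select_dtypes(include="number").corr()
    im = axes[4].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    axes[4].set_xticks(range(len(corr.columns)))
    axes[4].set_xticklabels(list(corr.columns))
    axes[4].set_yticks(range(len(corr.columns)))
    axes[4].set_yticklabels(list(corr.columns))
    axes[4].set_title("Correlation heat map")
    fig.colorbar(im, ax=axes[4])

    axes[5].axis("off")  # unused sixth panel
    fig.tight_layout()
    return fig, corr


fig, corr = plot_overview(df, "age", "fee", "specialization")
fig.savefig("overview.png")
```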