# Split the data 50/50: one half trains the clustering model, which then predicts clusters for the other half.
# That second half, together with its predictions, becomes the whole dataset (100%) for the classification model.
# Split this dataset 80/20: 80% to train the classification model and 20% to test it.
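
The two-stage split described above can be sketched with index lists. This is a minimal illustration, not the original pipeline: the function name `two_stage_split`, the `seed` parameter, and the sample count of 1000 are all assumptions made for the example, and it uses only the standard library so that it runs anywhere.

```python
import random


def two_stage_split(n_samples, seed=0):
    """Return index lists for the clustering and classification stages.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle reproducibly

    # Stage 1: 50% fits the clustering model, 50% receives its predictions.
    half = n_samples // 2
    cluster_train = indices[:half]
    cluster_predict = indices[half:]

    # Stage 2: the predicted half is treated as the full classification
    # dataset and split 80% (train) / 20% (test).
    cut = len(cluster_predict) * 4 // 5
    clf_train = cluster_predict[:cut]
    clf_test = cluster_predict[cut:]
    return cluster_train, clf_train, clf_test


cluster_train, clf_train, clf_test = two_stage_split(1000)
print(len(cluster_train), len(clf_train), len(clf_test))  # 500 400 100
```

Relative to the original data, the classifier therefore trains on 40% of the samples and is tested on 10%. If scikit-learn is available, two calls to `train_test_split` (first with `test_size=0.5`, then with `test_size=0.2`) would produce the same kind of partition.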