How long do you think it would take to gather Spotify's entire catalog? I'm trying to predict popularity based on audio features, and I think that by increasing the amount of training data I will be able to improve my test scores. My ExtraTreesClassifier model is currently getting a 66% weighted-average F1-score with 222k songs used in the training set (using train_test_split, of course). No surprise, it is getting an F1-score of 83% for the fourth class (songs at or above the 75th percentile in popularity). It would be nice to have more examples of less popular songs, even though the popularity distribution is pretty good right now, basically Gaussian.
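For reference, here is a minimal sketch of the kind of setup described above, and of one way to check whether more data is likely to help before collecting it: train on growing fractions of the training split and compare the weighted F1 on a fixed test split. The data is synthetic, and the helper names (`popularity_quartiles`, `f1_by_training_size`), the 80/20 split, and the model settings are assumptions for illustration, not the actual pipeline.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def popularity_quartiles(popularity):
    """Bin raw popularity into 4 classes (0..3) at the 25th/50th/75th percentiles.

    Class 3 holds songs at or above the 75th percentile.
    """
    cuts = np.percentile(popularity, [25, 50, 75])
    return np.digitize(popularity, cuts)


def f1_by_training_size(X, y, fractions=(0.25, 0.5, 1.0), seed=0):
    """Weighted F1 on a fixed test split for growing fractions of the training split."""
    # train_test_split shuffles, so a prefix of the training split is a random subset.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    scores = {}
    for frac in fractions:
        n = int(len(X_train) * frac)
        model = ExtraTreesClassifier(n_estimators=50, random_state=seed)
        model.fit(X_train[:n], y_train[:n])
        scores[frac] = f1_score(y_test, model.predict(X_test), average="weighted")
    return scores


# Synthetic stand-in for the audio features and popularity scores (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))
popularity = 50 + 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(scale=5, size=2000)
y = popularity_quartiles(popularity)

scores = f1_by_training_size(X, y)
for frac, score in scores.items():
    print(f"{frac:.0%} of training data: weighted F1 = {score:.3f}")
```

If the score is still climbing noticeably between the larger fractions, more data may be worth gathering; if it has flattened, extra songs alone are less likely to move the test score.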
Just wondering if this is practical.
Thanks.