Gathering the entire catalog #1

@rjstange

How long do you think it would take to gather Spotify's entire catalog? I'm trying to predict popularity from audio features, and I think that increasing the amount of training data would improve my test scores. My ExtraTreesClassifier model currently gets a 66% weighted-average F1-score with 222k songs used in the training set (using train_test_split, of course). Unsurprisingly, it gets an F1-score of 83% for the fourth class (songs at or above the 75th percentile in popularity). It would be nice to have more examples of less popular songs, even though the popularity distribution is already pretty good, basically Gaussian.
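To make the class definition concrete, here is a minimal sketch of how popularity scores could be binned into four quartile classes, with the fourth class holding songs at or above the 75th percentile. The function name `popularity_classes`, the cut rule for the lower three classes, and the sample scores are assumptions for illustration, not the exact code behind the numbers above.

```python
from statistics import quantiles


def popularity_classes(popularity):
    """Map each popularity score to a class 0..3 by quartile (hypothetical helper)."""
    # quantiles(n=4) returns the three cut points between the quartiles.
    q1, q2, q3 = quantiles(popularity, n=4, method="inclusive")
    labels = []
    for p in popularity:
        if p >= q3:
            labels.append(3)  # fourth class: 75th percentile and above
        elif p >= q2:
            labels.append(2)
        elif p >= q1:
            labels.append(1)
        else:
            labels.append(0)
    return labels


# Made-up scores, only to show the shape of the output.
scores = [0, 10, 20, 30, 40, 50, 60, 70, 80]
labels = popularity_classes(scores)
```

With cuts like these, each class holds roughly a quarter of the songs, so the per-class F1-scores are computed on similarly sized groups.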

Just wondering if this is practical.
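For a sense of scale, here is a rough back-of-envelope sketch of the kind of estimate I have in mind. Every number in it is a placeholder assumption: the track count, the number of track IDs per request, and the sustained request rate would all need to be replaced with real values, and it ignores the extra requests needed to discover track IDs in the first place.

```python
import math


def estimate_hours(n_tracks, ids_per_request, requests_per_second):
    """Estimate wall-clock hours to fetch audio features for n_tracks.

    All three parameters are assumptions supplied by the caller.
    """
    # Number of batched requests needed to cover every track.
    n_requests = math.ceil(n_tracks / ids_per_request)
    seconds = n_requests / requests_per_second
    return seconds / 3600


# Hypothetical inputs: 1M tracks, 100 IDs per request, 1 request per second.
hours = estimate_hours(n_tracks=1_000_000, ids_per_request=100, requests_per_second=1.0)
print(f"{hours:.1f} hours")
```

The estimate scales linearly with the track count, so the real answer depends mostly on how large the catalog is and what request rate is tolerated.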

Thanks.
