Yelp Dataset Source (Kaggle Link)
yelp_academic_dataset_business.json
yelp_academic_dataset_review.json
yelp_academic_dataset_user.json
Yelp Covid Response Dataset (Official)
yelp_academic_dataset_covid_features.json
Restaurant Business Rankings 2020
Top250.csv
Place the above .json files into a folder called data
The following script converts the files into .csv. Caution: it may take a while to run.
pip install -r requirements.txt
python3 load_data_toolkit.py
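For readers curious what such a conversion involves, here is a minimal sketch using only the standard library. It is not the code in load_data_toolkit.py; it assumes the Yelp files are newline-delimited JSON (one object per line), and the helper name `json_lines_to_csv` is hypothetical.

```python
import csv
import json


def json_lines_to_csv(json_path, csv_path):
    """Convert a newline-delimited JSON file to CSV (hypothetical helper)."""
    # First pass: collect the union of top-level keys, in first-seen order.
    fieldnames = []
    seen = set()
    with open(json_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue
            for key in json.loads(line):
                if key not in seen:
                    seen.add(key)
                    fieldnames.append(key)

    # Second pass: write one CSV row per JSON object.
    rows = 0
    with open(json_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for line in src:
            if not line.strip():
                continue
            record = json.loads(line)
            # Nested objects or lists are stored as JSON strings in one cell.
            writer.writerow({
                k: json.dumps(v) if isinstance(v, (dict, list)) else v
                for k, v in record.items()
            })
            rows += 1
    return rows
```

Reading the file twice keeps memory use low, which matters for the larger files, at the cost of extra run time.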
To directly convert from .json to a DataFrame:
from load_data_toolkit import load_json_to_dataframe
yelp_df = load_json_to_dataframe('data/yelp_academic_dataset_business.json')
print(yelp_df.shape)
Based on a user’s previous Yelp ratings and the restaurants they’ve visited, can we use a machine learning model to predict another restaurant from the dataset that they will enjoy? What factors are most important in identifying a “match” between a user’s previously high-rated restaurants and a new restaurant for them to try?
semantic_filter.ipynb lets the data scientist choose a text query and a similarity threshold, and returns only the
businesses whose categories (classification) provide a sufficient semantic match.
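The query-and-threshold idea can be sketched as follows. This is an illustration only, not the notebook's code: `embed` is a toy stand-in (lowercase token counts) so the example runs without a model, whereas a real semantic filter would use a sentence encoder. The function names and sample businesses are hypothetical, and `categories` is assumed to be a comma-separated string.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: lowercase token counts."""
    return Counter(text.lower().replace(",", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_filter(businesses, query, threshold):
    """Keep businesses whose categories score at least `threshold`."""
    q = embed(query)
    return [b for b in businesses
            if cosine(embed(b["categories"]), q) >= threshold]


# Made-up sample records for illustration.
businesses = [
    {"name": "Luigi's", "categories": "Italian, Pizza, Restaurants"},
    {"name": "QuickLube", "categories": "Automotive, Oil Change Stations"},
]
matches = semantic_filter(businesses, "pizza restaurants", threshold=0.5)
```

Raising the threshold trades recall for precision: fewer businesses pass, but those that do are closer to the query.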
collaborative_rec.ipynb (Deprecated due to slow performance)
business_similarity_rec.ipynb uses Google's Universal Sentence Encoder to compute a vector that captures the meaning of each
business, then uses cosine similarity between those vectors to determine the best matches.
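As a rough sketch of the matching step, assuming each business already has an embedding vector: the encoder call itself is omitted, the tiny 3-dimensional vectors below are made-up placeholders (real encoder output is much larger), and `top_matches` is a hypothetical name rather than a function from the notebook.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_matches(vectors, target_id, k=3):
    """Rank the other businesses by cosine similarity to `target_id`."""
    target = vectors[target_id]
    scored = [(bid, cosine_similarity(target, vec))
              for bid, vec in vectors.items() if bid != target_id]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Placeholder vectors keyed by a made-up business id.
vectors = {
    "pizzeria": [0.9, 0.1, 0.0],
    "trattoria": [0.8, 0.2, 0.1],
    "garage": [0.0, 0.1, 0.9],
}
ranked = top_matches(vectors, "pizzeria", k=2)
```

Cosine similarity compares direction rather than magnitude, so two businesses score as similar when their vectors point the same way, regardless of vector length.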