georgehong/216-final-proj

CS216 Final Project

Dataset

Yelp Dataset Source (Kaggle Link)

  • yelp_academic_dataset_business.json
  • yelp_academic_dataset_review.json
  • yelp_academic_dataset_user.json

Yelp Covid Response Dataset (Official)

  • yelp_academic_dataset_covid_features.json

Restaurant Business Rankings 2020

  • Top250.csv

Quickstart

Place the above .json files into a folder named data.

The following commands install the dependencies and convert the files to .csv. Caution: the conversion may take a while to run.

pip install -r requirements.txt
python3 load_data_toolkit.py
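For readers who want to see what such a conversion involves, here is a minimal sketch using only the standard library. It assumes each Yelp file stores one JSON object per line. The helper name json_lines_to_csv is hypothetical and is not taken from load_data_toolkit.py, whose actual implementation may differ.

```python
import csv
import json


def json_lines_to_csv(json_path, csv_path):
    """Convert a newline-delimited JSON file to CSV (hypothetical helper)."""
    # Parse every non-empty line into a dict. This holds all records in
    # memory, so it is only a sketch for the larger review file.
    with open(json_path, encoding="utf-8") as src:
        records = [json.loads(line) for line in src if line.strip()]

    # Collect column names in first-seen order, since records may differ.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    # Missing fields are written as empty strings; nested values are
    # written using their Python string representation.
    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

Calling json_lines_to_csv('data/yelp_academic_dataset_business.json', 'data/business.csv') would write one row per business and return the number of rows; the output path here is only an example.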

To directly convert from .json to a DataFrame:

from load_data_toolkit import load_json_to_dataframe

yelp_df = load_json_to_dataframe('data/yelp_academic_dataset_business.json')
print(yelp_df.shape)

Recommender System

Given a user’s previous Yelp ratings and the restaurants they have visited, can a machine learning model predict another restaurant in the dataset that they will enjoy? Which factors matter most in identifying a “match” between a user’s previously high-rated restaurants and a new restaurant for them to try?

Filtering Data

semantic_filter.ipynb lets the data scientist choose a text query and a similarity threshold, and returns only the businesses whose categories (classification) are a sufficiently close semantic match to the query.
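The query-plus-threshold idea can be sketched as follows. This is an illustration, not the notebook's code: the embed function is a stand-in that uses plain token counts rather than a learned sentence encoder, and the function names, the "name" and "categories" fields, and the sample businesses are all assumptions.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: token counts. NOT a learned sentence encoder."""
    return Counter(text.lower().replace(",", " ").split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_filter(businesses, query, threshold):
    """Keep businesses whose categories score at least `threshold`."""
    query_vec = embed(query)
    return [
        b for b in businesses
        if cosine_similarity(query_vec, embed(b["categories"])) >= threshold
    ]


# Hypothetical sample records, for illustration only.
businesses = [
    {"name": "Taco Stand", "categories": "Mexican, Tacos, Restaurants"},
    {"name": "Quick Lube", "categories": "Automotive, Oil Change Stations"},
]
matches = semantic_filter(businesses, "mexican restaurants", threshold=0.5)
print([b["name"] for b in matches])
```

With token counts, only exact word overlap contributes to the score, so a lower threshold admits looser matches and a higher one keeps only near-identical category strings. A sentence encoder would also score related words that do not overlap literally.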

Interaction Matrix

collaborative_rec.ipynb (Deprecated due to slow performance)
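As a rough illustration of what an interaction matrix is, the sketch below builds a user-by-business table of star ratings from review records. It assumes each review has user_id, business_id, and stars fields; the function names are hypothetical and the notebook's actual approach may differ.

```python
from collections import defaultdict


def build_interaction_matrix(reviews):
    """Map user_id -> {business_id: stars} from review records."""
    matrix = defaultdict(dict)
    for review in reviews:
        # If a user reviewed a business more than once, the last rating wins.
        matrix[review["user_id"]][review["business_id"]] = review["stars"]
    return dict(matrix)


def to_dense(matrix):
    """Expand the sparse mapping into rows, using 0 for 'no rating'."""
    users = sorted(matrix)
    businesses = sorted({b for ratings in matrix.values() for b in ratings})
    rows = [[matrix[u].get(b, 0) for b in businesses] for u in users]
    return users, businesses, rows


# Hypothetical sample reviews, for illustration only.
reviews = [
    {"user_id": "u1", "business_id": "b1", "stars": 5},
    {"user_id": "u1", "business_id": "b2", "stars": 3},
    {"user_id": "u2", "business_id": "b2", "stars": 4},
]
users, businesses, rows = to_dense(build_interaction_matrix(reviews))
print(users, businesses, rows)
```

The dense form has one cell for every user-business pair, so it grows with the product of the two counts, while the dictionary form stores only the ratings that exist.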

KNN Semantic Similarity Match

business_similarity_rec.ipynb uses Google's Universal Sentence Encoder to compute a vector that represents the meaning of each business, then ranks businesses by cosine similarity to find the best matches.
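The ranking step can be sketched independently of the encoder. The example below assumes the vectors have already been computed and uses short made-up vectors in their place; the function names and the sample data are hypothetical, and the notebook's own implementation may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k_similar(target_id, vectors, k=3):
    """Return the k business ids whose vectors are closest to target_id's."""
    target = vectors[target_id]
    scored = [
        (cosine_similarity(target, vec), business_id)
        for business_id, vec in vectors.items()
        if business_id != target_id  # never recommend the business itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [business_id for _, business_id in scored[:k]]


# Made-up 2-dimensional vectors standing in for real encoder output.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
print(top_k_similar("a", vectors, k=2))
```

This brute-force search compares the target against every other business, which is simple to verify but scales linearly with the number of businesses per query.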
