Rental listing optimizer - Chris Krasniak's capstone project for The Data Incubator
A tool that predicts the monthly income of a property from its listing. The inputs for a given property can also be varied to see how they affect the predicted income. The prediction is made by estimating the occupancy rate for the listing.
This tool would theoretically be hosted by a company such as Airbnb or VRBO, with the intended users being property owners that have listings on these sites. These companies could benefit from their hosts using this tool, because the company earns a percentage of the rental income generated by each property. Therefore, if a company helps its hosts increase their occupancy rate with this tool, the company increases its own revenue. The focus here is on occupancy rate because the price is relatively easy to set accurately, so there is little room for price optimization.
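As a rough illustration of why occupancy rate drives income, the relationship can be sketched in a few lines of Python. The formula used here (nightly price times occupancy rate times days in the month) is an assumption made for illustration and is not necessarily the exact calculation used in the app; the function name `estimate_monthly_income` and the 30-day month are hypothetical.

```python
def estimate_monthly_income(nightly_price, occupancy_rate, days_in_month=30):
    """Estimate monthly income for a listing.

    Assumed formula (not taken from the app itself):
        income = nightly_price * occupancy_rate * days_in_month
    """
    if not 0.0 <= occupancy_rate <= 1.0:
        raise ValueError("occupancy_rate must be between 0 and 1")
    return nightly_price * occupancy_rate * days_in_month


if __name__ == "__main__":
    # Compare the same listing at two hypothetical occupancy rates.
    for rate in (0.40, 0.55):
        income = estimate_monthly_income(120.0, rate)
        print(f"occupancy {rate:.0%}: about ${income:,.0f} per month")
```

With the price held fixed, any change in the listing that raises the predicted occupancy rate raises the estimated income proportionally under this assumption.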
To predict the occupancy rate of listings, I first obtained a dataset of ~500k listings from 30 different cities in the US. These listings had been scraped from Airbnb by Inside Airbnb over 3 financial quarters; I used Selenium to download the 90 CSVs and compiled them.
After some data exploration, I made several custom sklearn transformers to perform data cleaning and feature engineering. Some notable ones:
- Removed listings with fewer than 14 days available or 0 days booked over the next year. The data did not contain the true occupancy rate: a date scraped from the website could be unavailable either because the host did not make it available (i.e. it could not have been rented) or because it was actually rented. I therefore assumed that listings that were rarely available even across a year were likely not listings that were always made available to renters.
- Parsed string data that should have been represented as numeric, e.g. $100.00 -> 100 and 1 and a half baths -> 1.5.
- Used the latitude/longitude of the listing along with the neighborhood to determine the distance from the listing to the center of the neighborhood.
- Applied TF-IDF vectorization to the description text, property name text, and amenities text, and used the results as features.
After the data cleaning and feature engineering, I used XGBoost regression to predict the availability of a listing across the next 90 days. The final model achieved an r-squared value of about 0.35. I have ideas to further improve the performance, the main one being to train a separate model for each city instead of relying on a single model to tell the different markets apart.
After I had a model, I created a Streamlit app that allowed a user to input the data for their listing and get a prediction for it. The web app then allowed the user to change various fields in the listing, and kept track of the values of each input as well as the predicted monthly income. With a download button, the user could then retrieve the data from the best listing and use it for the listing they actually post to Airbnb or VRBO.
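The cleaning and feature-engineering steps described above can be sketched with plain Python functions. This is a minimal sketch under stated assumptions, not the transformers used in the project: the function names, the dictionary keys (`neighborhood`, `latitude`, `longitude`), the choice of the mean coordinate as the neighborhood "center", and the use of the haversine formula for distance are all assumptions made for illustration.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0


def keep_listing(days_available, days_booked, min_available=14):
    """Keep a listing only if it has at least 14 available days and some bookings."""
    return days_available >= min_available and days_booked > 0


def parse_price(text):
    """Convert a price string such as '$100.00' to a float."""
    return float(re.sub(r"[^0-9.]", "", text))


def parse_baths(text):
    """Convert a bathroom description such as '1 and a half baths' to a number."""
    lowered = text.lower()
    match = re.search(r"\d+(?:\.\d+)?", lowered)
    count = float(match.group()) if match else 0.0
    # Add a spelled-out "half" only when the parsed number is whole,
    # so '1 and a half baths' gives 1.5 while '1.5 baths' stays 1.5.
    if "half" in lowered and count.is_integer():
        count += 0.5
    return count


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighborhood_centers(listings):
    """Assumed definition of 'center': the mean lat/lon of a neighborhood's listings."""
    sums = {}
    for row in listings:
        total = sums.setdefault(row["neighborhood"], [0.0, 0.0, 0])
        total[0] += row["latitude"]
        total[1] += row["longitude"]
        total[2] += 1
    return {name: (lat / n, lon / n) for name, (lat, lon, n) in sums.items()}


def distance_to_center_km(row, centers):
    """Distance from one listing to the center of its own neighborhood."""
    center_lat, center_lon = centers[row["neighborhood"]]
    return haversine_km(row["latitude"], row["longitude"], center_lat, center_lon)
```

In a scikit-learn pipeline, each of these functions would typically be wrapped in a custom transformer so that the same cleaning is applied at training and prediction time.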
To run the app:
- Clone the repo
- Install the requirements. The suggested way is to navigate to the cloned repo and execute `conda env create -f environment.yml`, then activate this environment with `conda activate ds_default`. Alternatively, try using pip to install the required packages in requirements.txt
- In a terminal with the `ds_default` environment activated, run the command `streamlit run app.py`