Airbnb Data Analysis

Isak Kabir
5 min read · Mar 20, 2020

What drives Airbnb prices in Seattle?

Airbnb has disrupted the home rental market. Guests can filter by type of home, dates, location, price, number of guests, beds and so on. Hosts and guests can leave reviews about their experience. I am curious about what drives Airbnb prices in Seattle, and I hope the answer helps hosts better understand how to price their homes and attract more guests.

In this post, I will do some exploratory data analysis on the Seattle Airbnb Open Data from Kaggle. The post covers three main questions:
1. How do price and rating relate to each other?
2. What are the factors that drive higher prices?
3. How can we predict the housing price?

Introduction

The dataset covers 3,818 listings (homes) on Airbnb in Seattle from 2016. The price range for rentals is large. Some hosts charge as much as $1,000 a day, but the majority charge between $75 and $200.

How do price and rating relate to each other?

Seattle listings have an average review score of 94 out of 100. At least 50% of the listings have a review score of 96 or higher, and only 25% have ratings below 89. Either guests really appreciate the experience of booking homes in Seattle through Airbnb, or they tend to give high ratings in general (or both).

We can also conclude that high ratings are not limited to higher-priced homes, since many of the low-price listings have also received high scores. The ratings of low-price homes are more widely spread, but guests can still get good value for their money if they pay attention to the review comments. Therefore, there is no necessary tradeoff between experience and price.
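As a rough illustration, summary figures like the ones above could be computed with pandas along the following lines. This is a minimal sketch on made-up toy rows, not the actual analysis code. The column names price and review_scores_rating, and the idea that prices are stored as strings such as "$1,000.00", are assumptions about the Kaggle listings.csv file.

```python
import pandas as pd

# Toy stand-in for listings.csv; the values are made up for illustration.
listings = pd.DataFrame({
    "price": ["$85.00", "$150.00", "$1,000.00", "$60.00", "$200.00", "$95.00"],
    "review_scores_rating": [96.0, 98.0, 100.0, 89.0, None, 94.0],
})

# Assumption: prices are strings such as "$1,000.00", so strip "$" and "," first.
listings["price_num"] = (
    listings["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

rating = listings["review_scores_rating"]
summary = {
    "mean": rating.mean(),          # missing ratings are skipped
    "q25": rating.quantile(0.25),   # 25% of listings score below this value
    "median": rating.median(),      # 50% of listings score at or above this value
}

# Pearson correlation between price and rating (rows with a missing rating are ignored).
price_rating_corr = listings["price_num"].corr(rating)

print(summary)
print(round(price_rating_corr, 2))
```

A correlation close to zero would support the observation that cheap listings can be rated just as highly as expensive ones.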

What are the factors that drive higher prices?

Amenities

I was curious about how amenities (such as TV, washer, dryer etc.) relate to higher prices. The following diagram shows the top amenities with the average price of the listings with (available) or without (missing) the amenity.

For example, a listing with a kitchen has an average price of 140 dollars per day, while a listing without a kitchen has an average price of 100 dollars. The amenities that hosts should consider adding to their homes are kitchen, TV, dryer, washer, fireplace, air conditioning, gym, elevator and hot tub. Hair dryer, shampoo and ‘laptop friendly workspace’ don’t affect the price much, and some amenities, such as allowing pets, dogs and cats, are associated with lower prices. If more hosts did not allow pets, they might be able to charge more and increase the number of bookings.
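The with/without comparison can be sketched as follows. This is a minimal sketch under stated assumptions: the amenities column is assumed to hold strings such as {TV,"Hot Tub"}, parse_amenities is a hypothetical helper written for this example, and the toy prices are invented so that the kitchen row mirrors the 140 vs 100 example above.

```python
import pandas as pd

# Toy data; the amenities format '{TV,"Hot Tub"}' is an assumption about the raw file.
listings = pd.DataFrame({
    "price": [140.0, 160.0, 110.0, 90.0, 120.0],
    "amenities": [
        '{TV,Kitchen,"Hot Tub"}',
        '{Kitchen,Washer}',
        '{TV}',
        '{"Pets Allowed"}',
        '{Kitchen,"Pets Allowed"}',
    ],
})

def parse_amenities(raw: str) -> set:
    """Hypothetical helper: turn '{TV,"Hot Tub"}' into {'TV', 'Hot Tub'}."""
    items = raw.strip("{}").split(",")
    return {item.strip('"') for item in items if item}

amenity_sets = listings["amenities"].apply(parse_amenities)

rows = []
for amenity in sorted(set().union(*amenity_sets)):
    has_it = amenity_sets.apply(lambda s: amenity in s)
    rows.append({
        "amenity": amenity,
        "available": listings.loc[has_it, "price"].mean(),   # mean price with the amenity
        "missing": listings.loc[~has_it, "price"].mean(),    # mean price without it
    })

amenity_prices = pd.DataFrame(rows).set_index("amenity")
amenity_prices["difference"] = amenity_prices["available"] - amenity_prices["missing"]
print(amenity_prices.sort_values("difference", ascending=False))
```

A positive difference only says that listings with the amenity tend to be pricier. It does not show that adding the amenity causes the higher price, since larger homes often have more amenities to begin with.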

Neighbourhood

Price also varies by neighbourhood. Southeast Magnolia has the highest average price and Rainier Beach has the lowest. Boathouses and condominiums are the property types with the highest prices, and dorms are the cheapest. Southeast Magnolia, Portage Bay and Westlake are all high-price neighbourhoods close to the water, probably offering boathouse rentals.
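A ranking like this one comes down to a group-by on the listing table. The sketch below uses invented rows, and the column names neighbourhood_cleansed and property_type are assumptions about the dataset; mean_price_by is a hypothetical helper for this example.

```python
import pandas as pd

# Toy rows for illustration only; prices are invented.
listings = pd.DataFrame({
    "neighbourhood_cleansed": [
        "Southeast Magnolia", "Southeast Magnolia",
        "Rainier Beach", "Rainier Beach", "Westlake",
    ],
    "property_type": ["House", "Boat", "Apartment", "Dorm", "Condominium"],
    "price": [250.0, 300.0, 70.0, 40.0, 180.0],
})

def mean_price_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Hypothetical helper: average price per category, most expensive first."""
    return df.groupby(column)["price"].mean().sort_values(ascending=False)

by_neighbourhood = mean_price_by(listings, "neighbourhood_cleansed")
by_property_type = mean_price_by(listings, "property_type")

print(by_neighbourhood)
print(by_property_type)
```

On the real data it is probably worth checking how many listings each group contains, since an average over one or two listings can be misleading.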

Bed types and room types

Looking at bed types and room types, listings with real beds and entire homes/apartments have higher prices than the others. It’s interesting that entire homes/apartments cost about double the price of private rooms.

Correlation heat map

The heat map shows that accommodations, bathrooms, bedrooms, beds, square feet, family/kid friendly and TV are positively correlated with price. However, pets in the property, breakfast, number of reviews and reviews per month are negatively correlated with price. As discussed above, it could be beneficial not to allow pets in the property in order to raise the price. It also seems that people prefer to manage their own breakfast.

The number of reviews is negatively correlated with price, which also showed up in the previous graphs. This is probably because guests book the more expensive listings less often, or because those guests don’t bother to leave feedback.
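The numbers behind such a heat map are a plain correlation matrix. The sketch below builds synthetic data in which price rises with bedrooms and falls with the number of reviews, then ranks the features by their correlation with price. The data and coefficients are invented for illustration and are not taken from the Seattle dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: the relationships below are made up, not estimated from Airbnb data.
rng = np.random.default_rng(0)
n = 200
bedrooms = rng.integers(1, 5, size=n)
number_of_reviews = rng.integers(0, 100, size=n)
price = 150 + 40 * bedrooms - 1.0 * number_of_reviews + rng.normal(0, 10, size=n)

listings = pd.DataFrame({
    "price": price,
    "bedrooms": bedrooms,
    "number_of_reviews": number_of_reviews,
})

# Pairwise Pearson correlations between all numeric columns.
corr_matrix = listings.corr()

# Correlation of each feature with price, strongest positive first.
price_corr = corr_matrix["price"].drop("price").sort_values(ascending=False)
print(price_corr)
```

The matrix can then be passed to a plotting function such as seaborn.heatmap to draw the chart.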

How can we predict the housing price?

The last question is how to predict the housing price in Seattle. The following steps were carried out before building the pricing model.

  1. Estimate the percentage of missing values.
  2. Drop the columns that will not be used.
  3. Fill the rest of the missing data with the mean value.
  4. Create new dummy columns for the categorical variables.
  5. Use PCA to reduce the number of features.
  6. Split the data into train and test sets, using sklearn’s train_test_split with random_state=42 and test_size=0.25.
  7. Evaluate using R2 score and RMSE, comparing the predicted values with the actual prices.
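The steps above can be sketched roughly as follows. This is a minimal sketch on synthetic data, not the code from the repository. The 50% threshold for dropping columns, the number of PCA components and the Lasso alpha are all assumptions. The sketch also fits PCA on the training split only, which swaps steps 5 and 6 so that no information from the test set leaks into the transformation.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the listings table (all values are made up).
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "bedrooms": rng.integers(1, 5, size=n).astype(float),
    "bathrooms": rng.integers(1, 4, size=n).astype(float),
    "room_type": rng.choice(["Entire home/apt", "Private room"], size=n),
    "square_feet": np.nan,  # a column that is entirely missing
})
df["price"] = (
    40 + 35 * df["bedrooms"] + 20 * df["bathrooms"]
    + 30 * (df["room_type"] == "Entire home/apt")
    + rng.normal(0, 10, size=n)
)
df.loc[rng.choice(n, size=40, replace=False), "bathrooms"] = np.nan

# 1. Share of missing values per column.
missing_share = df.isna().mean()

# 2. Drop columns that are mostly empty (the 0.5 threshold is an assumption).
df = df.drop(columns=missing_share[missing_share > 0.5].index)

# 3. Fill the remaining numeric gaps with the column mean.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# 4. One dummy column per category level (first level dropped).
df = pd.get_dummies(df, columns=["room_type"], drop_first=True, dtype=float)

X = df.drop(columns="price")
y = df["price"]

# 6. Train/test split with the settings named in the list above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# 5. PCA, fitted on the training data only and then applied to the test data.
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# 7. Fit a model and compare predictions with the actual prices.
model = Lasso(alpha=0.1)
model.fit(X_train_pca, y_train)
pred = model.predict(X_test_pca)
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"R2: {r2:.3f}, RMSE: {rmse:.2f}")
```

PCA is sensitive to the scale of the features, so on the real data it may be worth standardising the columns before this step.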

I tried different predictive methods: Linear Regression, ElasticNet, Lasso, Random Forest, AdaBoost and Decision Tree. The Lasso model achieved the best R2 score and RMSE, which you can see above.
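A comparison of this kind can be organised as a loop over candidate models that records both metrics for each. The sketch below runs on synthetic data from sklearn’s make_regression, so the resulting ranking says nothing about the Airbnb data; it only shows the mechanics. The hyperparameters are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for the prepared listing features.
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Candidate models; the hyperparameters are illustrative assumptions.
models = {
    "Linear regression": LinearRegression(),
    "ElasticNet": ElasticNet(alpha=0.1),
    "Lasso": Lasso(alpha=0.1),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "AdaBoost": AdaBoostRegressor(random_state=42),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rows.append({
        "model": name,
        "r2": r2_score(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    })

# Lowest RMSE first.
results = pd.DataFrame(rows).set_index("model").sort_values("rmse")
print(results)
```

Because every model sees the same split, the rows of the table are directly comparable.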

Conclusion

High ratings are not limited to higher-priced homes, since many of the low-price listings have also received high scores.

Factors that are positively related to price are accommodations, neighbourhood, bathrooms, property type, bedrooms, beds, square feet, amenities and TV. Pets, breakfast, number of reviews and reviews per month are negatively correlated with price. The amenities that hosts should consider adding to their homes are kitchen, TV, dryer, washer, fireplace, air conditioning, gym, elevator and hot tub.

Based on the Seattle data from Airbnb, the best model for predicting the housing price was Lasso regression.

For a detailed analysis of this work, please check out my GitHub repository.
