[Data] Explore the Hotel Review Data
In this post, I explore a hotel review dataset from Kaggle using pandas and visualize the data with matplotlib.
The Data
The dataset, “515K Hotel Reviews Data in Europe”, can be downloaded from Kaggle. It contains about 515,000 customer reviews and scores for 1,493 luxury hotels across Europe.
The CSV file contains 17 fields, including the hotel’s information, positive and negative review content, the reviewer score, etc. The description of each field can be found on Kaggle.
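As a rough sketch of the loading step: with the real download, this would be a single pd.read_csv call on the CSV file. The file name Hotel_Reviews.csv is my assumption, and the two rows below are made-up stand-ins so the snippet runs without the download.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for the Kaggle CSV (hypothetical rows).
SAMPLE_CSV = io.StringIO(
    "Hotel_Name,Reviewer_Nationality,Reviewer_Score\n"
    "Hotel A,United Kingdom,9.2\n"
    "Hotel B,Australia,4.6\n"
)

# With the real file this would be: pd.read_csv("Hotel_Reviews.csv")  (assumed name)
df = pd.read_csv(SAMPLE_CSV)

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types inferred by pandas
```

On the full dataset, df.shape should report 17 columns, matching the field count above.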
About the Hotels
How many hotels were rated?
Use value_counts() to get the count of each unique value in a Series.
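A minimal sketch of that call on the Hotel_Name column, using a small made-up frame in place of the real data:

```python
import pandas as pd

# Hypothetical sample; the real frame comes from the Kaggle CSV.
df = pd.DataFrame(
    {"Hotel_Name": ["Hotel A", "Hotel B", "Hotel A", "Hotel C", "Hotel A"]}
)

# Reviews per hotel, sorted from most to least reviewed.
hotel_counts = df["Hotel_Name"].value_counts()

# The number of distinct hotels is the length of that Series.
n_hotels = hotel_counts.size

print(hotel_counts)
print(n_hotels)
```

df["Hotel_Name"].nunique() gives the same number directly if only the count of distinct hotels is needed.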
The dataset has reviews for 1,492 distinct hotel names.
Britannia International Hotel Canary Wharf           4789
Strand Palace Hotel                                  4256
Park Plaza Westminster Bridge London                 4169
Copthorne Tara Hotel London Kensington               3578
DoubleTree by Hilton Hotel London Tower of London    3212
...
Le Lavoisier                                           12
Hotel Eitlj rg                                         12
Hotel Wagner                                           10
Mercure Paris Porte d Orleans                          10
Hotel Gallitzinberg                                     8
Name: Hotel_Name, Length: 1492, dtype: int64
About the Reviewers
Where are the reviewers from?
The reviews were written by people of 227 distinct nationalities. Nearly half of the reviews (about 47.6%) come from the United Kingdom, followed by the USA (6.9%), Australia (4.2%), Ireland (2.9%), and the UAE (2.0%).
United Kingdom              0.475524
United States of America    0.068711
Australia                   0.042048
Ireland                     0.028749
United Arab Emirates        0.019845
...
Tuvalu                      0.000002
Anguilla                    0.000002
Vatican City                0.000002
Svalbard Jan Mayen          0.000002
Comoros                     0.000002
Name: Reviewer_Nationality, Length: 227, dtype: float64
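Proportions like these can be produced with value_counts(normalize=True), which divides each count by the total. Here is a small sketch on made-up rows:

```python
import pandas as pd

# Hypothetical sample of the Reviewer_Nationality column.
df = pd.DataFrame(
    {"Reviewer_Nationality": ["United Kingdom", "United Kingdom", "Australia", "Ireland"]}
)

# normalize=True returns shares of the total instead of raw counts.
shares = df["Reviewer_Nationality"].value_counts(normalize=True)

print(shares)
print(len(shares))  # number of distinct nationalities
```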
About the Reviews
When were the reviews published?
The reviews were collected over two years, from 2015-08-04 to 2017-08-03.
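One way to find that date range is to parse the date column and take its minimum and maximum. The sketch below assumes a Review_Date column holding month/day/year strings; both the column name and the format are my assumptions, and the rows are made up.

```python
import pandas as pd

# Hypothetical sample; column name and date format are assumptions.
df = pd.DataFrame({"Review_Date": ["8/3/2017", "8/4/2015", "12/25/2016"]})

# Parse the strings into proper timestamps.
review_dates = pd.to_datetime(df["Review_Date"], format="%m/%d/%Y")

first, last = review_dates.min(), review_dates.max()
print(first.date(), last.date())
print((last - first).days)  # length of the collection window in days
```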
Review Content
On average, negative reviews are slightly longer (18.5 words) than positive reviews (17.8 words).
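These averages can be computed by taking the mean of the two word-count columns. The column names below are assumed (they do not appear elsewhere in this post), and the numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical sample; both column names are assumptions.
df = pd.DataFrame(
    {
        "Review_Total_Positive_Word_Counts": [10, 20, 0],
        "Review_Total_Negative_Word_Counts": [5, 0, 40],
    }
)

word_cols = [
    "Review_Total_Positive_Word_Counts",
    "Review_Total_Negative_Word_Counts",
]

# Column-wise mean: one average per word-count column.
avg_words = df[word_cols].mean()
print(avg_words)
```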
Let’s look at how the review scores are distributed.
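Before plotting, it can help to see the distribution as plain counts per score range. This is a sketch with made-up scores and bin edges I picked for illustration; the same counts could then be drawn as a histogram with matplotlib.

```python
import pandas as pd

# Hypothetical scores; the real values come from the Reviewer_Score column.
scores = pd.Series([2.5, 4.6, 7.1, 8.3, 9.2, 9.6, 10.0], name="Reviewer_Score")

# Illustrative bin edges; each bin is open on the left, closed on the right.
bins = [0, 2, 4, 6, 8, 10]
score_bins = pd.cut(scores, bins=bins)

# Count reviews per bin and keep the bins in ascending order.
distribution = score_bins.value_counts().sort_index()
print(distribution)
```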
Reviews with Score >= 5.0
A review with a score of 5.0 or higher is called an overall positive review here; anything below 5.0 is an overall negative review.
The number of reviews with Reviewer_Score >= 5.0 is 493,457, which means that more than 95% of reviews are overall positive. Their average score is 8.6.
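The count, share, and average score for this group all follow from a boolean mask on Reviewer_Score. A minimal sketch with made-up scores:

```python
import pandas as pd

# Hypothetical sample of the Reviewer_Score column.
df = pd.DataFrame({"Reviewer_Score": [9.6, 7.5, 5.0, 4.6, 2.5]})

# True for overall positive reviews (score of 5.0 or higher).
is_positive = df["Reviewer_Score"] >= 5.0

n_positive = int(is_positive.sum())   # how many reviews pass the threshold
share_positive = is_positive.mean()   # fraction of all reviews
mean_positive_score = df.loc[is_positive, "Reviewer_Score"].mean()

print(n_positive, share_positive, mean_positive_score)
```

Negating the mask with ~is_positive selects the overall negative reviews in the same way.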
For overall positive reviews (Reviewer_Score >= 5.0), the positive part of the review averages 18.23 words and the negative part averages 17.22 words.
Reviews with Score < 5.0
The number of reviews with Reviewer_Score < 5.0 is 22,281, which means that only about 4% of reviews are overall negative. Their average score is 3.86.
For overall negative reviews (Reviewer_Score < 5.0), the positive part of the review averages 7.68 words, while the negative part averages 47.81 words. Reviewers who score a hotel low tend to describe its negative side in much more detail.
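Both groups can be compared in one step by labelling each review and grouping on that label. This is only a sketch: the word-count column names are assumed and the rows are invented.

```python
import pandas as pd

# Hypothetical sample; the word-count column names are assumptions.
df = pd.DataFrame(
    {
        "Reviewer_Score": [9.6, 7.5, 4.6, 2.5],
        "Review_Total_Positive_Word_Counts": [20, 10, 4, 0],
        "Review_Total_Negative_Word_Counts": [6, 10, 30, 50],
    }
)

# Label each review as overall positive or negative at the 5.0 threshold.
overall = (
    (df["Reviewer_Score"] >= 5.0)
    .map({True: "positive", False: "negative"})
    .rename("overall")
)

word_cols = [
    "Review_Total_Positive_Word_Counts",
    "Review_Total_Negative_Word_Counts",
]

# Average word counts of each review part, per group.
avg_by_group = df.groupby(overall)[word_cols].mean()
print(avg_by_group)
```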
Tags
The most popular tag is “Leisure trip”.
[('Leisure trip', 417778), ('Submitted from a mobile device', 307640), ('Couple', 252294), ('Stayed 1 night', 193645), ('Stayed 2 nights', 133937), ('Solo traveler', 108545), ('Stayed 3 nights', 95821), ('Business trip', 82939), ('Group', 65392), ('Family with young children', 61015)]
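A list of (tag, count) pairs like this can be built with collections.Counter. The sketch below assumes each cell of a Tags column is a string that looks like a Python list with padded entries; that storage format is my assumption, and the rows are made up.

```python
import ast
from collections import Counter

import pandas as pd

# Hypothetical sample; the string-encoded list format is an assumption.
df = pd.DataFrame(
    {
        "Tags": [
            "[' Leisure trip ', ' Couple ', ' Stayed 2 nights ']",
            "[' Business trip ', ' Solo traveler ', ' Stayed 1 night ']",
            "[' Leisure trip ', ' Solo traveler ', ' Stayed 1 night ']",
        ]
    }
)

tag_counter = Counter()
for raw in df["Tags"]:
    tags = ast.literal_eval(raw)  # turn the string into a real list
    tag_counter.update(tag.strip() for tag in tags)  # drop padding spaces

top_tags = tag_counter.most_common(3)
print(top_tags)
```

On the real data, most_common(10) would give the ten most frequent tags.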