This dataset includes quantitative and categorical features from online reviews of 21 hotels located on the Las Vegas Strip, extracted from TripAdvisor. All 504 reviews were collected between January and August of 2015.
The dataset contains 504 records and 20 tuned features, with 24 records per hotel (two per month, randomly selected), covering the year 2015. The CSV has a header row whose column names correspond to the features.
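The figures quoted above (504 records, 20 features, 21 hotels, 24 reviews per hotel) can be checked with a quick sanity test. The following is a minimal sketch, not the project's own code: it uses only the Python standard library so it runs before the notebook dependencies are installed. The function names are hypothetical, and the column name `Hotel name`, the comma delimiter, and the path `data/ratings.csv` are assumptions that may need adjusting to the actual file. The cleaned file may also have a different number of columns than the raw one.

```python
import csv
from collections import Counter


def summarize_reviews(csv_file, hotel_column="Hotel name", delimiter=","):
    """Return (row count, column count, reviews-per-hotel Counter).

    `csv_file` is an open text file whose first row is the header.
    `hotel_column` is an assumed column name; change it to match the file.
    """
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    per_hotel = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        per_hotel[row[hotel_column]] += 1
    # fieldnames is None only when the file has no header row at all
    n_columns = len(reader.fieldnames or [])
    return n_rows, n_columns, per_hotel


def check_expected_shape(n_rows, n_columns, per_hotel):
    """Compare a summary against the figures given in the description."""
    return (
        n_rows == 504
        and n_columns == 20
        and len(per_hotel) == 21
        and all(count == 24 for count in per_hotel.values())
    )


# Example usage (assumed path, not confirmed by this README):
#   with open("data/ratings.csv", newline="") as f:
#       summary = summarize_reviews(f)
#   print(check_expected_shape(*summary))
```

If the check returns `False`, comparing the three values in the summary against the description shows whether the row count, the column count, or the per-hotel balance is the part that differs.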
- Software and libraries
  - This project is written entirely in Python.
  - The code is in Python notebook format, so a notebook environment such as Jupyter will come in handy.
  - It uses the most common libraries: numpy, pandas, sklearn and matplotlib.
  - Please go through the code to understand the software dependencies.
- Data
  - The data folder contains the raw files downloaded from UCI as well as a clean data CSV named `ratings.csv`.
- Code
  - The analysis can be run from the Python notebook: just open it and `run all` in your Jupyter kernel.
- Results
  - Results of the EDA and the model scores have been stored and embedded in the project report, `EDA.md`.
  - `EDA.md` is the overall analysis report of this project.
  - In future, I will experiment with more machine learning algorithms to improve score accuracy on this data.
  - For now, check out the latest release.
Lichman, M. (2013). UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science