airbnb_customer_sentiment

Description

Analysis of text data by extracting the main topics from airbnb dataset using Latent Dirichlet Allocation (LDA) and then Linear Regression to interpret the topics.

Business Need

Due to the subjective nature of tourist satisfaction in the hospitality industry, even the most influential hosts may struggle to identify the predecessors of guest contentment and dissatisfaction. Because AirBnB's business model is currently decentralized, determining which features are crucial to the customer experience is significantly more difficult. Given that, despite the wealth of data provided by the website Inside Airbnb, there has been little research into the importance of Airbnb hosts, and they would benefit immensely from customer comments to better understand the lodging sector.

In contrast to hotels and hostels, where consumer brand awareness is critical, AirBnB users are willing to pay a premium for a listing based mostly on user evaluations, which are often the most accurate source of information. Therefore, we want to create some guidelines for hosts to improve their customers' experience. To accomplish this, we intend to use the information contained in the listings' evaluations to determine if there is a link between the occurrence of certain topics in listing reviews and an increase in its price or star reviews.

The objective is to understand what listing characteristics boost customers’ experience and pleasure, inspecting general guest feedback, AirBnB listing location qualities, what tangible and intangible features of the listings consumers highlight, the level of service provided by the host, and what customers want or anticipate, as well as the overall management of the listing. Once those topics are identified it will be vital to understand the effect they have on the listing’s price/star rating.

AI Techniques

1. Latent Dirichlet Allocation (LDA)

Natural Language Processing for text mining will be employed as the AI approach, with each listing having a document including a matrix of phrases used by consumers while writing their evaluations. Only English words and non-stop words will be examined, and the subjects intrinsic to the collection of descriptive text, in our instance, reviews, will be categorized according to the learned topics to be used later for machine learning models, using Latent Dirichlet Allocation (LDA).

Advantages: We utilize LDA because it assigns a probability distribution over topics to each listing, where topics are probability distributions over words. It is a more human-interpretable approach to topic modeling and is an efficient way to analyze enormous quantities of text. Since interpretability is important, the reason why we chose LDA is that compared to other topic modeling algorithms, it offers word distributions. This helps in identifying if a sensible number of topics have been chosen and provides insightful labels to them.

Disadvantages: Natural Language Processing can be hard and there is no correct answer as text data presents unique challenges stemming from the qualitative and unstructured nature of the data. The selection of the number of topics to the model will bring about completely different results. Also, with NLP, importance must be put on the distinction of topics to form different segments.

2. Linear Regression

Description of the data to be utilized.

All the data cited are referred to the listings in London.

Detailed Review Data for listings includes listing id, the review that the listing refers to, the date of the review, and the text in the review. The analysis may not completely cover the whole timespan present in the data frame but be instead limited to the last disposable year, as customer preferences often are subject to big changes over time and outdated data may lead to wrong insights. The data will then be used to train the LDA.
Listings’ Data, including but not limited to price, location, number of beds, bathrooms, and score ratings, to be used to train regression models and understand if the presence of specific topics in a listing’s reviews has an effect on the price/rating score of that listing itself.

Motivation

By using the topic modeling technique, we aim to uncover hidden structures in user reviews and derive useful insights for hosts to improve their listing’s customer experience.

Libraries

The required libraries to execute all the scripts successfully are the following:

pandas: data analysis and manipulation tool
numpy: library for working with arrays
os: to interact with the underlying operating system
nltk: various text processing libraries with a lot of test datasets
re: special text string used for describing a search pattern
spacy: for NLP such as tokenization, named entity recognition with pre-trained models for several languages
wordcloud: a visual representation of text data
pprint: prints Python data structures in a form which can be used as input to the interpreter
pyLDAvis: for interactive topic model visualization
gensim: for representing documents as semantic vectors
pickle: serializing and deserializing a Python object structure
scipy.stats: probability distributions, summary and frequency statistics, correlation functions and statistical tests
sklearn.linear_model.LinearRegression: linear model with coefficients
sklearn.compose.ColumnTransformer: applies transformers to columns of an array or pandas DataFrame
sklearn.impute.SimpleImputer: univariate imputer for completing missing values with simple strategies
sklearn.pipeline.Pipeline: to assemble several steps that can be cross-validated together while setting different parameters
sklearn.metrics.classification_report: to build a text report showing the main classification metrics

Data

Data for reviews.csv and listings.csv were both downloaded using: http://insideairbnb.com/.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
README.md		README.md
lda_technique.ipynb		lda_technique.ipynb
regression_analysis.ipynb		regression_analysis.ipynb
topic_frequency.ipynb		topic_frequency.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

airbnb_customer_sentiment

Description

Business Need

AI Techniques

Motivation

Libraries

Data

About

Releases

Packages

Languages

cauchi94/airbnb-customer-sentiment

Folders and files

Latest commit

History

Repository files navigation

airbnb_customer_sentiment

Description

Business Need

AI Techniques

Motivation

Libraries

Data

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages