connect-midhunr/zomato-restaurant-clustering-and-sentiment-analysis: Build machine learning models to cluster the restaurants into different segments and analyze the sentiments of the reviews given by the customers.

Link to deployed model: http://sentimenent-analysis-zomato-review.ap-south-1.elasticbeanstalk.com/

In this project, I analyze the metadata and reviews of popular restaurants in Hyderabad and build machine learning models to cluster the restaurants into different segments, based on their cuisines (and, separately, on cost and rating), and to analyze the sentiments of the reviews given by customers.

💾 Project Files Description

This project contains an executable IPython notebook, a presentation, and a source directory, as follows:

Executable Files:

Zomato_Restaurant_Clustering_and_Sentiment_Analysis.ipynb - Google Colab notebook containing data summary, exploration, visualisations, feature engineering, text processing, modelling, performance evaluation and conclusion.

Documentation:

Presentation PDF - Unsupervised Machine Learning - Zomato Restaurant Clustering and Sentiment Analysis - Capstone Project.pdf - Presentation slideshow of the project.

Source Directory:

Data & Resources.zip - Includes metadata and review data of restaurants listed by Zomato in Hyderabad.

📖 Problem Statement

Zomato is an Indian restaurant aggregator and food delivery start-up founded by Deepinder Goyal and Pankaj Chaddah in 2008. Zomato provides information, menus, and user reviews of restaurants, and also offers food delivery from partner restaurants in select cities. The main objective is to understand the existing data and analyze its trends and patterns, so that two machine learning models can be built: one to segment restaurant types and another to analyze the sentiment of reviews.

📖 Approach

Understanding the business task.
Reading data from files given and providing a summary.
Data cleaning, which involves removing irregularities in the data.
Exploratory data analysis, to find trends and patterns in restaurant costs, cuisines, ratings, and reviews.
Feature engineering, to prepare data for modelling.
Text Processing, to convert text to numeric data for modelling.
Modelling data (for both clustering and sentiment analysis) and comparing the models to find the most suitable one for each task.
Conclusion.
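The text-processing step (converting review text into numbers a model can use) can be sketched as follows. This is a minimal, stdlib-only illustration assuming a simple bag-of-words representation; the stopword set and the function names `clean_review` and `bag_of_words` are hypothetical, and the notebook's actual pipeline may differ (for example, by adding lemmatization or TF-IDF weighting).

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; a real pipeline would use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "it", "of", "to", "very"}


def clean_review(text: str) -> list[str]:
    """Lowercase, replace non-letters with spaces, tokenize, and drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def bag_of_words(reviews: list[str]) -> tuple[list[str], list[list[int]]]:
    """Turn reviews into count vectors over a shared, sorted vocabulary."""
    tokenised = [clean_review(r) for r in reviews]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


vocab, vectors = bag_of_words([
    "The food was very good!",
    "Service was slow, food was cold.",
])
print(vocab)    # ['cold', 'food', 'good', 'service', 'slow']
print(vectors)  # [[0, 1, 1, 0, 0], [1, 1, 0, 1, 1]]
```

Each review becomes one row of word counts, so reviews of different lengths end up as vectors of the same length.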

📖 Exploratory Data Analysis

The following insights were gained from EDA:

Collage - Hyatt Hyderabad Gachibowli is the most expensive restaurant, and Mohammedia Shawarma and Amul are the most affordable ones.

North Indian cuisine is the most popular cuisine.

Anvesh Chowdary is the most experienced reviewer while Satwinder Singh is the most popular one.

AB's - Absolute Barbecues is the highest rated restaurant.

There is some linear relationship between a restaurant's average rating and the cost of its food.
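Several of these insights come from straightforward aggregations. The sketch below shows, on a small hypothetical sample (these are not values from the Zomato data), how the most and least expensive restaurants, the most common cuisine, and a Pearson correlation between cost and rating could be computed. The row layout and the `pearson` helper are assumptions for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical sample rows: (name, cost for two, cuisines, average rating).
# These are illustrative values, NOT figures from the Zomato dataset.
restaurants = [
    ("Restaurant A", 1500, "North Indian, Chinese", 4.3),
    ("Restaurant B", 400, "Fast Food", 3.6),
    ("Restaurant C", 900, "North Indian, Biryani", 3.9),
    ("Restaurant D", 2500, "Continental, North Indian", 4.5),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Most and least expensive restaurants by cost.
most_expensive = max(restaurants, key=lambda row: row[1])[0]
most_affordable = min(restaurants, key=lambda row: row[1])[0]

# Cuisine popularity: split each comma-separated cuisine string and count.
cuisine_counts = Counter(
    cuisine.strip()
    for _, _, cuisines, _ in restaurants
    for cuisine in cuisines.split(",")
)

costs = [row[1] for row in restaurants]
ratings = [row[3] for row in restaurants]
r = pearson(costs, ratings)

print(most_expensive, most_affordable)  # Restaurant D Restaurant B
print(cuisine_counts.most_common(1))    # [('North Indian', 3)]
print(f"cost vs rating: r = {r:.3f}")
```

A coefficient near +1 indicates a strong positive linear relationship; on real data, a scatter plot alongside r helps judge whether the relationship is actually linear.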

📖 Modelling

🖨️ Restaurant Clustering Based on Cuisines

💹 Clusters by K Means Algorithm

💹 Clusters by DBSCAN Algorithm

🖨️ Restaurant Clustering Based on Cost and Rating

💹 Clusters by K Means Algorithm

💹 Clusters by DBSCAN Algorithm

🖨️ Modelling for Sentiment Analysis

💹 Comparison of Models

💹 Performance of Model after Hyperparameter Tuning
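The tuning step can be sketched with scikit-learn's `GridSearchCV`. This assumes TF-IDF features and a random forest (the model the conclusion reports as chosen); the parameter grid, the toy reviews, and the use of grid search itself are assumptions, since the exact tuning procedure is not described here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical labelled reviews (1 = positive, 0 = negative), for illustration only.
reviews = [
    "great food and friendly staff",
    "amazing biryani, loved it",
    "tasty starters and good service",
    "excellent ambience, great desserts",
    "cold food and rude staff",
    "terrible service, very slow",
    "bland biryani, not worth it",
    "awful experience, dirty tables",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF features feed a random forest; the grid below is an assumed example.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(random_state=42)),
])
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [None, 5],
}

# cv=2 only because the toy dataset is tiny; use more folds on real data.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(reviews, labels)

print(search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.2f}")
```

With the default `refit=True`, `search` is refitted on all the data using the best parameters and can be used directly for prediction.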

📖 Deployment

A web application is built to demonstrate the working of the trained machine learning model using a combination of HTML, CSS, and JavaScript.

Sentiment predictions from the trained ML model are served via a Flask API.
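A minimal sketch of such a Flask prediction endpoint, assuming the review arrives as a JSON request body. The `/predict` route, the `review` field, and the keyword-based `predict_sentiment` stand-in are all hypothetical; a real deployment would call the trained model instead.

```python
import re

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the trained model. A real deployment would load
# the fitted pipeline from disk (for example with pickle or joblib) instead.
POSITIVE_WORDS = {"good", "great", "tasty", "amazing", "excellent", "loved"}


def predict_sentiment(review: str) -> str:
    """Return 'positive' if the review contains a known positive word."""
    tokens = set(re.findall(r"[a-z]+", review.lower()))
    return "positive" if tokens & POSITIVE_WORDS else "negative"


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"review": "Great food!"}.
    payload = request.get_json(silent=True) or {}
    review = str(payload.get("review", "")).strip()
    if not review:
        return jsonify(error="missing 'review' field"), 400
    return jsonify(review=review, sentiment=predict_sentiment(review))
```

The front end would send the review text to this endpoint and display the returned `sentiment`; locally, `app.test_client()` is a convenient way to exercise the route without starting a server.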

The web application is deployed on AWS Elastic Beanstalk using a CI/CD pipeline.

Link to deployed model: http://sentimenent-analysis-zomato-review.ap-south-1.elasticbeanstalk.com/

📘 Conclusion

The following conclusions were drawn from Modelling:

Either of the two models, trained using the K means algorithm or the DBSCAN algorithm, can be chosen for clustering the restaurant dataset based on cuisines, depending on the preferred number of clusters and whether outliers should be included.
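The trade-off between the two algorithms can be illustrated on a toy example. The sketch below one-hot encodes cuisine lists with scikit-learn's `MultiLabelBinarizer`, picks the number of K-means clusters by silhouette score, and runs DBSCAN, which labels outliers `-1` instead of forcing them into a cluster. The cuisine lists, the candidate range of k, and the `eps`/`min_samples` values are assumptions for illustration, not the project's settings.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical cuisine lists, one per restaurant (not project data).
cuisines = [
    ["North Indian", "Chinese"],
    ["North Indian", "Mughlai"],
    ["North Indian", "Chinese", "Mughlai"],
    ["Italian", "Continental"],
    ["Italian", "Continental", "Desserts"],
    ["Continental", "Desserts"],
    ["Biryani"],
]

# One binary column per cuisine (1 = the restaurant serves it).
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(cuisines)

# K-means assigns every restaurant to a cluster; choose k by silhouette score.
scores = {}
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
kmeans_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

# DBSCAN does not fix the number of clusters in advance; restaurants without
# enough neighbours within eps are labelled -1 (outliers).
dbscan_labels = DBSCAN(eps=1.2, min_samples=2).fit_predict(X)

print("best k:", best_k, "K-means labels:", kmeans_labels.tolist())
print("DBSCAN labels:", dbscan_labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

Here the Biryani-only restaurant is too far from the others to join a DBSCAN cluster, whereas K-means must place it in some cluster.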

The model built using K means algorithm is selected for clustering based on cost and ratings.
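A minimal sketch of clustering on cost and rating, assuming the two features are standardized first (otherwise the much larger cost values would dominate the Euclidean distances K-means uses). The sample points and the choice of three clusters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical (cost for two, average rating) pairs, NOT project data.
X = np.array([
    [300, 3.2], [400, 3.4], [350, 3.1],     # low cost, lower rating
    [900, 3.9], [1000, 4.0], [950, 3.8],    # mid cost, mid rating
    [2200, 4.5], [2500, 4.6], [2400, 4.4],  # high cost, high rating
])

# Standardize so cost (values in the hundreds or thousands) does not
# dominate rating (values between 1 and 5).
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Report each cluster's centre back in the original units.
centres = scaler.inverse_transform(kmeans.cluster_centers_)
for cluster_id, (cost, rating) in enumerate(centres):
    print(f"cluster {cluster_id}: cost ~ {cost:.0f}, rating ~ {rating:.2f}")
```

Converting the centres back to original units makes each segment easy to describe, for example "affordable, lower rated" versus "expensive, highly rated".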

For sentiment analysis, the model built using random forest algorithm was chosen over others.

If model interpretability matters more than accuracy, the model built using logistic regression should be chosen instead. Since the two models differ in accuracy by less than 7%, the choice makes little difference to the results.

📜 Credits

Midhun R | Avid Learner | Data Analyst | Data Scientist | Machine Learning Enthusiast

Contact me for Data Science Project Collaborations

📚 References

Scikit Learn, 'sklearn.preprocessing.MultiLabelBinarizer'. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html/

Towards Data Science, 'A One-Stop Shop for Principal Component Analysis'. [Online]. Available: https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c/

Medium, 'Silhouette Analysis in K-means Clustering'. [Online]. Available: https://medium.com/@cmukesh8688/silhouette-analysis-in-k-means-clustering-cefa9a7ad111/

