GitHub - connect-midhunr/email-campaign-effectiveness-prediction: Machine learning model to predict the mail that is ignored, read or acknowledged by the reader.

Link to deployed model: https://connect-midhunr-email-campaign-effectiveness-predict-app-e58c8b.streamlit.app/

In this project, I analyze data from an e-mail marketing campaign and build a machine learning model to predict whether a mail is ignored, read, or acknowledged by the reader. The dataset contains no personal information about the recipients.

💾 Project Files Description

This project contains an executable IPython notebook, a presentation, and the source data, as follows:

Executable Files:

Email_Campaign_Effectiveness_Prediction_Capstone_Project.ipynb - Google Colab notebook containing data summary, exploration, visualisations, feature engineering, modelling, performance evaluation and conclusion.

Documentation:

Presentation PDF - Supervised Machine Learning - Classification - Email Campaign Effectiveness Prediction - Capstone Project.pdf - Presentation slideshow of the project.

Source Directory:

data_email_campaign.csv - Includes e-mail marketing campaign data.

📖 Problem Statement

Most small to medium business owners make effective use of Gmail-based e-mail marketing strategies for offline targeting, converting prospective customers into leads so that they stay with the business. The main objective is to create a machine learning model that characterizes each mail and tracks whether it is ignored, read, or acknowledged by the reader.

📖 Approach

Understanding the business task.
Importing relevant libraries and defining useful functions.
Reading data from the given files.
Data pre-processing, which involves inspecting and cleaning the dataset.
Exploratory data analysis, to find which factors affect the e-mail status and how they affect it.
Feature engineering, to prepare the data for modelling.
Modelling the data and comparing the models to find the most suitable one for classification.
Conclusion.
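
The model comparison step can be illustrated with a small sketch. The README does not state which metric was used to rank the models, so the macro-averaged F1 score below is an assumption, chosen because it weighs the three classes equally even when one class dominates. The labels, predictions and model names (`always_ignored_model`, `balanced_model`) are hypothetical and are not results from this project.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced ground truth: most mails are ignored.
y_true = ["ignored"] * 6 + ["read"] * 3 + ["acknowledged"] * 1

# Hypothetical predictions from two candidate models.
predictions = {
    "always_ignored_model": ["ignored"] * 10,
    "balanced_model": ["ignored"] * 4 + ["read"] * 2   # 2 ignored mails mistaken for read
                      + ["read"] * 2 + ["ignored"]     # 1 read mail mistaken for ignored
                      + ["acknowledged"],
}

results = {
    name: {"accuracy": accuracy(y_true, y_pred), "macro_f1": macro_f1(y_true, y_pred)}
    for name, y_pred in predictions.items()
}
best = max(results, key=lambda name: results[name]["macro_f1"])

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
print("best by macro F1:", best)
```

In this toy data, the model that always predicts "ignored" has a much lower macro F1 than its accuracy suggests, which is why a class-balanced metric is a reasonable choice for comparing classifiers on imbalanced data.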

📖 Exploratory Data Analysis

The following insights were gained from EDA:

Fewer e-mails of campaign type 1 were ignored than of the other campaign types.

If the campaign type is 1, the mail has a 66% chance of being read and a 23% chance of being acknowledged.

Customer location and time of day do not affect the status of an e-mail.

As the number of previous communications increases, the chances of the e-mail being read or acknowledged also increase.

E-mails tend to be ignored when the word count is greater than 800.
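
Per-campaign figures like the ones above can be obtained by counting statuses within each campaign type and dividing by the campaign's total. The sketch below uses only the standard library and a handful of made-up rows. The field names `campaign_type` and `status` are assumptions about the dataset, and the numbers are not taken from data_email_campaign.csv.

```python
from collections import Counter, defaultdict

# Hypothetical (campaign_type, status) rows, for illustration only.
rows = [
    (1, "read"), (1, "read"), (1, "acknowledged"), (1, "ignored"),
    (2, "ignored"), (2, "ignored"), (2, "ignored"), (2, "read"),
    (3, "ignored"), (3, "ignored"), (3, "read"), (3, "acknowledged"),
]


def status_share_by_campaign(rows):
    """Return {campaign_type: {status: fraction of that campaign's e-mails}}."""
    counts = defaultdict(Counter)
    for campaign_type, status in rows:
        counts[campaign_type][status] += 1
    shares = {}
    for campaign_type, status_counts in counts.items():
        total = sum(status_counts.values())
        shares[campaign_type] = {s: n / total for s, n in status_counts.items()}
    return shares


shares = status_share_by_campaign(rows)
for campaign_type in sorted(shares):
    rounded = {s: round(p, 2) for s, p in sorted(shares[campaign_type].items())}
    print(f"campaign {campaign_type}: {rounded}")
```

With pandas, the same table could be produced with a normalized cross-tabulation. The plain-Python version is shown here only to keep the example dependency-free.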

📖 Recommendations

Campaign 1 is more successful than the other campaigns, so it is wise to continue Campaign 1 and discontinue the others.

Increase customer interactions to get better results.

Limit the word count to fewer than 800 words.

📖 Modelling

📖 Deployment

A web application built with Streamlit demonstrates how the trained machine learning model works.

The web application is deployed on Streamlit Community Cloud using a CI/CD pipeline.

Link to deployed model: https://connect-midhunr-email-campaign-effectiveness-predict-app-e58c8b.streamlit.app/

📊 Data Visualization

An interactive dashboard was also created with Tableau to display charts associated with the analysis.

Click here to interact with the data visualization.

📘 Conclusion

The following conclusions were drawn from modelling:

Models trained on oversampled data seem to perform better than those trained on undersampled data. This may be because undersampling causes loss of information.

The model built using the XGBoost algorithm with the SMOTE dataset performed better than the other models. It should be preferred for predicting mail statuses.

If model interpretability is more important than accuracy, the model built using logistic regression with the SMOTE dataset should be chosen over the one using XGBoost, as it is the best performer among the white-box models.
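
The difference between the two resampling strategies can be sketched in a few lines. This is a simplified, standard-library illustration of the idea behind random undersampling and SMOTE-style oversampling, not the code used in the notebook, which would more likely rely on a dedicated library. The feature tuples (meant as word count and number of past communications), the function names and the parameter `k` are all hypothetical.

```python
import random
from collections import Counter


def random_undersample(samples, rng):
    """Shrink every class to the size of the smallest class by dropping rows."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append((features, label))
    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    return balanced


def smote_like_oversample(samples, rng, k=2):
    """Grow every class to the size of the largest class with synthetic rows.

    Each synthetic row lies on the segment between a real row and one of its
    k nearest same-class neighbours, which is the core idea of SMOTE.
    """
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    target = max(len(group) for group in by_label.values())
    balanced = list(samples)
    for label, group in by_label.items():
        for _ in range(target - len(group)):
            base = rng.choice(group)
            others = [p for p in group if p is not base]
            # Nearest same-class neighbours by squared Euclidean distance.
            others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
            neighbour = rng.choice(others[:k] or [base])
            gap = rng.random()
            synthetic = tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
            balanced.append((synthetic, label))
    return balanced


# Hypothetical imbalanced data: (word_count, past_communications), status.
samples = (
    [((900.0 + 10 * i, 5.0 + i), "ignored") for i in range(8)]
    + [((400.0 + 20 * i, 30.0 + i), "read") for i in range(3)]
    + [((300.0 + 15 * i, 45.0 + i), "acknowledged") for i in range(2)]
)

rng = random.Random(42)
under = random_undersample(samples, rng)
over = smote_like_oversample(samples, rng)

print("original:    ", Counter(label for _, label in samples))
print("undersampled:", Counter(label for _, label in under))
print("oversampled: ", Counter(label for _, label in over))
```

Undersampling keeps only as many rows per class as the rarest class has, so most of the majority-class rows are thrown away, while oversampling keeps every original row and adds interpolated ones. That is consistent with the information-loss explanation given above.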

📜 Credits

Midhun R | Avid Learner | Data Analyst | Data Scientist | Machine Learning Enthusiast

Contact me for Data Science Project Collaborations


