Machine learning model to predict whether a mail is ignored, read, or acknowledged by the reader.

connect-midhunr/email-campaign-effectiveness-prediction

Link to deployed model: https://connect-midhunr-email-campaign-effectiveness-predict-app-e58c8b.streamlit.app/

In this project, I analyze data from an e-mail marketing campaign and build a machine learning model to predict whether a mail is ignored, read, or acknowledged by the reader. The dataset contains no personal information about the recipients.

💾 Project Files Description

This project contains an executable IPython notebook, a presentation, and the source data, as follows:

Executable Files:

  • Email_Campaign_Effectiveness_Prediction_Capstone_Project.ipynb - Google Colab notebook containing data summary, exploration, visualisations, feature engineering, modelling, performance evaluation and conclusion.

Documentation:

  • Presentation PDF - Supervised Machine Learning - Classification - Email Campaign Effectiveness Prediction - Capstone Project.pdf - Presentation slideshow of the project.

Source Directory:

  • data_email_campaign.csv - Includes e-mail marketing campaign data.

📖 Problem Statement

Most small to medium business owners make effective use of Gmail-based e-mail marketing strategies for offline targeting, converting prospective customers into leads so that they stay with the business. The main objective is to create a machine learning model that characterizes each mail and tracks whether it is ignored, read, or acknowledged by the reader.

📖 Approach

  1. Understand the business task.
  2. Import relevant libraries and define useful functions.
  3. Read the data from the given file.
  4. Pre-process the data, which involves inspecting the dataset and cleaning it.
  5. Perform exploratory data analysis to find which factors affect the e-mail status and how they affect it.
  6. Engineer features to prepare the data for modelling.
  7. Build models and compare them to find the most suitable one for classification.
  8. Draw conclusions.

📖 Exploratory Data Analysis

The following insights were gained from EDA:

  • Fewer e-mails of campaign type 1 were ignored.
  • If the campaign type is 1, the mail has a 66% chance of being read and a 23% chance of being acknowledged.
  • Neither customer location nor time of day affects the status of an e-mail.
  • As the number of previous communications increases, the chance of the e-mail being read or acknowledged also increases.
  • E-mails tend to get ignored when the word count is greater than 800.
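
The per-campaign percentages above are status shares within each campaign type. A minimal sketch of that computation follows, assuming hypothetical field names (`campaign_type`, `status`) and a handful of invented toy rows rather than the contents of `data_email_campaign.csv`. It uses only the standard library; with pandas, `pd.crosstab(..., normalize="index")` would produce the same kind of table.

```python
from collections import Counter, defaultdict

# Toy rows invented for illustration; NOT taken from data_email_campaign.csv.
# The field names are hypothetical.
rows = [
    {"campaign_type": 1, "status": "read"},
    {"campaign_type": 1, "status": "read"},
    {"campaign_type": 1, "status": "acknowledged"},
    {"campaign_type": 1, "status": "ignored"},
    {"campaign_type": 2, "status": "ignored"},
    {"campaign_type": 2, "status": "ignored"},
    {"campaign_type": 2, "status": "ignored"},
    {"campaign_type": 2, "status": "read"},
]


def status_share_by_campaign(rows):
    """Return {campaign_type: {status: fraction of that campaign's e-mails}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["campaign_type"]][row["status"]] += 1

    shares = {}
    for campaign, counter in counts.items():
        total = sum(counter.values())
        shares[campaign] = {status: n / total for status, n in counter.items()}
    return shares


shares = status_share_by_campaign(rows)
for campaign in sorted(shares):
    print(campaign, shares[campaign])
```

Normalizing within each campaign type, rather than over the whole dataset, is what makes the shares comparable across campaigns of different sizes.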

📖 Recommendations

  • Campaign 1 is more successful than the other campaigns, so it is wise to continue Campaign 1 and discontinue the others.
  • Increase customer interactions to improve the chance of e-mails being read or acknowledged.
  • Limit the word count to fewer than 800 words.

📖 Modelling

Result

📖 Deployment

  • A web application was built with Streamlit to demonstrate the trained machine learning model.
  • The web application is deployed on Streamlit Community Cloud using a CI/CD pipeline.

Link to deployed model: https://connect-midhunr-email-campaign-effectiveness-predict-app-e58c8b.streamlit.app/

📊 Data Visualization

An interactive dashboard was also created with Tableau to display charts associated with the analysis.

Click here to interact with the data visualization.

📘 Conclusion

The following conclusions were drawn from modelling:

  • Oversampled data appears to yield better models than undersampled data, possibly because undersampling discards information.
  • The model built with the XGBoost algorithm on the SMOTE dataset performed better than the other models and should be preferred for predicting mail statuses.
  • If model interpretability is more important than accuracy, the model built with logistic regression on the SMOTE dataset should be chosen over the XGBoost one, as it is the best performer among the white-box models.
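
To illustrate why undersampling loses information while SMOTE-style oversampling does not, here is a rough standard-library sketch. It is not the notebook's code: the helper names, the toy feature vectors, and the class sizes are all invented, and `smote_like_oversample` is a simplified stand-in for SMOTE (in practice one would probably reach for the `SMOTE` class in the imbalanced-learn package).

```python
import random
from collections import Counter


def group_by_class(X, y):
    """Collect feature vectors per class label."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def undersample(X, y, rng):
    """Randomly drop rows until every class matches the smallest class."""
    groups = group_by_class(X, y)
    target = min(len(items) for items in groups.values())
    X_out, y_out = [], []
    for label, items in groups.items():
        for features in rng.sample(items, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote_like_oversample(X, y, rng, k=2):
    """Keep all rows and grow each class to the size of the largest one by
    interpolating between a point and one of its k nearest same-class
    neighbours (a simplified SMOTE, for illustration only)."""
    groups = group_by_class(X, y)
    target = max(len(items) for items in groups.values())
    X_out, y_out = list(X), list(y)
    for label, items in groups.items():
        for _ in range(target - len(items)):
            base = rng.choice(items)
            others = [p for p in items if p is not base]
            neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
            partner = rng.choice(neighbours) if neighbours else base
            gap = rng.random()  # position along the segment base -> partner
            X_out.append([b + gap * (p - b) for b, p in zip(base, partner)])
            y_out.append(label)
    return X_out, y_out


# Invented, imbalanced toy data: 12 ignored, 4 read, 2 acknowledged.
X = [[float(i), float(i % 3)] for i in range(18)]
y = ["ignored"] * 12 + ["read"] * 4 + ["acknowledged"] * 2

rng = random.Random(0)
X_under, y_under = undersample(X, y, rng)
X_over, y_over = smote_like_oversample(X, y, rng)

print("original:    ", Counter(y))
print("undersampled:", Counter(y_under))
print("oversampled: ", Counter(y_over))
```

On this toy data, undersampling keeps only 6 of the 18 rows, whereas the oversampled set retains all 18 originals and adds synthetic minority rows until the classes are balanced.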

📜 Credits

Midhun R | Avid Learner | Data Analyst | Data Scientist | Machine Learning Enthusiast

Contact me for Data Science Project Collaborations

📚 References

Image by storyset on Freepik