wtwilley17/customer-churn

Customer Churn Prediction for Internet Service using Databricks

This project uses Databricks to predict customer churn for an internet service provider. Churn is a significant concern for businesses because it directly reduces revenue. Predicting which customers are likely to leave lets the business take proactive measures to retain them.

Dataset

This project uses a publicly available dataset from Kaggle. It contains information about customers of an internet service provider, such as their subscription information, internet usage, and billing details. The dataset can be found here.

Project Workflow

The project workflow consists of the following steps:

  1. Data Cleaning and Preprocessing: The raw data is cleaned and preprocessed before it is fed into the machine learning models. This step includes data transformation, handling of missing data, and outlier analysis.
  2. Exploratory Data Analysis (EDA): EDA is performed to understand the distribution of the data and to identify patterns and relationships between variables.
  3. Model Building: Machine learning models are built using several algorithms: Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting.
  4. Model Evaluation: The models are compared on performance metrics such as accuracy and test error.
  5. Streaming: The best-performing model is deployed into a streaming pipeline to predict churn for new customers.
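As a rough illustration of steps 1, 4, and 5, the following minimal sketch uses only the Python standard library. It is not the project's code: the column names (`download_gb`, `churn`), the sample rows, and the `threshold_model` rule are hypothetical stand-ins. Median imputation is one possible way to handle missing data, and the generator only imitates record-at-a-time scoring rather than a real Databricks streaming job.

```python
from statistics import median
from typing import Callable, Iterable, Iterator


def fill_missing(rows: list[dict], column: str) -> list[dict]:
    """Step 1 (one option): replace None in `column` with the observed median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fallback = median(observed)
    return [
        {**r, column: fallback if r[column] is None else r[column]}
        for r in rows
    ]


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Step 4: fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def score_stream(
    records: Iterable[dict], predict: Callable[[dict], int]
) -> Iterator[tuple[dict, int]]:
    """Step 5 (stand-in): score records one at a time as they arrive."""
    for record in records:
        yield record, predict(record)


def threshold_model(record: dict) -> int:
    """Hypothetical stand-in for a trained model: low usage means churn (1)."""
    return 1 if record["download_gb"] < 10.0 else 0


# Hypothetical held-out rows; the real dataset has different columns.
held_out = [
    {"download_gb": 2.0, "churn": 1},
    {"download_gb": None, "churn": 0},
    {"download_gb": 50.0, "churn": 0},
    {"download_gb": 5.0, "churn": 1},
    {"download_gb": 30.0, "churn": 1},
]

clean = fill_missing(held_out, "download_gb")
predictions = [pred for _, pred in score_stream(clean, threshold_model)]
acc = accuracy([r["churn"] for r in clean], predictions)
test_error = 1.0 - acc  # test error is the complement of accuracy

print(f"accuracy={acc:.2f}, test error={test_error:.2f}")
```

In the actual project, the stand-in rule would be replaced by whichever trained model scores best in step 4, and the generator by the platform's streaming pipeline.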

Tools and Technologies

This project is implemented using the following tools and technologies:

  • Databricks: Databricks is used as the primary platform for data processing, model building, and evaluation.
  • Python: Python is used as the programming language for data processing, model building, and evaluation.
  • Machine Learning Libraries: scikit-learn, pandas, and NumPy are used for data handling, machine learning model building, and evaluation.

Conclusion

Customer churn prediction is a critical task for businesses, especially in the internet service industry. This project demonstrates how Databricks can be used to build and evaluate machine learning models that predict customer churn. The project can be extended with additional features and more advanced machine learning algorithms to improve prediction accuracy.
