churn_prediction_for_Sparkify

DSND Term 2 Capstone Project: Predicting churn for Sparkify (a music streaming service)

Installation

The code should run without issues on Python 3.* with PySpark 2.4.5. If you use Anaconda, please make sure to install PySpark here.

You will also need software that can open and run an IPython (Jupyter) notebook.

Project Motivation

Sparkify is a fictional digital music service, created by Udacity to simulate real-world companies such as Spotify or Pandora. On Sparkify, users can play songs on a free plan or on a premium subscription plan, which offers advanced functionality and is ad-free. Users can upgrade, downgrade, or cancel their service at any time, so it is important to keep users satisfied so that they do not drop the service.

The purpose of this project is to analyze user activity logs and build a classifier that identifies users who are likely to churn, that is, to cancel the Sparkify music streaming service. Because the full streaming log is large, I use only a small subset (98MB) of the full Sparkify data (12GB) for data exploration and model development. The final model is built with Spark, so it can scale to run on a distributed cluster.
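
As a rough illustration of how a churn label could be derived from activity logs, here is a minimal sketch in plain Python. It assumes each log record carries `userId` and `page` fields and that a cancellation shows up as a `Cancellation Confirmation` page event; the README does not state either, and `label_churn` and `sample_events` are hypothetical names used only for this example. The same per-user grouping would translate to a Spark DataFrame aggregation on the full data.

```python
from collections import defaultdict

# Assumed marker for a cancellation event; the README does not name it.
CHURN_PAGE = "Cancellation Confirmation"


def label_churn(events):
    """Return {userId: 1 if the user ever reached the churn page, else 0}."""
    labels = defaultdict(int)
    for event in events:
        user = event["userId"]
        # The |= also creates the key, so users who never cancel get a 0 label.
        labels[user] |= int(event["page"] == CHURN_PAGE)
    return dict(labels)


# Hypothetical log records, one dict per event.
sample_events = [
    {"userId": "u1", "page": "NextSong"},
    {"userId": "u1", "page": "Cancellation Confirmation"},
    {"userId": "u2", "page": "NextSong"},
    {"userId": "u2", "page": "Thumbs Up"},
]

churn_labels = label_churn(sample_events)
print(churn_labels)  # {'u1': 1, 'u2': 0}
```

The result is one label per user rather than per event, which is the unit a churn classifier would be trained on.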

In this project, I build and evaluate the models below and pick the best performer based on F1 score:

Random Forest
Gradient Boosted Trees
Support Vector Machine
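
To make the selection criterion concrete, here is a small sketch of ranking candidate models by F1 score. It assumes the class-weighted F1 (the quantity Spark's `MulticlassClassificationEvaluator` reports for `metricName="f1"`); the README does not say which F1 variant was used. The labels, the predictions, and the names `model_a` and `model_b` are made up for illustration.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    total = len(y_true)
    return sum(
        f1_for_class(y_true, y_pred, cls) * y_true.count(cls) / total
        for cls in set(y_true)
    )


# Hypothetical labels: 1 = churned, 0 = stayed.
y_true = [0, 0, 0, 0, 1, 1]
predictions = {
    "model_a": [0, 0, 0, 0, 0, 0],  # always predicts "stayed"
    "model_b": [0, 0, 0, 1, 1, 0],  # finds one churner, raises one false alarm
}

scores = {name: weighted_f1(y_true, pred) for name, pred in predictions.items()}
best_model = max(scores, key=scores.get)
print(best_model)  # model_b
```

Both hypothetical models classify four of six users correctly, yet only `model_b` identifies any churner, and its weighted F1 is higher (about 0.667 versus 0.533). Accuracy alone would not separate them.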

File Descriptions

Sparkify.ipynb - EDA and Churn prediction model
mini_sparkify_event_data.json - User activity logs

Results

Performance ranking by F1:

Random Forest: F1 score 0.746, Accuracy 0.781
Gradient Boosted Trees: F1 score 0.731, Accuracy 0.734
Support Vector Machine: F1 score 0.685, Accuracy 0.781
Baseline Model: F1 score 0.685, Accuracy 0.781
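
The baseline numbers are consistent with a model that labels every user as non-churned, scored with a class-weighted F1. The README confirms neither point, so treat the following as a sanity check under those two assumptions, not a description of how the baseline was built. The function name is hypothetical.

```python
def majority_baseline_weighted_f1(majority_share):
    """Weighted F1 of a classifier that always predicts the majority class.

    `majority_share` is the fraction of users in the majority class, which
    is also the accuracy of such a classifier.
    """
    # Majority class: precision equals majority_share and recall equals 1.
    majority_f1 = 2 * majority_share / (1 + majority_share)
    # The minority class is never predicted, so its F1 is 0 and adds nothing.
    return majority_share * majority_f1


reported_accuracy = 0.781  # baseline accuracy from the results above
baseline_f1 = majority_baseline_weighted_f1(reported_accuracy)
print(round(baseline_f1, 3))  # 0.685
```

Under these assumptions an accuracy of 0.781 implies a weighted F1 of about 0.685, matching the baseline row. The Support Vector Machine row shows the same two numbers, which may mean it also predicted a single class, though the README does not say so.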

A summary of the analysis and insights can be found in the Medium post here.

Acknowledgements

Credit to Udacity for the data.
