Part 4: Churn Prediction

In the last three sections of the Data Driven Growth series, we covered tracking essential metrics, customer segmentation, and predicting lifetime value programmatically. Since segmentation and lifetime value prediction tell us who our best customers are, we should also work hard on retaining them. That is what makes Retention Rate one of the most critical metrics.

Retention Rate is an indication of how good your product market fit (PMF) is. If your PMF is not satisfactory, you should see your customers churning very soon. One of the most powerful tools for improving Retention Rate (and hence PMF) is Churn Prediction. With this technique, you can find out who is likely to churn in a given period. In this article, we will use a Telco dataset and go over the following steps to develop a Churn Prediction model:

Exploratory data analysis
Feature engineering
Investigating how the features affect Retention by using Logistic Regression
Building a classification model with Data frame analytics in Elasticsearch

Exploratory Data Analysis

We start by checking what our data looks like and visualizing how it interacts with our label (churned or not?). Let’s start by importing our data and printing the first ten rows:

import pandas as pd

df_data = pd.read_csv('churn_data.csv')
df_data.head(10)

Output:

A better way to see all the columns and their data type is using .info() method:
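As a minimal sketch of that call, assuming a small made-up sample in place of churn_data.csv (the three rows below are hypothetical), .info() prints each column with its dtype, and select_dtypes separates the numeric columns from the rest:

```python
import pandas as pd

# Hypothetical three-row sample standing in for churn_data.csv.
sample = pd.DataFrame({
    "customerID": ["A-1", "B-2", "C-3"],
    "gender": ["Female", "Male", "Female"],
    "tenure": [1, 34, 2],
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "Churn": ["No", "No", "Yes"],
})

# Prints column names, non-null counts and dtypes.
sample.info()

# Numeric columns are the numerical features; everything else is a
# candidate categorical feature (or an identifier such as customerID).
numerical = sample.select_dtypes(include="number").columns.tolist()
categorical = sample.select_dtypes(exclude="number").columns.tolist()
print(numerical)
print(categorical)
```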

It seems like our data falls into two categories:

Categorical features: gender, streaming tv, payment method, etc.
Numerical features: tenure, monthly charges, total charges

Starting with the categorical ones, we look at each feature and see how helpful it is for identifying whether a customer is going to churn.

As a side note, the Churn column in our dataset is a string with Yes/No values. We convert it to an integer to make it easier to use in our analysis.

PUT /_ingest/pipeline/churn_labeling
{
  "description": "",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.churn_flag = ctx.Churn==\"Yes\" ? 1 : 0"
      }
    }
  ]
}

POST es-churn/_update_by_query?pipeline=churn_labeling
{
  "query": {
    "match_all": {}
  }
}
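If the data is still in a pandas DataFrame rather than in Elasticsearch, the same conversion can be sketched in one line. This is an illustrative alternative, not part of the ingest pipeline above; the column name churn_flag mirrors the pipeline, and the sample rows are made up.

```python
import pandas as pd

# Hypothetical sample of the Churn column.
sample = pd.DataFrame({"Churn": ["Yes", "No", "No", "Yes"]})

# Same rule as the Painless script: "Yes" becomes 1, anything else becomes 0.
sample["churn_flag"] = (sample["Churn"] == "Yes").astype(int)
print(sample)
```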

Gender

Using a vertical bar visualization, we can see what the Churn Rate (1 - Retention Rate) looks like for each value:

Churn Rate by Gender

Gender breakdown for the churn rate:

Female customers are slightly more likely to churn than male customers, but the difference is minimal (~0.8%).
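The number behind each bar is simply the mean of the 0/1 churn flag per category. A minimal pandas sketch, assuming a churn_flag column like the one created by the ingest pipeline above and a small made-up sample (so the rates below are not the dataset's):

```python
import pandas as pd

# Hypothetical sample: 4 female and 4 male customers.
sample = pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Female",
               "Male", "Male", "Male", "Male"],
    "churn_flag": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 flag per group is the churn rate of that group.
churn_rate = sample.groupby("gender")["churn_flag"].mean()
print(churn_rate)
```

The same groupby works for any other categorical column, such as InternetService or Contract.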

We build the same chart for the remaining categorical features.

Now we go over the features which show the most significant difference across their values:

Internet Service

Churn Rate by Internet Service

This chart reveals that customers who have Fiber optic as their Internet Service are more likely to churn. I would normally expect Fiber optic customers to churn less because they use a more premium service. But this can happen due to high prices, competition, customer service, and many other reasons.

Contract

Churn Rate by Contract

As expected, a shorter contract means a higher churn rate.

Tech Support

Churn Rate by Tech Support

Customers who don’t use Tech Support are more likely to churn (~25% difference).

Payment Method

Automating the payment makes customers more likely to stay on your platform (~30% difference).

Others

Let’s show some of the other features’ graphs here for reference:

Churn Rate by Paperless Billing, Streaming Movies, Device Protection & Phone Service

We are done with the categorical features. Let’s see what the numerical features look like:

Tenure

To see the trend between Tenure and average Churn Rate, let’s make a scatter plot:

Churn Rate by Tenure

It is apparent that higher tenure means lower Churn Rate. We are going to apply the same to Monthly and Total Charges:

Output:

Churn Rate by Monthly & Total Charges

Unfortunately, there is no trend between Churn Rate and Monthly & Total Charges.

Feature Engineering

In this section, we are going to transform our raw features to extract more information from them. Our strategy is as follows:

1- Grouping the numerical columns by using clustering techniques

2- Encoding the binary categorical features with Label Encoder

3- Converting the categorical features that have multiple values into dummy columns

Numerical Columns

As we know from the EDA section, we have three numerical columns:

Tenure
Monthly Charges
Total Charges

We are going to apply the following steps to create groups:

Using the Elbow Method to identify the appropriate number of clusters
Applying K-means to the selected column and renaming the clusters
Observing the profile of the clusters

Let’s check how this works for Tenure in practice:
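The three steps can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the exact code behind the results below: the tenure values are made up, and the Low/Mid/High names follow the TenureCluster naming used later in this section.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical tenure values (months) forming three well-separated groups.
df = pd.DataFrame(
    {"tenure": [1, 2, 3, 5, 8, 30, 32, 35, 36, 38, 60, 62, 65, 70, 72]}
)

# Elbow Method: inertia (within-cluster sum of squares) for each candidate k.
# The "elbow" is the k after which inertia stops dropping sharply.
inertia = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(df[["tenure"]])
    inertia[k] = model.inertia_
print(inertia)

# Fit the chosen number of clusters (3, as in the text).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df[["tenure"]])
df["TenureCluster"] = kmeans.labels_

# K-means labels are arbitrary, so rename them by ascending mean tenure.
order = df.groupby("TenureCluster")["tenure"].mean().sort_values().index
names = dict(zip(order, ["Low", "Mid", "High"]))
df["TenureCluster"] = df["TenureCluster"].map(names)

# Profile of each cluster: count, mean, min, max and quartiles of tenure.
profile = df.groupby("TenureCluster")["tenure"].describe()
print(profile)
```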

Cluster profiles:

We have 3 clusters with 7.5, 33.9 and 63 as their average Tenure.

Churn Rate for each cluster:

Churn Rate by tenure clusters

This is how it looks after applying the same for Monthly & Total Charges:

Monthly Charge:

Monthly Charge Clusters profile

Churn Rate by monthly charge clusters

Total Charge:

Total Charge Clusters profile

Churn Rate by total charge clusters

Categorical Columns

Label Encoder converts categorical columns to numerical ones by simply assigning integers to distinct values. For instance, the column gender has two values: Female & Male. Label Encoder sorts the values alphabetically, so it will convert them to 0 and 1 respectively.

The get_dummies() method creates new columns out of categorical ones by assigning 0s and 1s. For example, for TenureCluster we will have 3 columns that each take the value 0 or 1, as follows:

TenureCluster_Low
TenureCluster_Mid
TenureCluster_High

Let’s see both in practice:

# import Label Encoder
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
dummy_columns = []  # array for multiple value columns

for column in df_data.columns:
    if df_data[column].dtype == object and column != 'customerID':
        if df_data[column].nunique() == 2:
            # apply Label Encoder for binary ones
            df_data[column] = le.fit_transform(df_data[column])
        else:
            dummy_columns.append(column)

# apply get dummies for selected columns
df_data = pd.get_dummies(data=df_data, columns=dummy_columns)
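To make the two encodings concrete, here is a small sketch on a hypothetical three-row frame (the rows are made up; only the column names come from this article):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

toy = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "TenureCluster": ["Low", "High", "Mid"],
})

# Binary column: LabelEncoder sorts the classes, then assigns 0, 1, ...
le = LabelEncoder()
toy["gender"] = le.fit_transform(toy["gender"])
print(list(le.classes_))

# Multi-valued column: one 0/1 indicator column per distinct value.
toy = pd.get_dummies(toy, columns=["TenureCluster"])
print(toy.columns.tolist())
```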

Check out what the data looks like for the selected columns:

As you can see, the gender & Partner columns became numerical, and we have three new columns for TenureCluster.

It is time to fit a logistic regression model and extract insights to make better business decisions.

Logistic Regression

Predicting churn is a binary classification problem: customers either churn or are retained in a given period. Besides being a robust model, Logistic Regression provides interpretable outcomes. As before, let’s list the steps for building a Logistic Regression model:

Prepare the data (inputs for the model)
Fit the model and see the model summary

The summary looks like this:

![Image for post]()

We have two important outcomes from this report. When you prepare a Churn Prediction model, you will face the questions below:

1- Which characteristics make customers churn or retain?

2- What are the most critical ones? What should we focus on?

For the first question, you should look at the 4th column (P>|z|). If the p-value is smaller than 0.05, that feature affects Churn in a statistically significant way. Examples are:

SeniorCitizen
InternetService_DSL
OnlineSecurity_NO

Then the second question: we want to reduce the Churn Rate, so where should we start? The scientific version of this question is:

Which feature will bring the best ROI if I increase/decrease it by one unit?

That question can be answered by looking at the coef column. The exponential of a coefficient is the odds ratio: the factor by which the odds of churning are multiplied when that feature increases by one unit. If we apply the code below, we will see the transformed version of all coefficients:

np.exp(res.params)

As an example, a one unit change in Monthly Charge means a ~3.4% change in the odds of churning if we keep everything else constant. From the table above, we can quickly identify which features are more important.
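The arithmetic behind such a statement can be sketched as follows. The coefficient value is hypothetical, chosen only so that it reproduces an effect of about 3.4%; the 26.5% base rate is the dataset churn rate quoted later in this article.

```python
import math

coef = 0.0334  # hypothetical logistic regression coefficient

# exp(coef) is the odds ratio for a one-unit increase of the feature.
odds_ratio = math.exp(coef)
pct_change_in_odds = (odds_ratio - 1) * 100
print(round(pct_change_in_odds, 1))

# A change in odds is not the same as a change in probability.
# Starting from a 26.5% churn probability:
base_p = 0.265
base_odds = base_p / (1 - base_p)
new_odds = base_odds * odds_ratio
new_p = new_odds / (1 + new_odds)
print(round(base_p, 4), round(new_p, 4))
```

Note that the odds change by a fixed factor, while the resulting change in probability depends on where the customer starts.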

Now, everything is ready for building our classification model.

Binary Classification Model with Data frame analytics in Elasticsearch

To fit Data frame analytics to our data, we need to specify the features, the label (dependent_variable), and the train & test split (training_percent).

To do so, let's create a new job:

PUT _ml/data_frame/analytics/churn_analysis
{
  "description": "",
  "source": {
    "index": "es-churn"
  },
  "dest": {
    "index": "ml-churn-output"
  },
  "analyzed_fields": {
    "excludes": [
      "customerID",
      "churn_flag"
    ]
  },
  "analysis": {
    "classification": {
      "dependent_variable": "Churn",
      "num_top_feature_importance_values": 10000,
      "training_percent": 95
    }
  },
  "model_memory_limit": "89mb"
}

To start the analysis, we use the following command:

POST _ml/data_frame/analytics/churn_analysis/_start

The ML module uses automatic feature encoding for all categorical features, which processes our features much as we did in the Feature Engineering section. However, it also allows us to define custom feature processors. To do so, we set the feature_processors parameter as in the following query:

PUT _ml/data_frame/analytics/churn_analysis_sofisticated
{
  "description": "",
  "source": {
    "index": "es-churn"
  },
  "dest": {
    "index": "ml-churn-output-s"
  },
  "analyzed_fields": {
    "excludes": [
      "customerID",
      "churn_flag"
    ]
  },
  "analysis": {
    "classification": {
      "dependent_variable": "Churn",
      "num_top_feature_importance_values": 10000,
      "training_percent": 95,
      "feature_processors": [
        {
          "one_hot_encoding":  {
            "field":  "gender",
            "hot_map":  {
              "Male":  "gender_Male",
              "Female":  "gender_Female"
             }
          }
        },
        {
          "one_hot_encoding":  {
            "field":  "DeviceProtection",
            "hot_map":  {
              "Yes":  "DeviceProtection_Yes",
              "No":  "DeviceProtection_No"
             }
          }
        }
      ]
    }
  },
  "model_memory_limit": "89mb"
}

Query for accuracy:

POST ml-churn-output/_search?size=0
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "training_accuracy": {
      "scripted_metric": {
        "init_script": "state.samples = []", 
        "map_script": "if(doc['ml.is_training'].value == true){state.samples.add(doc['ml.Churn_prediction'].value == doc.Churn.value ? 1 : 0)}",
        "combine_script": "double correct = 0; double total = 0; for (t in state.samples) { total++; correct += t } return correct/total",
        "reduce_script": "double accuracy = 0; for (a in states) { accuracy += a } return accuracy"
      }
    },
    "test_accuracy": {
      "scripted_metric": {
        "init_script": "state.samples = []", 
        "map_script": "if(doc['ml.is_training'].value == false){state.samples.add(doc['ml.Churn_prediction'].value == doc.Churn.value ? 1 : 0)}",
        "combine_script": "double correct = 0; double total = 0; for (t in state.samples) { total++; correct += t } return correct/total",
        "reduce_script": "double accuracy = 0; for (a in states) { accuracy += a } return accuracy"
      }
    }
  }
}

By using this simple model, we have achieved 80% accuracy:

Our actual Churn Rate in the dataset was 26.5%, so a model that always predicts "no churn" would already reach 73.5% accuracy. Beating that baseline shows our model is a useful one. It is still better to check precision and recall per class to see where exactly our model fails.

POST ml-churn-output/_search?size=0
{
  "query": {
    "match": {
      "ml.is_training": true
    }
  },
  "aggs": {
    "precision_0": {
      "scripted_metric": {
        "init_script": "state.samples = []", 
        "map_script": "if(doc['ml.Churn_prediction'].value == 'No'){state.samples.add(doc.Churn.value == 'No' ? 1 : 0)}",
        "combine_script": "double correct = 0; double total = 0; for (t in state.samples) { total++; correct += t } return correct/total",
        "reduce_script": "double accuracy = 0; for (a in states) { accuracy += a } return accuracy"
      }
    },
    "precision_1": {
      "scripted_metric": {
        "init_script": "state.samples = []", 
        "map_script": "if(doc['ml.Churn_prediction'].value == 'Yes'){state.samples.add(doc.Churn.value == 'Yes' ? 1 : 0)}",
        "combine_script": "double correct = 0; double total = 0; for (t in state.samples) { total++; correct += t } return correct/total",
        "reduce_script": "double accuracy = 0; for (a in states) { accuracy += a } return accuracy"
      }
    },
    "recall_0": {
      "scripted_metric": {
        "init_script": "state.samples = []", 
        "map_script": "if(doc.Churn.value == 'No'){state.samples.add(doc['ml.Churn_prediction'].value == doc.Churn.value ? 1 : 0)}",
        "combine_script": "double correct = 0; double total = 0; for (t in state.samples) { total++; correct += t } return correct/total",
        "reduce_script": "double accuracy = 0; for (a in states) { accuracy += a } return accuracy"
      }
    },
    "recall_1": {
      "scripted_metric": {
        "init_script": "state.samples = []", 
        "map_script": "if(doc.Churn.value == 'Yes'){state.samples.add(doc['ml.Churn_prediction'].value == doc.Churn.value ? 1 : 0)}",
        "combine_script": "double correct = 0; double total = 0; for (t in state.samples) { total++; correct += t } return correct/total",
        "reduce_script": "double accuracy = 0; for (a in states) { accuracy += a } return accuracy"
      }
    }
  }
}

We can interpret the report above as follows: if our model tells us that 100 customers will churn, 26 of them actually will (0.26 precision). Those 26 are about 78% of the customers who really churn, so roughly 33 customers churn in total and about 7 of them are missed (0.78 recall). Precision in particular is the main problem here, and we can improve our model’s overall performance by:

Adding more data (we have around 2k rows for this example)
Adding more features
More feature engineering
Trying other models
Hyper-parameter tuning
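The precision and recall figures quoted above follow from the counts of true positives (TP), false positives (FP) and false negatives (FN). A small sketch of that arithmetic, using hypothetical counts chosen only to reproduce 0.26 and 0.78 (they are not taken from the model output):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts for the churn class:
# 300 customers flagged as churners, 78 of them correctly,
# and 22 real churners that the model missed.
tp, fp, fn = 78, 222, 22
precision, recall = precision_recall(tp, fp, fn)
print(round(precision, 2), round(recall, 2))
```

Low precision means many retained customers are flagged as churners, while low recall means many real churners are missed.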

Moving forward, let’s see how our model works in detail. First off, we want to know which features our model exactly used from the dataset. Also, which were the most important ones?

To address this question, we can use the feature importance chart:

Features importance for Classification Model

We can see that our model assigned more importance to Contract and tenure compared to others.

Finally, the best way to use this model is to assign a Churn Probability to each customer, create segments, and build strategies on top of that. To get the churn probability from our model, consult the report below:

Churn Probabilities of the customers

or use the following query:

POST ml-churn-output/_search
{
  "query": {
    "match_all": {}
  },
  "fields": ["customerID", "ml.prediction_probability"],
  "_source": false
}
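Once the probabilities are exported, creating segments can be sketched with pandas. The example assumes the probability of churning has already been loaded into a column named churn_probability (a hypothetical name); the five customers and the cut-offs at 0.3 and 0.6 are made up for illustration.

```python
import pandas as pd

# Hypothetical churn probabilities for five customers.
scores = pd.DataFrame({
    "customerID": ["A-1", "B-2", "C-3", "D-4", "E-5"],
    "churn_probability": [0.05, 0.35, 0.62, 0.90, 0.48],
})

# Illustrative cut-offs; in practice they depend on the business context.
scores["risk_segment"] = pd.cut(
    scores["churn_probability"],
    bins=[0.0, 0.3, 0.6, 1.0],
    labels=["Low", "Medium", "High"],
    include_lowest=True,
)

# Customers most likely to churn come first.
print(scores.sort_values("churn_probability", ascending=False))
```

Each segment can then get its own retention strategy, for example proactive offers for the High segment only.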
