Loan Default Prediction, Individual Level Loan Data, Machine Learning, Logistic regression, Ridge, LASSO, Gradient Boosting, SVM, Random Forest

xuanyin/loan_default_prediction

Loan Default Prediction

I predicted loan default by applying six machine learning algorithms (logistic regression, Ridge, LASSO, Gradient Boosting, SVM, and Random Forest) to individual-level loan data from Lending Club. The resulting model helps Lending Club detect whether a new borrower is likely to default (fail to repay the loan on time), so that Lending Club can reduce its risk of losing money.

I cleaned the data, constructed the outcome variable and the features, and performed feature selection.

The data had a severe class imbalance: a relatively small proportion of "Default" cases versus a large proportion of "Non-Default" cases, which left the learning algorithms less capable of predicting "Default," or even unable to predict it at all. I used two methods to address this problem. The first is undersampling the majority class and/or oversampling the minority class before training. The second is changing the prediction threshold based on the ROC curve.
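The two imbalance remedies can be illustrated with a minimal sketch. The project itself was written in R, so this is a Python illustration and not the original code. The function names `undersample_majority` and `roc_threshold` are hypothetical, and picking the cutoff by Youden's J (TPR minus FPR) is an assumption: the text only says the threshold was chosen from the ROC curve.

```python
import random


def undersample_majority(rows, labels, minority=1, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumes a binary problem where `minority` marks the rare class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n_keep = min(len(minority_idx), len(majority_idx))
    kept = minority_idx + rng.sample(majority_idx, n_keep)
    rng.shuffle(kept)  # avoid leaving all minority rows at the front
    return [rows[i] for i in kept], [labels[i] for i in kept]


def roc_threshold(scores, labels, positive=1):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    Each distinct score is tried as a cutoff; a case is predicted
    positive when its score is >= the cutoff. Assumes both classes
    are present in `labels`.
    """
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == positive)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y != positive)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

Undersampling would be applied to the training data only, while the threshold would be chosen from predicted default probabilities on held-out data. Oversampling the minority class follows the same pattern, sampling minority indices with replacement instead of dropping majority ones.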

For each algorithm, I ran ten-fold cross-validation and computed a confusion matrix for each fold. From each fold's confusion matrix, I calculated Precision, Recall, and F-score for both the "Default" class and the "Non-Default" class, along with the overall accuracy. I compared the six algorithms on the average of these performance statistics across the ten folds. Random Forest outperformed the other algorithms, with every performance statistic above 0.99. All the work was done in R.
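The evaluation procedure described above might look roughly like the following. This is a Python sketch, not the original R code. All function names are hypothetical, and reading "F-score" as the F1 score (the harmonic mean of precision and recall) is an assumption.

```python
def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n).

    No shuffling is done here; shuffle the data first if row order matters.
    """
    bounds = [n * i // k for i in range(k + 1)]
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        train = list(range(0, bounds[i])) + list(range(bounds[i + 1], n))
        yield train, test


def class_metrics(actual, predicted, positive):
    """Precision, recall, and F1 for one class, treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


def accuracy(actual, predicted):
    """Share of cases where the predicted label equals the actual label."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)


def average_over_folds(fold_results):
    """Average each statistic across a list of per-fold result dicts."""
    keys = fold_results[0].keys()
    return {k: sum(r[k] for r in fold_results) / len(fold_results) for k in keys}
```

In use, each fold's model would be fit on the `train` indices and scored on the `test` indices. `class_metrics` would then be called once with `positive="Default"` and once with `positive="Non-Default"`, and the per-fold dictionaries passed to `average_over_folds`.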
