Analysis using RandomOverSampler, SMOTE algorithm, ClusterCentroids algorithm, SMOTEENN algorithm, and machine learning models BalancedRandomForestClassifier and EasyEnsembleClassifier.

JennyJohnson78/Credit_Risk_Analysis

Credit Risk Analysis

Overview

Having worked at a financial institution writing consolidation loans for individuals who could not pay their loans, I have seen that financial risk, including credit risk, is an inherently unbalanced classification problem: good loans easily outnumber risky loans. Different techniques are therefore needed to train and evaluate models with unbalanced classes. In this analysis, credit card data is oversampled using the RandomOverSampler and SMOTE algorithms and undersampled using the ClusterCentroids algorithm. A combined over- and undersampling approach is then applied using the SMOTEENN algorithm. Next, the machine learning models BalancedRandomForestClassifier and EasyEnsembleClassifier are used to predict credit risk. Finally, the performance of these models is evaluated, followed by a written recommendation on whether they should be used to predict credit risk.
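To make the idea of naive random oversampling concrete, here is a minimal standard-library sketch. It is not the code used in this analysis (which relies on the RandomOverSampler class); the function name `random_oversample` and the toy data are hypothetical and exist only for illustration. The sketch duplicates randomly chosen rows of each smaller class until every class is as large as the largest one.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Hypothetical helper: duplicate random minority rows until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class

    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # All original rows that belong to this class
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        # Sample with replacement until this class reaches the target size
        for _ in range(target - count):
            X_res.append(rng.choice(rows))
            y_res.append(label)
    return X_res, y_res


# Toy data (made up): 8 low-risk loans and 2 high-risk loans
X = [[i] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

After resampling, both classes contain 8 rows. Resampling is usually applied to the training split only, so that the test set keeps the original class imbalance.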

Results

  • Naive Random Oversampling Results: The balanced accuracy score is 65.72%. The precision for high risk is very low at 1%, and the recall is 62%.


  • SMOTE Oversampling Results: The balanced accuracy score is 64.78%. The precision for high risk is very low at 1%, and the recall is 68%.


  • Undersampling (ClusterCentroids) Results: The balanced accuracy score is 54.43%. The precision is 99%, and the recall is 40%.


  • Combination (Undersampling and Oversampling with SMOTEENN) Results: The balanced accuracy score is 64.47%. The precision is 99%, and the recall is 57%.


  • Balanced Random Forest Classifier Results: The balanced accuracy score is 77.38%. The precision is 99%, and the recall is 87%.


  • Easy Ensemble AdaBoost Classifier Results: The balanced accuracy score is 93.17%. The precision is 99%, and the recall is 94%.

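The three metrics reported above can all be derived from a binary confusion matrix. The sketch below shows the arithmetic, treating high risk as the positive class. The function name `binary_metrics` and the counts passed to it are hypothetical; they are not taken from this analysis and were chosen only to show how a model can reach a recall near 60% while its high-risk precision stays near 1% when low-risk loans vastly outnumber high-risk ones.

```python
def binary_metrics(tp, fn, fp, tn):
    """Hypothetical helper: metrics for the positive (high-risk) class."""
    recall = tp / (tp + fn)        # share of actual high-risk loans that were caught
    specificity = tn / (tn + fp)   # share of actual low-risk loans correctly cleared
    precision = tp / (tp + fp)     # share of high-risk predictions that were correct
    # Balanced accuracy is the mean of the per-class recalls
    balanced_accuracy = (recall + specificity) / 2
    return {
        "precision": precision,
        "recall": recall,
        "balanced_accuracy": balanced_accuracy,
    }


# Made-up counts: 100 high-risk loans and 17,000 low-risk loans
m = binary_metrics(tp=60, fn=40, fp=5000, tn=12000)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```

With these made-up counts, 5,000 false positives swamp the 60 true positives, so precision falls to roughly 1% even though 60% of the high-risk loans are found.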

Summary

The first four models use oversampling, undersampling, or a combination of the two, while the last two are ensemble classifiers. All six were used to analyze credit card data and determine which model is most effective at predicting high-risk loans. The four resampling models have lower balanced accuracy scores than the ensemble classifiers, and their recall is lower as well. The ensemble classifiers have the best balance of precision and recall, which is preferable in a model. Therefore, I recommend the Easy Ensemble Classifier model, which has the highest balanced accuracy (93.17%) and recall (94%) of the six.
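One common way to put a number on the "balance of precision and recall" is the F1 score, the harmonic mean of the two. The sketch below applies it to the precision and recall figures reported above for the two ensemble classifiers. Two assumptions apply: the F1 score is not part of the original analysis, and the results do not state which class or average the 99% precision refers to, so the output is illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the Results section
reported = {
    "Balanced Random Forest": (0.99, 0.87),
    "Easy Ensemble AdaBoost": (0.99, 0.94),
}

f1 = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, score in f1.items():
    print(f"{name}: F1 = {score:.3f}")
```

Under these assumptions the Easy Ensemble classifier scores about 0.964 and the Balanced Random Forest about 0.926, which is consistent with the recommendation above.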
