Credit Risk Analysis

Overview of the Analysis

LendingClub wants to take a close look at its client base and assess its customers' level of credit risk. Machine learning models were applied to a data set of over 60k entries covering the customers' loan stats for the year 2019.

Results

Naive Random Oversampling

Oversampling achieved an accuracy of 65%, which is not bad on its own.
However, the model's precision was 1% (high risk) and 100% (low risk).
Sensitivity was 63% (high risk) and 67% (low risk). The last two points combined make it a bad model: with high-risk precision at 1%, almost none of the loans it flags as high risk actually are, so it cannot reliably pick out high-risk customers.
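Naive random oversampling balances the classes by duplicating randomly chosen minority-class rows until every class is the same size. Here is a minimal sketch in plain Python, assuming a made-up toy data set; `random_oversample` is a hypothetical helper and is not taken from the project notebooks:

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class matches the size of the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to make up the shortfall for this class.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels

# Toy data (made-up values): 2 high-risk rows versus 6 low-risk rows.
X = [[1.0], [1.2], [5.0], [5.1], [5.2], [5.3], [5.4], [5.5]]
y = ["high_risk"] * 2 + ["low_risk"] * 6

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Because the extra rows are exact copies, the resampled set contains no new information about high-risk loans, which is one reason precision can stay low after balancing.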

SMOTE Oversampling

SMOTE oversampling achieved an accuracy of 65%, which is not bad.
However, as with the previous model, precision was 1% (high risk) and 100% (low risk).
Sensitivity was 63% (high risk) and 67% (low risk). Again, combining the last two points, this is not a good model for predicting the credit risk of customers: a high-risk prediction from it is almost never correct.
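SMOTE differs from naive oversampling in that it creates new minority points by interpolating between a minority point and one of its nearest minority neighbours. The sketch below illustrates that idea in plain Python; `smote`, its parameters, and the toy points are hypothetical, and a real project would more likely use a library implementation:

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy high-risk points with two features each (made-up values).
minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 1.0]]
new_points = smote(minority, n_new=4)
print(new_points)
```

Every synthetic point lies between two existing minority points, so SMOTE fills in the minority region rather than repeating rows.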

Undersampling

Undersampling achieved an accuracy of 51%, the lowest of the six models.
This model follows the same trend as the previous two: precision was 100% for low-risk cases and only 1% for high-risk cases, so its high-risk predictions are almost always wrong.
It also has very poor sensitivity: 59% for high-risk cases and 43% for low-risk clients.
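Undersampling balances the classes in the opposite direction, by shrinking the majority class. The README does not say which undersampling algorithm was used, so the sketch below assumes the simplest variant, random undersampling; `random_undersample` and the toy rows are hypothetical:

```python
import random
from collections import Counter

def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so that every class ends up
    with as many rows as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    keep = []
    for cls in counts:
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Sample without replacement: no row is kept twice.
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Toy data (made-up values): 2 high-risk rows versus 6 low-risk rows.
X = [[1.0], [1.2], [5.0], [5.1], [5.2], [5.3], [5.4], [5.5]]
y = ["high_risk"] * 2 + ["low_risk"] * 6

X_res, y_res = random_undersample(X, y)
print(Counter(y_res))
```

The trade-off is that most low-risk rows are thrown away, which may help explain the weak low-risk sensitivity reported for this model.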

Combination-Sampling

Combination sampling achieved an accuracy of 62%, which is fine.
Once again, however, this model's precision was 1% for high-risk customers and 100% for low-risk customers.
Sensitivity for low-risk customers was nominally better than in the previous models at 69%, but sensitivity for high-risk customers was the lowest so far at 55%.
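Combination sampling typically pairs an oversampling step with a cleaning step that removes points sitting in the wrong neighbourhood. The README does not name the exact method, so the sketch below assumes an edited-nearest-neighbours style cleaning rule and shows only that step; `edited_nearest_neighbours` and the toy points are hypothetical:

```python
import math
from collections import Counter

def edited_nearest_neighbours(rows, labels, k=3):
    """Drop every point whose label disagrees with the majority label
    of its k nearest neighbours (hypothetical helper)."""
    kept_rows, kept_labels = [], []
    for i, (row, label) in enumerate(zip(rows, labels)):
        others = [j for j in range(len(rows)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(row, rows[j]))[:k]
        majority = Counter(labels[j] for j in nearest).most_common(1)[0][0]
        if majority == label:
            kept_rows.append(row)
            kept_labels.append(label)
    return kept_rows, kept_labels

# Toy data (made-up values), already balanced 4 versus 4. One point of
# each class sits inside the other class's cluster.
X = [[1.0], [1.1], [1.2], [5.05], [5.0], [5.1], [5.2], [1.15]]
y = ["high_risk"] * 4 + ["low_risk"] * 4

X_clean, y_clean = edited_nearest_neighbours(X, y)
print(X_clean)
print(Counter(y_clean))
```

In this toy case the two stray points are removed and the two clusters are left clean, which is the effect the cleaning step is meant to have after oversampling.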

Balanced Random Forest Classifier

The balanced random forest classifier achieved an accuracy score of nearly 79%, which seems too high to take at face value given the precision figures.
The model's precision was 4% for high-risk customers and 100% for low-risk customers, so most of the loans it flags as high risk are still low risk.
The model had a 67% sensitivity for high-risk cases and 91% sensitivity for low-risk customers.
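The idea behind a balanced random forest is that each tree is trained on a class-balanced bootstrap sample rather than on the raw, imbalanced data. The sketch below shows only that sampling step in plain Python, under the assumption that each class is drawn with replacement down to the size of the smallest class; `balanced_bootstrap` and the labels are hypothetical, and no trees are actually fitted:

```python
import random
from collections import Counter

def balanced_bootstrap(labels, rng):
    """Return row indices for one tree: the same number of draws (with
    replacement) from every class, sized to the smallest class."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n))
    return sample

# Toy labels (made-up): 3 high-risk rows versus 9 low-risk rows.
y = ["high_risk"] * 3 + ["low_risk"] * 9
rng = random.Random(0)

# One balanced sample per tree; three trees in this illustration.
samples = [balanced_bootstrap(y, rng) for _ in range(3)]
for s in samples:
    print(Counter(y[i] for i in s))
```

Each tree sees a different random slice of the low-risk rows, so across many trees the forest can use most of the majority data while every individual tree trains on balanced classes.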

AdaBoost Classifier

The AdaBoost classifier obtained an accuracy of 93%, which also seems too high to take at face value.
The model's precision was 7% for high-risk customers, the best of all six models but still too low, and 100% for low-risk customers.
This model also had the best sensitivity of any model: 91% for high-risk customers and 94% for low-risk customers.
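AdaBoost builds its ensemble sequentially: after each weak learner, the misclassified rows get more weight so the next learner concentrates on them. The sketch below works through one round of the classic discrete AdaBoost update in plain Python. The labels, predictions, and the `adaboost_round` helper are made up for illustration, and library implementations may use a different variant of the update:

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost round with labels in {-1, +1}.
    Assumes the weighted error is strictly between 0 and 1."""
    # Weighted error of the weak learner.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Learner weight: larger when the weighted error is smaller.
    alpha = 0.5 * math.log((1 - err) / err)
    # Shrink weights of correct rows, grow weights of misclassified rows.
    raw = [w * math.exp(-alpha * t * p)
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

# Hypothetical encoding: +1 = high risk, -1 = low risk.
y_true = [1, 1, -1, -1, -1]
y_pred = [1, 1, -1, -1, 1]   # only the last row is misclassified
weights = [0.2] * 5          # uniform starting weights

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(round(alpha, 3), [round(w, 3) for w in new_weights])
```

After the update, the single misclassified row carries half of the total weight, so the next weak learner is pushed hard toward getting it right.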

Summary

In sum, all of the models show too many characteristics of a bad machine learning model, and I would not choose any of them. On the whole, the ensemble models seemed to fare the best, but their accuracy seems too high and the gap between high-risk and low-risk precision is too wide. This might be due to the nature of the data, and maybe bagging the data would improve the metrics. The main point of contention is that these models would put LendingClub in a bad position: high-risk precision never exceeds 7%, so none of them can reliably tell high-risk borrowers from low-risk ones. Relying on them could lead LendingClub to hand out loans to unworthy debtors and end up losing money.
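The way a model can pair reasonable sensitivity with very low high-risk precision is easier to see with numbers. The sketch below uses entirely made-up confusion-matrix counts (they are not the project's results) for a data set with 100 high-risk and 10,000 low-risk loans, where the model catches 63% of the high-risk loans and 67% of the low-risk loans:

```python
def class_metrics(tp, fp, fn, tn):
    """Precision and recall for the positive class, plus overall accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Hypothetical counts, treating high risk as the positive class.
tp = 63      # high-risk loans correctly flagged
fn = 37      # high-risk loans missed
tn = 6700    # low-risk loans correctly passed
fp = 3300    # low-risk loans wrongly flagged as high risk

hr_precision, hr_recall, accuracy = class_metrics(tp, fp, fn, tn)

# Swap the roles to score the low-risk class.
lr_precision, lr_recall, _ = class_metrics(tp=tn, fp=fn, fn=fp, tn=tp)

print(f"high risk: precision={hr_precision:.3f} recall={hr_recall:.3f}")
print(f"low risk:  precision={lr_precision:.3f} recall={lr_recall:.3f}")
print(f"accuracy={accuracy:.3f}")
```

With these assumed counts, high-risk precision comes out below 2% even though recall is 63%, simply because the wrongly flagged low-risk loans vastly outnumber the true high-risk ones. That is consistent with the suggestion that the skewed precision comes from the nature of the data.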

lawrencegoodwyn/LendingClub-Risk-Analysis
