AkashD19/ProsperLoanDataAnalysis

ProsperLoanDataAnalysis

Prosper is a San Francisco-based company where people can invest in personal loans or request to borrow money. What is interesting here is its peer-to-peer lending process, i.e. the company itself does not loan out the money but rather connects the borrower to the lender. This is an innovative approach that benefits its customers when compared to the loan processes of traditional banking institutions.

The dataset we have here is immense. It covers the many data points considered when a loan is processed. I will attempt to break down this large .csv file and convey the data more clearly than scrolling through its many individual loan records would.

The dataset offers plenty of scope for insightful analysis. I tried to cover as many variables as possible, but many opportunities remain to explore it further.

The chief challenge with this dataset was understanding the meaning and effect of certain variables. The crux of the problem is finding combinations of variables that yield meaningful analysis. Some closely related variables, such as Rate and APR, also made it hard to decide which one would yield the better analysis.
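One way to approach the Rate versus APR question is to check how strongly the two variables move together: if they are almost perfectly correlated, either one can stand in for the other in most plots. The snippet below is only a minimal sketch of that check, not the analysis used in this project. The column names `BorrowerRate` and `BorrowerAPR` are assumptions, the rows are made-up values for illustration, and `read_columns` and `pearson` are hypothetical helper names.

```python
import csv
import io
import math

# Made-up rows for illustration only; the column names are assumptions
# and should be replaced with the real headers of the .csv file.
SAMPLE_CSV = """BorrowerRate,BorrowerAPR
0.10,0.12
0.15,0.17
0.20,0.23
0.25,0.28
0.30,0.34
"""


def read_columns(text, x_name, y_name):
    """Return two lists of floats, skipping rows where either value is missing."""
    xs, ys = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if row[x_name] and row[y_name]:
            xs.append(float(row[x_name]))
            ys.append(float(row[y_name]))
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rates, aprs = read_columns(SAMPLE_CSV, "BorrowerRate", "BorrowerAPR")
r = pearson(rates, aprs)
print(f"Pearson r between rate and APR: {r:.3f}")
```

A coefficient close to 1 would suggest the two variables carry nearly the same information, so picking one of them consistently is a reasonable simplification. To run this against the real file, the sample string would be replaced by the contents of the .csv.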

The most interesting findings came from working with multiple variables and their effect on each other. There is a lot of scope here and I would like to explore this area more. Cleaning data and dealing with unexpected results are part of any real-life dataset, and this was a major part of my learning.

A variety of visualizations have reduced the dataset to components that can be analyzed further. Apart from this, I found that most of my assumptions were close to the real-world values.

As future work, I would like to try more combinations of variables and build more advanced prediction models that might help in predicting a potential loan defaulter. Another direction is to look for stronger correlations between these combinations, which might help in production too.