
mrinal1704/Credit-Card-Application-Fraud-Detection-using-Supervised-machine-learning-models


Project Name - Credit Card Application Fraud Detection

Introduction

This is the second project from one of my classes at the USC Marshall School of Business, DSO 562 - Fraud Analytics. It is a supervised machine learning problem with imbalanced classes, in which we were asked to identify fraudulent credit card applications.

Repository File Description

  1. Project report - Contains the full description of the project
  2. Project code - Python Jupyter notebook containing the full code with proper comments.
  3. Data Quality report - Contains the data visualisations and the initial analysis used to understand the given data.
  4. Data Quality report code - Contains the Python code for the data quality report.

Dataset Description

  • Dataset Name: Applications Data
  • Description: This dataset contains information about people who submitted applications for a product. It includes fields such as Social Security number, name, address, phone number, and date of birth, plus a fraud label field that indicates whether the application is good or bad.
  • Time Period: 1 January 2016 – 31 December 2016
  • No. of Fields: 10
  • No. of Records: 1,000,000
  • Size of Dataset file: 83 MB

Summary

The provided dataset contained application (identity) fraud cases. It was a supervised problem because the data included a fraud label column indicating whether each application was fraudulent. The dataset also contained several identifying fields about the applicant, such as SSN, address, and phone number. It had 1,000,000 records and 10 data fields. We first described and visualized each of the 10 data fields and treated all frivolous values. We then created 634 candidate variables and performed feature selection to reduce them to 30. Finally, we used several machine learning algorithms, both linear and nonlinear, to predict fraudulent application records.
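To make the feature selection step more concrete, here is a minimal sketch of one common approach: a univariate filter that ranks each candidate variable by its Kolmogorov-Smirnov (KS) statistic between fraudulent and non-fraudulent records and keeps the top k. This is an assumption for illustration only; the summary does not say which selection method the project used. The function names, the variable names (`ssn_count_7d`, `noise`), and the toy data are all hypothetical.

```python
from bisect import bisect_right


def ks_statistic(values, labels):
    """Two-sample KS statistic between fraud (1) and non-fraud (0) values."""
    bad = sorted(v for v, y in zip(values, labels) if y == 1)
    good = sorted(v for v, y in zip(values, labels) if y == 0)
    if not bad or not good:
        return 0.0
    # Largest gap between the two empirical CDFs, checked at every observed value.
    return max(
        abs(bisect_right(bad, v) / len(bad) - bisect_right(good, v) / len(good))
        for v in set(values)
    )


def select_top_k(features, labels, k):
    """Rank candidate variables by KS statistic and return the k best names."""
    scores = {name: ks_statistic(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 marks a fraudulent application, 0 a good one.
labels = [0, 0, 0, 0, 1, 1]
features = {
    "ssn_count_7d": [1, 1, 2, 1, 5, 6],  # hypothetical variable that separates the classes
    "noise": [3, 1, 2, 3, 1, 2],         # hypothetical variable with little signal
}

selected = select_top_k(features, labels, k=1)
print(selected)
```

In a setting like the one described, the same ranking would run over all 634 candidate variables with k=30. A filter of this kind is often followed by a wrapper method such as forward selection, though that too is an assumption rather than something stated in this README.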

How to run the code

  • Download the Jupyter notebook to your machine and run it.
