


mrinal1704/Credit-Card-Transaction-Fraud-Detection-using-Supervised-Machine-learning-with-an-Imbalanced-dataset


Project Name - Credit Card Transaction Fraud Detection using Supervised Machine Learning

Introduction

This project came out of one of my classes at the USC Marshall School of Business, DSO 562 - Fraud Analytics. Credit card fraud is a burden for organizations across the globe: according to shiftprocessing.com, $24.26 billion was lost to credit card fraud worldwide in 2018. In this project, our goal was to build an effective and efficient model to predict transaction fraud.

Repository File Description

Project report - Contains the full description of the project.
Project code - Python Jupyter notebook containing the full, commented code, with instructions on how to run it in Google Colab.
Data Quality report - Contains the data visualisations and the initial analysis used to understand the given data.
Data Quality report code - Contains the code for the data quality report.
Dataset - Card transactions data.

Dataset Description

Dataset Name: Card transactions data
Description: This dataset contains information on card transactions that occurred in the USA. It includes fields such as card number, merchant number, merchant description, and transaction amount. It also contains a fraud label field that indicates whether a transaction is good or bad.
Time Period: 1 January 2010 to 31 December 2010
No. of Fields: 10
No. of Records: 96,753
Size of Dataset file: 7 MB
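Since the project deals with an imbalanced dataset, a quick first check after loading the file is to count records, fields, and fraud labels. The snippet below is a minimal sketch using only the standard library. The column names (such as `fraud`) and the four sample rows are made-up placeholders, not values from the actual dataset, and the real file's headers may differ.

```python
import csv
import io
from collections import Counter

# Placeholder rows for illustration only; NOT taken from the real dataset.
# Column names are assumptions and may not match the actual file headers.
SAMPLE_CSV = """card_number,merchant_number,merchant_description,amount,fraud
1111,9001,EXAMPLE MERCHANT A,3.62,0
2222,9002,EXAMPLE MERCHANT B,31.42,0
3333,9003,EXAMPLE MERCHANT C,178.49,1
1111,9001,EXAMPLE MERCHANT A,12.00,0
"""


def summarize_labels(csv_file, label_column="fraud"):
    """Return (record_count, field_count, Counter of label values)."""
    reader = csv.DictReader(csv_file)
    labels = Counter()
    record_count = 0
    for row in reader:
        record_count += 1
        labels[row[label_column]] += 1
    field_count = len(reader.fieldnames or [])
    return record_count, field_count, labels


records, fields, labels = summarize_labels(io.StringIO(SAMPLE_CSV))
fraud_rate = labels["1"] / records
print(f"records={records} fields={fields} fraud_rate={fraud_rate:.2%}")
```

To run the same check on the real file, pass an opened file object instead of the `io.StringIO` wrapper and set `label_column` to the actual header of the fraud label field.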

Summary

We analyzed a real-world dataset that contained a list of government-related credit card transactions over the 2010 calendar year. The data presented a supervised problem because it included a column showing each transaction's fraud label (whether the transaction was fraudulent or not). It also contained identifying information about each transaction, such as the credit card number, merchant, and merchant state. The dataset had 96,753 records and 10 data fields. We first described and visualized each of the 10 data fields, cleaned the dataset, and filled in missing values. Then we created many variables and performed feature selection. Finally, we built a variety of machine learning models (both linear and nonlinear) and highlighted our results.
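This summary does not list the variables that were created; the project report and notebook have the details. As an illustration of what a derived variable can look like in transaction data, the sketch below computes a hypothetical "velocity" feature: how many transactions a card made within a trailing window of days, counting the current one. The function name, the 7-day window, and the sample transactions are all assumptions for illustration, not the project's actual feature set.

```python
from collections import defaultdict, deque
from datetime import date, timedelta


def count_recent_by_card(transactions, window_days=7):
    """For each (card, day) pair, count that card's transactions within the
    trailing window of `window_days` days, including the current one.

    Results are returned in date order as (card, day, count) tuples.
    """
    window = timedelta(days=window_days)
    recent = defaultdict(deque)  # card -> dates still inside the window
    counts = []
    for card, day in sorted(transactions, key=lambda t: t[1]):
        queue = recent[card]
        # Drop dates that have fallen out of the trailing window.
        while queue and day - queue[0] > window:
            queue.popleft()
        queue.append(day)
        counts.append((card, day, len(queue)))
    return counts


# Made-up transactions: (card identifier, transaction date).
sample = [
    ("A", date(2010, 1, 1)),
    ("A", date(2010, 1, 3)),
    ("B", date(2010, 1, 3)),
    ("A", date(2010, 1, 20)),
]
velocity = count_recent_by_card(sample)
for card, day, count in velocity:
    print(card, day.isoformat(), count)
```

An unusually high count for a card in a short window is the kind of signal such a variable is meant to expose to a model. Whether it helps on this dataset would need to be checked during feature selection.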

How to run the code

Download the notebook and the dataset to your machine, then run the notebook. The notebook also includes instructions for running it in Google Colab.
