Machine Learning Classification on Unbalanced Real World Dataset

gill-0/Identifying-Fraud-at-Enron

Identifying-Fraud-at-Enron

Introduction

The goal of the Enron case study is to analyze a dataset of financial and email features for people who worked at Enron during the scandal, as well as others who did business with the company. I test several supervised machine learning algorithms to find patterns that generalize and can predict which employees may have committed fraud, indicated by the label POI (person of interest).
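Because POIs are a small minority of the people in the dataset, plain accuracy can look good even for a classifier that never flags anyone. The sketch below illustrates this with made-up labels (not the Enron data) and a hypothetical helper, `precision_recall`; it is an illustration of the evaluation problem, not the code used in this repository.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (1 = POI)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # POIs correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-POIs wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # POIs missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for illustration: 2 POIs among 10 people.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A classifier that never predicts POI.
always_negative = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
print("accuracy:", accuracy)
print("precision, recall:", precision_recall(y_true, always_negative))
```

Here the always-negative classifier is right about most people yet finds no POIs at all, which is why precision and recall are more informative than accuracy for a problem like this one.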

Final Analysis

The bl.ocks page linked below explains my analysis and results.

http://bl.ocks.org/gill-0/raw/a44ff333180fb13d460ee57c0345f0e4/

Files

Presentation of process and findings

Enron_fraud.html

Main script to create classifier

poi_id_final.py

Discover and graph outliers

final_outliers.py

Initial exploration and cleaning of data

explore_final.py

Creates two email features for testing in classifier

email_fraction.py

Udacity file provided to format and split data

feature_format.py

Udacity file provided to test performance of ML algorithm

tester.py
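The two email features created by email_fraction.py are not described in detail here. One plausible reading is that each is the fraction of a person's messages exchanged with a POI. The sketch below works under that assumption: the function name `compute_fraction`, the dictionary keys, and the use of the string "NaN" for missing values are all assumptions for illustration, not confirmed contents of the script.

```python
def compute_fraction(poi_messages, all_messages):
    """Fraction of a person's messages that involve a POI.

    Returns 0.0 when either count is missing (assumed to be the
    string "NaN") or when the total is zero.
    """
    if poi_messages == "NaN" or all_messages == "NaN" or all_messages == 0:
        return 0.0
    return poi_messages / all_messages


# Hypothetical record; keys and values are made up for illustration.
person = {
    "from_poi_to_this_person": 5,
    "to_messages": 20,
    "from_this_person_to_poi": "NaN",
    "from_messages": 40,
}

fraction_from_poi = compute_fraction(
    person["from_poi_to_this_person"], person["to_messages"]
)
fraction_to_poi = compute_fraction(
    person["from_this_person_to_poi"], person["from_messages"]
)
print(fraction_from_poi, fraction_to_poi)
```

Using a ratio rather than a raw count keeps people who send or receive very large volumes of email from dominating the feature.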