Imbalanced-Data-Machine-Learning-project-on-US-census-data

The goal of this project was to build a predictive model to determine the income level for people in the US.

It was a binary classification problem, with income levels binned into below 50K and above 50K.
Data exploration, data cleaning, and feature engineering were performed to make the data set suitable for model building.
Because the data were highly imbalanced, techniques such as undersampling, oversampling, and SMOTE were applied to make the classes more balanced.
Models were trained using Naive Bayes, SVM, and XGBoost.
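
To illustrate the resampling idea, here is a minimal base R sketch of random undersampling and random oversampling. It is not taken from censusdata_project.R: the toy data frame, the column names `age` and `income_level`, and the 950/50 class split are all assumptions made up for the example. SMOTE is not shown; unlike the two methods here, it creates synthetic minority rows by interpolating between neighbouring minority observations.

```r
# Toy imbalanced data set (hypothetical columns, not the project's census data).
set.seed(42)
toy <- data.frame(
  age = c(rnorm(950, mean = 38, sd = 12), rnorm(50, mean = 45, sd = 10)),
  income_level = factor(rep(c("below_50K", "above_50K"), times = c(950, 50)))
)

# Split rows by class.
majority <- toy[toy$income_level == "below_50K", ]
minority <- toy[toy$income_level == "above_50K", ]

# Random undersampling: keep only as many majority rows as there are minority rows.
under_idx <- sample(nrow(majority), size = nrow(minority))
undersampled <- rbind(majority[under_idx, ], minority)

# Random oversampling: draw minority rows with replacement up to the majority size.
over_idx <- sample(nrow(minority), size = nrow(majority), replace = TRUE)
oversampled <- rbind(majority, minority[over_idx, ])

# Compare class counts before and after resampling.
print(table(toy$income_level))
print(table(undersampled$income_level))
print(table(oversampled$income_level))
```

Undersampling discards majority-class rows, while oversampling repeats minority-class rows, so resampling of this kind is normally applied to the training split only and the test split keeps its original class distribution.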

Statistical Software used: R

R packages used: caret, data.table, mlr, dplyr, ggplot2 etc.

