The goal of this project was to build a predictive model to determine the income level for people in the US.
- This was a binary classification problem: income was binned into two levels, below 50K and above 50K.
- Data exploration, data cleaning, and feature engineering were performed to make the data set suitable for modeling.
- Because the data was highly imbalanced, techniques such as undersampling, oversampling, and SMOTE were applied to make it more balanced.
- Models were trained using Naive Bayes, SVM, and XGBoost.
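
The resampling step can be illustrated with a minimal base R sketch of random undersampling and random oversampling. This is not the project's actual code: the `toy` data frame, its columns, the class labels `below_50K` / `above_50K`, and the 940/60 class split are all hypothetical and chosen only to mimic an imbalanced binary target. SMOTE, which synthesizes new minority rows by interpolating between neighbors, is not shown here.

```r
# Hypothetical toy data: 'income' is the binary target, skewed toward "below_50K".
set.seed(42)
toy <- data.frame(
  age    = sample(18:80, 1000, replace = TRUE),
  income = factor(c(rep("below_50K", 940), rep("above_50K", 60)))
)

# Row indices of the majority and minority classes.
idx_major <- which(toy$income == "below_50K")
idx_minor <- which(toy$income == "above_50K")

# Random undersampling: keep every minority row and
# sample the majority class down to the minority size.
under <- toy[c(sample(idx_major, length(idx_minor)), idx_minor), ]

# Random oversampling: keep every majority row and
# resample the minority class with replacement up to the majority size.
over <- toy[c(idx_major, sample(idx_minor, length(idx_major), replace = TRUE)), ]

# Both resampled sets now have equal class counts.
print(table(under$income))
print(table(over$income))
```

Undersampling discards majority rows and so loses information, while oversampling duplicates minority rows and can encourage overfitting. Either kind of resampling is normally applied to the training split only, so that evaluation still reflects the original class distribution.
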
Statistical Software used: R
R packages used: caret, data.table, mlr, dplyr, ggplot2 etc.