- Python 2.7
- Numpy >= 1.14.2
- Matplotlib >= 2.2.0
- Pandas >= 0.22.0
- Scikit-Learn >= 0.19.1
The data was collected during a bank's marketing campaign and is used to predict whether a customer will make a term deposit in the bank.
This project uses a 10% sample of the UCI Bank Marketing dataset, which is available online. The sample has 4119 rows and 19 features.
The dataset had the following issues:
- Some features had missing values, which had to be imputed.
- Categorical features had to be encoded during preprocessing.
- The dataset was imbalanced: there were few class 1 (yes) labels compared to class 0 (no) labels.
Preprocessing work done on the data included:
- Outlier removal
- Label and one hot encoding
- Handling missing data by mode imputation
- Handling imbalanced data by oversampling using SMOTE
- Dimensionality reduction
- Normalization and standardization
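
Some of the steps above (mode imputation, one hot encoding, standardization, and SMOTE-style oversampling) can be sketched as follows. This is a minimal illustration, not the project's actual code: the toy frame, the column names `age`, `job`, and `y`, the functions `preprocess` and `smote_sketch`, and the assumption that missing values are coded as the string `"unknown"` are all hypothetical. The oversampler is a hand-written approximation of SMOTE that assumes class 1 is the minority; outlier removal and dimensionality reduction are not shown.

```python
from __future__ import print_function

import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def preprocess(df, target="y"):
    """Mode-impute, one-hot encode, and standardize a mixed-type frame."""
    # Assumption: missing values are coded as the string "unknown".
    df = df.replace("unknown", np.nan)
    features = df.drop(target, axis=1)
    # Mode imputation: fill each column with its most frequent value.
    features = features.fillna(features.mode().iloc[0])
    # One-hot encode the categorical columns; numeric columns pass through.
    features = pd.get_dummies(features).astype(float)
    # Label-encode the binary target: "yes" -> 1, "no" -> 0.
    y = (df[target] == "yes").astype(int).values
    # Standardize every column to zero mean and unit variance.
    X = StandardScaler().fit_transform(features.values)
    return X, y


def smote_sketch(X, y, k=2, seed=0):
    """Oversample class 1 by interpolating between minority neighbours."""
    rng = np.random.RandomState(seed)
    X_min = X[y == 1]
    # Number of synthetic rows needed to match the majority class.
    n_new = int((y == 0).sum() - (y == 1).sum())
    # Ask for k + 1 neighbours: a point is normally its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    # Pick a random minority row and one of its k neighbours per new sample.
    base = rng.randint(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.randint(0, k, size=n_new)]
    # Place the synthetic point at a random position between the two rows.
    gap = rng.uniform(size=(n_new, 1))
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    y_new = np.ones(n_new, dtype=int)
    return np.vstack([X, synthetic]), np.concatenate([y, y_new])


# Hypothetical toy data: 5 "no" rows, 3 "yes" rows, 2 missing job values.
toy = pd.DataFrame({
    "age": [30, 45, 52, 23, 38, 41, 60, 29],
    "job": ["admin.", "unknown", "technician", "admin.",
            "services", "admin.", "retired", "unknown"],
    "y": ["no", "no", "yes", "no", "yes", "no", "yes", "no"],
})
X, y = preprocess(toy)
X_bal, y_bal = smote_sketch(X, y)
print(X.shape, X_bal.shape, np.bincount(y_bal))
```

In practice the scaler would be fitted on the training split only, and oversampling would be applied to the training split only, so that no information from the test rows leaks into training.
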
Classifiers used:
- Support Vector Machine (SVM)
- Naive Bayes
- K Nearest Neighbors
- Random Forest
- Perceptron
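
A minimal sketch of training the five classifiers above with scikit-learn is given below. The synthetic data from `make_classification`, the train/test split, and all hyperparameters are assumptions chosen for illustration; the choice of `GaussianNB` as the Naive Bayes variant is also an assumption, since the variant is not stated here.

```python
from __future__ import print_function

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the preprocessed data (about 10% class 1).
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hyperparameters are illustrative, not the values used in the project.
classifiers = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "Naive Bayes": GaussianNB(),
    "K Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Perceptron": Perceptron(max_iter=1000, tol=1e-3, random_state=0),
}

test_accuracy = {}
for name, clf in sorted(classifiers.items()):
    clf.fit(X_train, y_train)
    # score() returns mean accuracy on the given data.
    test_accuracy[name] = clf.score(X_test, y_test)
    print("%-20s test accuracy: %.3f" % (name, test_accuracy[name]))
```

Because the classes are imbalanced, accuracy alone can look good for a model that rarely predicts class 1, which is why it helps to compare the models on F1 and AUC as well.
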
Performance evaluation metrics used:
- F1 score
- AUC score
- Training and test accuracy
- Confusion matrix
- ROC plots
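
The metrics above can be computed with `sklearn.metrics`, as in the following sketch. The synthetic data and the Random Forest settings are assumptions for illustration only. The `fpr` and `tpr` arrays returned by `roc_curve` are the points of the ROC plot and could be passed to Matplotlib's `plt.plot(fpr, tpr)`.

```python
from __future__ import print_function

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the preprocessed data.
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)               # hard labels for F1 and accuracy
y_score = clf.predict_proba(X_test)[:, 1]  # class 1 probability for AUC and ROC

train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
# False positive rate and true positive rate at each score threshold.
fpr, tpr, thresholds = roc_curve(y_test, y_score)

print("train accuracy: %.3f, test accuracy: %.3f" % (train_acc, test_acc))
print("F1: %.3f, AUC: %.3f" % (f1, auc))
print(cm)
```
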