In ensemble learning, we combine decisions from multiple weak learners to solve a classification problem. In this project, I have implemented a Logistic Regression (LR) classifier and used it as the weak learner within the AdaBoost algorithm.
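
For orientation, below is a minimal sketch of an AdaBoost-style boosting loop with an abstract weak learner. The interface (`fit_weak_learner`, `predict_weak`), the resampling scheme, and the hyperparameters are assumptions made for this sketch, not necessarily what `1605042.py` does.

```python
import numpy as np

def adaboost(X, y, fit_weak_learner, predict_weak, n_rounds=20):
    """Sketch of AdaBoost with example re-weighting via resampling.

    X: (n_samples, n_features); y: labels in {-1, +1}.
    fit_weak_learner(X, y) -> model; predict_weak(model, X) -> labels in {-1, +1}.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                 # example weights
    models, alphas = [], []

    for _ in range(n_rounds):
        # Train the weak learner on a sample drawn according to the current weights.
        idx = np.random.choice(n, size=n, p=w)
        model = fit_weak_learner(X[idx], y[idx])

        pred = predict_weak(model, X)
        error = np.sum(w[pred != y])
        if error > 0.5:                     # worse than chance: skip this round
            continue
        error = max(error, 1e-10)           # guard against division by zero

        alpha = 0.5 * np.log((1.0 - error) / error)
        # Up-weight misclassified examples, down-weight the rest, renormalize.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()

        models.append(model)
        alphas.append(alpha)

    def ensemble_predict(X_new):
        # Weighted vote of all kept weak learners.
        votes = sum(a * predict_weak(m, X_new) for m, a in zip(models, alphas))
        return np.sign(votes)

    return ensemble_predict
```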
The implementation is written in Python 3.
I demonstrate the performance and efficiency of my implementation on the following three datasets:
- https://www.kaggle.com/blastchar/telco-customer-churn
- https://archive.ics.uci.edu/ml/datasets/adult
- https://www.kaggle.com/mlg-ulb/creditcardfraud

To run the experiments:

```
$ python 1605042.py <path_to_dataset_1> <path_to_train_dataset_2> <path_to_test_dataset_2> <path_to_dataset_3>
```
Inside the main function, the code for running the experiments on the three datasets is organized into three separate sections. To run the experiment on a specific dataset, the other two sections should be commented out.
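
A rough sketch of how the main function could be laid out under this convention; the helper names (run_experiment_dataset_1 and so on) are hypothetical and only illustrate the comment-out workflow.

```python
import sys

def main():
    # Paths in the order given on the command line (see the run command above).
    churn_path, adult_train_path, adult_test_path, creditcard_path = sys.argv[1:5]

    # --- Section 1: Telco Customer Churn ---
    # run_experiment_dataset_1(churn_path)

    # --- Section 2: Adult ---
    # run_experiment_dataset_2(adult_train_path, adult_test_path)

    # --- Section 3: Credit Card Fraud ---
    # run_experiment_dataset_3(creditcard_path)

if __name__ == "__main__":
    main()
```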
The train function runs logistic regression on the given dataset and returns the hypothesis parameters. Given the features and the hypothesis parameters, the predict function returns the predictions, and given the original labels and the predictions, the compute_metric function computes the necessary metrics. For convenient inspection, all outputs are written to a file called "out.txt" in the folder from which the script is run.
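
As a rough illustration of how these three functions could fit together, here is a minimal logistic-regression sketch. The function names come from the description above, but the gradient-descent details, learning rate, and iteration count are assumptions made for illustration, not the exact implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.01, n_iters=1000):
    """Fit logistic regression by batch gradient descent; returns the
    hypothesis parameters (weights, including a bias term)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w

def predict(X, w):
    """Return 0/1 predictions given features and hypothesis parameters."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)

def compute_metric(y_true, y_pred):
    """Compute the metrics reported in the tables below."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy    = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision   = tp / (tp + fp)
    fdr         = fp / (tp + fp)
    f1          = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, fdr, f1
```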

Logistic regression performance on dataset 1 (Telco Customer Churn):

Performance measure | Training | Test |
---|---|---|
Accuracy | 0.7965921 | 0.80908445 |
Sensitivity | 0.5266272 | 0.56034482 |
Specificity | 0.8964259 | 0.89066918 |
Precision | 0.6528117 | 0.62700964 |
False discovery rate | 0.3471882 | 0.37299035 |
F1 score | 0.5829694 | 0.59180576 |

AdaBoost accuracy on dataset 1 (Telco Customer Churn):

Number of boosting rounds | Training accuracy | Test accuracy |
---|---|---|
5 | 0.7955271 | 0.81263307 |
10 | 0.7946396 | 0.81050390 |
15 | 0.7951721 | 0.81192334 |
20 | 0.7951721 | 0.81192334 |

Logistic regression performance on dataset 2 (Adult):

Performance measure | Training | Test |
---|---|---|
Accuracy | 0.8245754 | 0.82666912 |
Sensitivity | 0.5466139 | 0.54524180 |
Specificity | 0.9127427 | 0.91371129 |
Precision | 0.6652180 | 0.66151419 |
False discovery rate | 0.3347819 | 0.33848580 |
F1 score | 0.6001120 | 0.59777651 |

AdaBoost accuracy on dataset 2 (Adult):

Number of boosting rounds | Training accuracy | Test accuracy |
---|---|---|
5 | 0.8449986 | 0.8443584 |
10 | 0.8453364 | 0.8452797 |
15 | 0.8453364 | 0.8452797 |
20 | 0.8453364 | 0.8452797 |

Logistic regression performance on dataset 3 (Credit Card Fraud):

Performance measure | Training | Test |
---|---|---|
Accuracy | 0.99581527 | 0.996413628 |
Sensitivity | 0.84061696 | 0.854368932 |
Specificity | 0.99888234 | 0.999389747 |
Precision | 0.93696275 | 0.967032967 |
False discovery rate | 0.06303724 | 0.032967032 |
F1 score | 0.88617886 | 0.907216494 |

AdaBoost accuracy on dataset 3 (Credit Card Fraud):

Number of boosting rounds | Training accuracy | Test accuracy |
---|---|---|
5 | 0.99586509 | 0.996612871 |
10 | 0.99591491 | 0.996612871 |
15 | 0.99591491 | 0.996612871 |
20 | 0.99591491 | 0.996612871 |

Since the third dataset is highly unbalanced and positive samples are very rare, I took all the positive samples plus around 25k negative samples (50 times the number of positive samples), as suggested, and then shuffled and split the data into train and test sets. Because of this skewness, the trained model is more likely to predict a sample as negative; and since the test set shares the skewness and contains a very high proportion of negative samples, the accuracy is very high. If only around 2500 negative samples are taken (5 times the positive samples), the accuracy decreases slightly. All the metrics for that setting are given below, followed by a sketch of the sampling procedure:

Training set:

- Accuracy : 0.9699279966116052
- Sensitivity : 0.8221649484536082
- Specificity : 0.9989863152559554
- Precision : 0.9937694704049844
- False discovery rate : 0.006230529595015576
- F1 : 0.8998589562764457

Test set:

- Accuracy : 0.9763113367174281
- Sensitivity : 0.8653846153846154
- Specificity : 1.0
- Precision : 1.0
- False discovery rate : 0.0
- F1 : 0.9278350515463918
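
The sampling procedure referred to above might look roughly like the following. The pandas loading, the label column name "Class" (as in the Kaggle file), and the 80/20 split are assumptions for this sketch.

```python
import pandas as pd

def subsample_creditcard(csv_path, neg_per_pos=50, test_fraction=0.2, seed=0):
    """Keep all positive (fraud) samples plus neg_per_pos times as many
    negative samples, shuffle, and split into train and test sets."""
    df = pd.read_csv(csv_path)
    pos = df[df["Class"] == 1]                      # all positive samples
    neg = df[df["Class"] == 0].sample(n=neg_per_pos * len(pos), random_state=seed)

    data = pd.concat([pos, neg]).sample(frac=1, random_state=seed)   # shuffle
    n_test = int(test_fraction * len(data))
    return data.iloc[n_test:], data.iloc[:n_test]                    # train, test
```

Calling it with neg_per_pos=5 corresponds to the roughly 2500-negative setting whose metrics are listed above.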