
Akankhya-Mohapatra/Statistical-Learning-with-R


Goal

The research objective of our simulation project is to compare a random forest model with a logistic regression model on a binary classification problem. The simulation will be modelled on the one conducted by Kirasich et al. (2018), who carried out a similar comparison.
We will differentiate our study by adding scenarios that were not examined in their work: the impact of missing values on both models, and a comparison of two missing-value imputation methods, random forest imputation and mode imputation.
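As a rough illustration of the missing-value scenario, the sketch below removes values completely at random (MCAR) and then fills them with mode imputation. It is a minimal sketch in plain Python, not the project's code: the repository name suggests the project itself is written in R, and the function names `inject_mcar` and `mode_impute`, the MCAR mechanism, and the missingness `rate` are illustrative assumptions. Random forest imputation is more involved and is not reproduced here; in R it is commonly done with packages such as missForest or the `rfImpute` function from randomForest.

```python
import random
from collections import Counter


def inject_mcar(column, rate, seed=0):
    """Hypothetical helper: set each value to None independently with
    probability `rate`, i.e. missing completely at random (MCAR)."""
    rng = random.Random(seed)
    return [None if rng.random() < rate else v for v in column]


def mode_impute(column):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to take a mode from")
    # Counter.most_common breaks ties in order of first occurrence.
    mode_value = Counter(observed).most_common(1)[0][0]
    return [mode_value if v is None else v for v in column]


if __name__ == "__main__":
    color = ["red", "blue", "red", "green", "red", "blue"]
    damaged = inject_mcar(color, rate=0.3, seed=42)
    print(damaged)
    print(mode_impute(damaged))
```

Mode imputation is cheap and keeps categorical columns valid, but it shrinks the variability of the imputed feature, which is one reason to compare it against a model-based method such as random forest imputation.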
To measure the performance of our models and support our research objectives, we will use the misclassification rate (the complement of accuracy), the area under the ROC curve (AUC), and cumulative lift curves to visualize the results of the simulation. Because the ROC curve plots sensitivity (the true positive rate) against the false positive rate, we will also use it to compare the sensitivity of the two competing models and see whether either performs better on this metric.
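The evaluation metrics named above can be sketched in plain Python as follows. This is an illustrative sketch, not the project's implementation: the function names are hypothetical, labels are assumed to be coded 0/1, both classes are assumed to be present, and AUC is computed with the rank-based (Mann-Whitney) formulation rather than by integrating a plotted ROC curve.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of cases whose predicted class differs from the true class (1 - accuracy)."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def sensitivity(y_true, y_pred):
    """True positive rate TP / (TP + FN); assumes at least one positive case."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative
    (Mann-Whitney statistic; ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cumulative_lift(y_true, scores, depth):
    """Positive rate among the top `depth` fraction of cases ranked by score,
    divided by the overall positive rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(depth * len(ranked)))
    top_rate = sum(t for _, t in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


if __name__ == "__main__":
    y = [1, 0, 1, 1, 0, 0]
    p = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]      # predicted probabilities of class 1
    y_hat = [1 if s >= 0.5 else 0 for s in p]  # 0.5 threshold is an assumption
    print(misclassification_rate(y, y_hat), sensitivity(y, y_hat), auc(y, p))
    print(cumulative_lift(y, p, depth=0.5))
```

Misclassification rate and sensitivity depend on the chosen threshold, while AUC and the lift curve are computed from the scores themselves, which is why the threshold-free measures are useful for comparing the two models.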

References

Kirasich, Kaitlin; Smith, Trace; and Sadler, Bivin (2018). "Random Forest vs Logistic Regression: Binary Classification for Heterogeneous Datasets." SMU Data Science Review, Vol. 1, No. 3, Article 9. Available at: https://scholar.smu.edu/datasciencereview/vol1/iss3/9
