
FDS-Project-HousePrices

Project for Fundamentals of Data Science 2018/2019, from the MSc in Computer Science.

Forked from luigiberducci; the group was composed of luigiberducci, angelodimambro, and me.

Kaggle Score: 0.11440

Feature engineering overview

  • 3 new features introduced: total number of bathrooms, number of garage cars multiplied by garage area, and total square feet
  • removal of multicollinear features
  • iterative removal of features receiving a caret importance score of 0 under a Lasso regression model, repeated until the model's RMSE no longer decreased
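The three engineered features above can be sketched as follows. This is a minimal Python/pandas illustration (the project itself is written in R with caret); the column names come from the Kaggle House Prices dataset, and the 0.5 weighting of half bathrooms is an assumption, not taken from the source.

```python
import pandas as pd

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the three engineered features described above."""
    df = df.copy()
    # Total number of bathrooms (half baths weighted 0.5 -- an assumption)
    df["TotalBath"] = (df["FullBath"] + 0.5 * df["HalfBath"]
                       + df["BsmtFullBath"] + 0.5 * df["BsmtHalfBath"])
    # Number of garage cars multiplied by garage area
    df["GarageCarsArea"] = df["GarageCars"] * df["GarageArea"]
    # Total square feet: basement plus first and second floors
    df["TotalSF"] = df["TotalBsmtSF"] + df["1stFlrSF"] + df["2ndFlrSF"]
    return df
```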

Models overview

Simple models

  • Lasso regression model
  • Ridge regression model
  • eXtreme Gradient Boosting model
  • Support Vector Machines

More complex models

  • Ensemble model (average)
  • Stacked regression model (both variants A and B)

Ensemble model

Our ensemble model performs a weighted average of the predictions produced by a set of simple models, using the following models and weights:

Model   Weight
Lasso   0.5
Ridge   0.5
XGB     3.5
SVM     5

Such weights were optimized via 10-fold CV, minimizing both the average RMSE and the magnitude of the weights themselves.
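The weighted average above can be sketched in a few lines. This is an illustrative Python version (the project itself is in R); note that the weights are normalized by their sum (9.5), an assumption consistent with a weighted average.

```python
import numpy as np

# Weights from the table above
WEIGHTS = {"lasso": 0.5, "ridge": 0.5, "xgb": 3.5, "svm": 5.0}

def ensemble_predict(preds: dict) -> np.ndarray:
    """Weighted average of per-model prediction vectors.

    `preds` maps model name -> array of predictions, one per test sample.
    """
    total = sum(WEIGHTS.values())  # 9.5
    return sum(WEIGHTS[m] * np.asarray(preds[m]) for m in WEIGHTS) / total
```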

Stacked regression model

The predictions of a set of simple models are used to train a meta-model.

Variants:

  • Variant A: meta-model trained on the average of the predictions produced during the simple models' k-fold training runs
  • Variant B: meta-model trained on predictions produced by new instances of the simple models, each trained on the whole training set
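Variant B can be sketched as follows. This is a Python/scikit-learn illustration only (the project itself uses R): `GradientBoostingRegressor` stands in for XGBoost, and all hyperparameters shown are placeholders, not the project's tuned values.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.svm import SVR

def stacked_variant_b(X_train, y_train, X_test):
    """Variant B: base models are fit on the whole training set; their
    predictions become the feature matrix for the meta-model."""
    base = [Lasso(alpha=0.001), Ridge(alpha=1.0),
            GradientBoostingRegressor(n_estimators=25, random_state=0),  # XGB stand-in
            SVR()]
    # Fit every base model, then stack their predictions column-wise
    meta_train = np.column_stack(
        [m.fit(X_train, y_train).predict(X_train) for m in base])
    meta_test = np.column_stack([m.predict(X_test) for m in base])
    # Meta-model: a gradient-boosting regressor, mirroring the "Specific XGB" role
    meta = GradientBoostingRegressor(n_estimators=25, random_state=0)
    meta.fit(meta_train, y_train)
    return meta.predict(meta_test)
```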

Our stacked regression model uses the following recipe:

Simple models   Meta-model
Lasso           Specific XGB
Ridge
XGB
SVM

Final predictions

Our final predictions are computed in the following way:

predictions = ( 2 * ensemble + xgb + svm + stacked_variantA + stacked_variantB ) / 6
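The final blend above is a weighted average in which the ensemble counts twice and each remaining model once. A minimal Python sketch (illustrative only, the project itself is in R):

```python
import numpy as np

def final_predictions(ensemble, xgb, svm, stacked_a, stacked_b):
    """Final blend: the ensemble is weighted twice, the other four models once."""
    ensemble, xgb, svm, stacked_a, stacked_b = map(
        np.asarray, (ensemble, xgb, svm, stacked_a, stacked_b))
    return (2 * ensemble + xgb + svm + stacked_a + stacked_b) / 6.0
```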