Multiclass classification of forest cover type

In this competition, the task is to predict the forest cover type (the predominant kind of tree cover) from cartographic variables. Each observation in our dataset corresponds to a $30\,\textrm{m} \times 30\,\textrm{m}$ patch of land located in the Roosevelt National Forest of northern Colorado. There are seven possible forest cover types:

1. Spruce/Fir
2. Lodgepole Pine
3. Ponderosa Pine
4. Cottonwood/Willow
5. Aspen
6. Douglas-fir
7. Krummholz

The training dataset for this Kaggle competition consists of $15120$ samples across $56$ columns (a mixture of both continuous and categorical data), including the target label Cover_Type, an integer in $\{1, 2, 3, 4, 5, 6, 7\}$. Our task is to classify each test sample by cover type (i.e. this is a multiclass classification task).
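
As a minimal sketch of the setup, assuming train.csv has been downloaded from the competition page, the features and target can be separated like so:

```python
import pandas as pd

# Assumes train.csv has been downloaded from the Kaggle competition page.
train = pd.read_csv("train.csv")

# Separate the predictive features from the target label.
# Id is a row identifier, not a predictor, so it is dropped.
X = train.drop(columns=["Id", "Cover_Type"])
y = train["Cover_Type"]

print(X.shape)  # expected: (15120, 54)
```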

We use a variety of classification techniques, including the following (a comparison sketch appears after the list):

1. Logistic Regression
2. Support Vector Classifier
3. K-Nearest Neighbors
4. Decision Tree
5. Random Forest
6. XGBoost
7. AdaBoost
8. LightGBM
9. Extra Trees Classifier
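
All of these models expose the scikit-learn estimator interface (XGBoost and LightGBM through their sklearn wrappers), so they can be compared with a single cross-validation loop. A minimal sketch for a few of them, with hypothetical hyperparameters rather than the settings actually used in this repository:

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical baseline settings; the tuned values in the notebook may differ.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "Extra Trees": ExtraTreesClassifier(n_estimators=300, random_state=42),
}

# X and y are the feature matrix and labels from the loading sketch above.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```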

After tuning hyperparameters with scikit-learn's GridSearchCV, our best model achieves a test score within the top 6.1% of the Kaggle leaderboard. For additional details and background information related to this dataset, see the Kaggle competition page at kaggle.com/c/forest-cover-type-prediction.
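
As an illustration of the tuning step, here is a GridSearchCV sketch for the Extra Trees model; the parameter grid shown is illustrative, not the grid actually searched in the notebook:

```python
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid; the hyperparameters actually tuned may differ.
param_grid = {
    "n_estimators": [200, 400, 800],
    "max_depth": [None, 20, 40],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    ExtraTreesClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,  # use all available cores
)
search.fit(X, y)

print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```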
