# Multiclass classification of forest cover type

In this competition, the task is to predict the forest cover type (the predominant kind of tree cover) from cartographic variables. Each observation in our dataset corresponds to a $30\textrm{m} \times 30\textrm{m}$ patch of land located in the Roosevelt National Forest of northern Colorado. There are seven possible forest cover types:

1. Spruce/Fir
2. Lodgepole Pine
3. Ponderosa Pine
4. Cottonwood/Willow
5. Aspen
6. Douglas-fir
7. Krummholz

The training dataset for this Kaggle competition consists of $15120$ samples with $56$ features (a mixture of both continuous and categorical data), including the `Cover_Type` label, an integer in $\{1, 2, \ldots, 7\}$. Our task is to classify test samples by cover type (i.e. this is a multiclass classification task).
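
As a concrete starting point, the minimal loading sketch below assumes the standard Kaggle `train.csv` layout (an `Id` column, the cartographic features, and the `Cover_Type` target) and a simple stratified hold-out split for local validation; the file name and split proportions are illustrative assumptions, not the exact setup used in the notebooks.

```python
# Minimal sketch of loading the data (assumes the Kaggle train.csv layout:
# an Id column, the cartographic features, and a Cover_Type target column).
import pandas as pd
from sklearn.model_selection import train_test_split

train = pd.read_csv("train.csv")

X = train.drop(columns=["Id", "Cover_Type"])  # feature matrix
y = train["Cover_Type"]                       # labels in {1, ..., 7}

# Hold out a stratified validation split for local evaluation before submitting to Kaggle.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```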

We use a variety of classification techniques (a brief comparison sketch follows the list below), including:

1. Logistic Regression
2. Support Vector Classifier
3. K-Nearest Neighbors
4. Decision Tree
5. Random Forest
6. XGBoost
7. AdaBoost
8. LightGBM
9. Extra Trees Classifier
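
For illustration, the sketch below compares a few of the listed classifiers on the hold-out split from the loading example above; the models shown and their hyperparameters are default or lightly chosen values, not the tuned configurations behind the final submission.

```python
# Illustrative comparison loop over a subset of the classifiers listed above;
# scaling is applied where the model is sensitive to feature magnitudes.
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "Extra Trees": ExtraTreesClassifier(n_estimators=300, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: validation accuracy = {model.score(X_val, y_val):.4f}")
```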

After tuning hyperparameters with GridSearchCV, our best model achieves a test score within the top 6.1% of the Kaggle leaderboard. For additional details and background information related to this dataset, see the Kaggle competition page at [kaggle.com/c/forest-cover-type-prediction](https://www.kaggle.com/c/forest-cover-type-prediction).
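
As a rough illustration of the tuning step, the sketch below runs `GridSearchCV` over an Extra Trees classifier with a small, assumed parameter grid; the actual grids and the specific model that produced the leaderboard score are not reproduced here.

```python
# Hedged example of hyperparameter tuning with GridSearchCV; the grid values
# are illustrative assumptions, not the grid used for the final model.
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [300, 600],
    "max_depth": [None, 20, 40],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    ExtraTreesClassifier(random_state=42),
    param_grid,
    cv=5,                 # 5-fold cross-validation on the training split
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```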