Boston Housing Prices - Machine Learning
The purpose of this project is to apply basic machine learning concepts to housing price data collected in the Boston, Massachusetts area in order to predict the selling price of a new home. This project is part of the Udacity Machine Learning Engineer Nanodegree.
The project uses housing data from 1970s Boston. For a fuller description, please refer to the UCI Machine Learning Repository. When this project was written, the dataset shipped with the sklearn.datasets module; its loader, load_boston, was removed in scikit-learn 1.2.
I first explored the data to obtain important features and descriptive statistics about the dataset. Next, I split the data into training and testing subsets and determined a suitable performance metric for this problem. I then analyzed performance graphs for a learning algorithm with varying parameters and training set sizes, which allowed me to pick the model that generalized best to unseen data. Finally, I tested this optimal model on a new sample and compared the predicted selling price to the population statistics.
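The workflow above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the data is a synthetic stand-in, and the choice of a decision tree regressor, the R^2 metric, the 80/20 split, and the max_depth search range are all assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption): 3 features, noisy target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(400, 3))
y = 50.0 + 8.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0.0, 2.0, size=400)

# Step 1: descriptive statistics of the target, computed with NumPy.
stats = {
    "min": np.min(y),
    "max": np.max(y),
    "mean": np.mean(y),
    "median": np.median(y),
    "std": np.std(y),
}
print({name: round(float(value), 2) for name, value in stats.items()})

# Step 2: hold out 20% of the samples for testing (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 3: vary one model parameter (tree depth) and score each setting with
# 5-fold cross-validated R^2 on the training subset only.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    scoring="r2",
    cv=5,
)
search.fit(X_train, y_train)
best_depth = search.best_params_["max_depth"]

# Step 4: evaluate the selected model on the held-out test subset.
test_r2 = r2_score(y_test, search.predict(X_test))
print("best max_depth:", best_depth)
print("test R^2:", round(float(test_r2), 3))
```

Selecting the depth by cross-validation on the training subset keeps the test subset untouched until the final evaluation, so the reported score reflects performance on data the model has not seen.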
This project was designed to get students acquainted with working with datasets in Python and applying basic machine learning techniques using NumPy and Scikit-Learn.
Technical skills learnt during this project:
- How to use NumPy to investigate the latent features of a dataset.
- How to analyze various learning performance plots for variance and bias.
- How to determine the best-guess model for predictions from unseen data.
- How to evaluate a model’s performance on unseen data using previous data.
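As an illustration of reading learning curves for bias and variance, the sketch below compares a very shallow and an unrestricted decision tree using scikit-learn's learning_curve. The synthetic data, the estimator, and the helper name mean_final_scores are assumptions for this example and are not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(400, 3))
y = 50.0 + 8.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0.0, 2.0, size=400)


def mean_final_scores(max_depth):
    """Return mean (training R^2, validation R^2) at the largest training size."""
    _, train_scores, val_scores = learning_curve(
        DecisionTreeRegressor(max_depth=max_depth, random_state=0),
        X,
        y,
        train_sizes=np.linspace(0.1, 1.0, 5),
        cv=5,
        scoring="r2",
    )
    # Each row corresponds to one training size; columns are the CV folds.
    return train_scores[-1].mean(), val_scores[-1].mean()


# A depth-1 tree is too simple: both scores stay low (high bias).
shallow_train, shallow_val = mean_final_scores(max_depth=1)

# An unrestricted tree memorizes the training data: the training score is near
# perfect while the validation score lags behind it (high variance).
deep_train, deep_val = mean_final_scores(max_depth=None)

print("depth 1:   train", round(shallow_train, 3), "validation", round(shallow_val, 3))
print("unlimited: train", round(deep_train, 3), "validation", round(deep_val, 3))
```

A low training score that sits close to the validation score suggests underfitting, while a large gap between a high training score and a lower validation score suggests overfitting.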
- boston_housing_Fox.ipynb: Jupyter notebook file containing the code and analyses for this project. In order to run, type:
jupyter notebook boston_housing_Fox.ipynb
- boston_housing_Fox.html: Static HTML version of the Jupyter notebook analysis. To view it, open the file in any browser (via 'File' > 'Open File', then select boston_housing_Fox.html).