Machine Learning with OKCupid Dataset

A capstone project of my Data Science course at Codecademy.

Installation and usage

The Jupyter notebook is based on Python 3.7 and relies mainly on pandas, numpy, matplotlib and seaborn for data preparation and visualization. In addition, it uses several machine learning models from scikit-learn (sklearn): Neural Network Classifier, K-Neighbors Classifier, Random Forest Classifier and Support Vector Machines.

Project motivation

The task set by Codecademy was to apply machine learning techniques to predict a variable from the OKCupid dataset. Codecademy's example predicted a user's zodiac sign from their responses to different questions.

My approach was to predict the body type (e.g. average, fit, curvy) based on the user's diet (e.g. vegetarian, everything) and their drinking, smoking and drug-use habits.
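Since the data itself cannot be shared, here is a minimal sketch of this kind of setup on randomly generated stand-in data. The column names (`diet`, `drinks`, `smokes`, `drugs`, `body_type`), the category values, the one-hot encoding and the choice of `n_neighbors=5` are assumptions made for illustration; the notebook may prepare the data differently.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: the real OKCupid data is not included in this repo.
# Column names and category values are assumptions for illustration only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "diet": rng.choice(["anything", "vegetarian", "vegan"], size=n),
    "drinks": rng.choice(["not at all", "socially", "often"], size=n),
    "smokes": rng.choice(["no", "sometimes", "yes"], size=n),
    "drugs": rng.choice(["never", "sometimes"], size=n),
    "body_type": rng.choice(["average", "fit", "curvy"], size=n),
})

features = ["diet", "drinks", "smokes", "drugs"]

# One-hot encode the categorical answers so the classifier gets numeric input.
X = pd.get_dummies(df[features], dtype=float)
y = df["body_type"]

# Hold out a quarter of the rows to estimate how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# score() returns the mean accuracy on the held-out rows.
accuracy = model.score(X_test, y_test)
print(f"accuracy: {accuracy:.3f}")
```

Because the stand-in labels are random, the printed accuracy says nothing about the real dataset; the sketch only shows the shape of the pipeline.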

As this data was provided by Codecademy during a paid course, I am not able to share it.

Result and more ideas

The best score was 0.561, reached by the Neural Network Classifier, K-Neighbors Classifier and SVC (linear kernel). I used all features for this prediction. This result might be improved by selecting a subset of the features (e.g. leaving out the smoking habits).
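One simple way to explore the feature-selection idea is to drop one feature at a time and compare cross-validated scores. The sketch below does this on randomly generated stand-in data; the column names, the helper `score_features` and the use of 5-fold cross-validation with a K-Neighbors Classifier are assumptions for illustration, not what the notebook does.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def score_features(df, features, target="body_type"):
    """Hypothetical helper: mean 5-fold CV accuracy using only `features`."""
    X = pd.get_dummies(df[features], dtype=float)
    model = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(model, X, df[target], cv=5).mean()


# Synthetic stand-in data; column names and values are assumptions.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "diet": rng.choice(["anything", "vegetarian", "vegan"], size=n),
    "drinks": rng.choice(["not at all", "socially", "often"], size=n),
    "smokes": rng.choice(["no", "sometimes", "yes"], size=n),
    "drugs": rng.choice(["never", "sometimes"], size=n),
    "body_type": rng.choice(["average", "fit", "curvy"], size=n),
})

all_features = ["diet", "drinks", "smokes", "drugs"]

# Baseline with every feature, then one run per left-out feature.
results = {"all features": score_features(df, all_features)}
for dropped in all_features:
    kept = [f for f in all_features if f != dropped]
    results[f"without {dropped}"] = score_features(df, kept)

for name, score in results.items():
    print(f"{name}: {score:.3f}")
```

If a "without ..." run scores as well as or better than the baseline on the real data, that feature is a candidate for removal.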

File description

  • OKCupid data - correlation between users body type and users diet.ipynb: Jupyter notebook with all my code.
  • instructions_codecademy.md: The instructions from Codecademy which I found quite useful when first approaching the dataset.

How to interact

Every idea and contribution is welcome.

Acknowledgments

Thanks to Codecademy and OKCupid for providing the data. Also thanks to the developers of all those useful libraries like pandas, numpy, matplotlib, seaborn and sklearn.

Author

Maximilian Müller, Business Development Manager in the Renewable Energy sector. Now diving into the field of data analysis.