
A project from my Codecademy Data Science course about OKCupid data


muellermax/OKCupid-machinelearning


Machine Learning with OKCupid Dataset

A capstone project of my Data Science course at Codecademy.

Installation and usage

The Jupyter notebook is based on Python 3.7 and relies mainly on pandas, numpy, matplotlib and seaborn for data preparation and visualization. In addition, I imported several machine learning models from scikit-learn (sklearn): Neural Network Classifier, K-Neighbors Classifier, Random Forest Classifier and Support Vector Machines.
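For orientation, the four model families listed above correspond to the scikit-learn classes below. This is a minimal sketch: the dictionary keys and hyperparameters are illustrative assumptions, not the settings used in the notebook.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def build_models(random_state=0):
    """Return one untrained instance of each classifier family named in this README.

    The keys and hyperparameters are hypothetical, chosen only for illustration.
    """
    return {
        # Multi-layer perceptron ("Neural Network Classifier"); max_iter is raised
        # so that training is less likely to stop before converging.
        "neural_network": MLPClassifier(max_iter=1000, random_state=random_state),
        "k_neighbors": KNeighborsClassifier(n_neighbors=5),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=random_state),
        # Support Vector Machine with a linear kernel.
        "svc_linear": SVC(kernel="linear"),
    }
```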

Project motivation

The task set by Codecademy was to apply machine learning techniques to predict a variable from the OKCupid dataset. Codecademy's example was about predicting the Zodiac sign based on the user's responses to different questions.

My approach was to predict the body type (e.g. average, fit, curvy etc.) based on the user's diet (e.g. vegetarian, everything etc.) and their drinking, smoking and drug-use habits.

As this data was provided by Codecademy during a paid course, I am not able to share it.
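The preparation step can be sketched as follows. It assumes the data is loaded into a pandas DataFrame with columns named `diet`, `drinks`, `smokes`, `drugs` and `body_type`; those names, the `"no answer"` placeholder for missing responses and the one-hot encoding are assumptions, not necessarily what the notebook does.

```python
import pandas as pd

# Assumed column names; adjust them to the actual dataset.
FEATURE_COLUMNS = ["diet", "drinks", "smokes", "drugs"]
TARGET_COLUMN = "body_type"


def prepare_features(profiles: pd.DataFrame):
    """Return one-hot encoded habit features X and the body type labels y."""
    # Keep only users who stated a body type: rows without a label cannot be used.
    data = profiles.dropna(subset=[TARGET_COLUMN])
    # Treat a missing answer as its own category instead of dropping the row.
    habits = data[FEATURE_COLUMNS].fillna("no answer")
    # One 0/1 column per (question, answer) pair, e.g. "diet_vegetarian" or "smokes_no".
    X = pd.get_dummies(habits, dtype=int)
    y = data[TARGET_COLUMN]
    return X, y
```

Because every answer becomes a separate column, the models never treat the categories as ordered; an ordinal mapping (e.g. for drinking frequency) would be an equally reasonable alternative.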

Result and more ideas

The highest score, 0.561, was reached by three models: the Neural Network Classifier, the K-Neighbors Classifier and the SVC (linear kernel). I used all features for this prediction. The result might be improved by selecting features (e.g. leaving out the smoking habits).
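A minimal sketch of how such a comparison, including the feature selection idea, could be run. It assumes the reported score is the mean accuracy returned by scikit-learn's `score()` on a held-out split, and that features are one-hot columns whose names start with the question (e.g. `smokes_no`); the function name, split size and `drop_prefixes` option are hypothetical.

```python
from sklearn.model_selection import train_test_split


def holdout_scores(models, X, y, drop_prefixes=(), test_size=0.2, random_state=42):
    """Fit every model on the same training split and return its accuracy on held-out rows.

    models: dict mapping a name to an untrained scikit-learn classifier.
    X: pandas DataFrame of one-hot encoded features; y: matching labels.
    drop_prefixes: hypothetical option listing column prefixes to leave out,
    e.g. ("smokes_",) to drop every column derived from the smoking answers.
    """
    # Keep every column that does not start with one of the excluded prefixes.
    keep = [col for col in X.columns if not col.startswith(tuple(drop_prefixes))]
    X_train, X_test, y_train, y_test = train_test_split(
        X[keep], y, test_size=test_size, random_state=random_state
    )
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # For classifiers, score() returns the mean accuracy on the given data.
        scores[name] = model.score(X_test, y_test)
    return scores
```

Calling it once with the defaults and once with `drop_prefixes=("smokes_",)` compares both feature sets on the same train/test split, so any difference comes from the features rather than from a different random split.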

File description

  • OKCupid data - correlation between users body type and users diet.ipynb: Jupyter notebook with all my code.
  • instructions_codecademy.md: The instructions from Codecademy which I found quite useful when first approaching the dataset.

How to interact

Every idea and contribution is welcome.

Acknowledgments

Thanks to Codecademy and OKCupid for providing the data. Also thanks to the developers of all those useful libraries like pandas, numpy, matplotlib, seaborn and sklearn.

Author

Maximilian Müller, Business Development Manager in the Renewable Energy sector. Now diving into the field of data analysis.
