By Han Simeng from NTU Open Source Society
Workshop Details | |
---|---|
When | Friday, 9 Sep 2018, 6:30 PM - 8:30 PM |
Where | LT1, NTU North Spine Plaza |
Who | NTU Open Source Society |
Questions | We will be hosting a Pigeonhole Live for collecting questions regarding the workshop |
For errors, typos or suggestions, please do not hesitate to post an issue! Pull requests are very welcome, thank you!
Disclaimer: This workshop is for educational purposes only. No prototype or outcome of any type is intended for commercial use.
Machine Learning is an interdisciplinary subject where computer science and statistics intersect.
In today's workshop, we will focus on the practical aspect of machine learning, i.e., coding.
In most cases, we give our algorithm an input and it gives us an output.
However, for a machine learning algorithm, we first feed a lot of data to the algorithm and let it determine for itself how it should react to the data. This process determines the parameters of the machine learning model.
In supervised machine learning, we feed both the input and its label into the model, and it learns to predict the output when we feed it new inputs. Think of supervised learning as learning with a teacher who tells you the right answers.
In unsupervised machine learning, we only feed the input, and the model learns to find structure in the data on its own. Think of unsupervised learning as learning without a teacher. Not all real-world data come with labels, hence the need for unsupervised learning.
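As a minimal sketch of the difference (using scikit-learn, which we introduce below, and a tiny made-up dataset), the two settings differ only in whether labels are supplied:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised: inputs X come with labels y (the "teacher's answers")
X = [[1], [2], [8], [9]]
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.5], [8.5]]))  # learns to map new inputs to labels

# Unsupervised: only inputs; the model groups them by itself
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster assignments discovered from the data alone
```

Note that the unsupervised model never sees `y`; it can only discover that the points form two groups, not what those groups mean.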
The second workshop will introduce two machine learning algorithms in order to demonstrate how the field can be used in real-world scenarios.
This includes logistic regression, a supervised method to solve classification problems, as well as k-means clustering, an unsupervised method to group together clusters of data by certain criteria.
We will use scikit-learn, a Python package built for implementing machine learning algorithms.
Logistic Regression with scikit-learn
K-Means with scikit-learn
See NTUOSS-PandasBasics for a comprehensive introduction to using Google Colaboratory for data science projects; let's walk through it together.
Copy this notebook to your own drive
Go to this link to download the data used in this workshop and upload it to Google Colaboratory.
- Supervised Odyssey:
Supervised Classification
- Unsupervised Odyssey:
Unsupervised Classification
- End of journey
Import the logistic regression module from sklearn along with the plotting packages.
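A minimal sketch of the imports for this section (NumPy is included because the plotting steps below operate on arrays):

```python
# Logistic regression model from scikit-learn, plus NumPy and matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
```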
Logistic regression is used when the dependent variable (target) is categorical, i.e., we want to find the class to which each input belongs. For example, to classify spam emails, we determine whether an email belongs to the spam class or the normal class.
Algorithm Intuition (online demo)
The sigmoid function adds non-linearity to the model and squashes its input into the range (0, 1), so the output can be read as a probability.
z is the input to the sigmoid function: the dot product of the input X and the weight vector w.
Logistic regression predictive function
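The two pieces above can be sketched directly in code: the sigmoid σ(z) = 1 / (1 + e⁻ᶻ), and a predictive function that computes z = X·w (plus an intercept b) and passes it through the sigmoid. The function names here are just for illustration:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """z is the dot product of input X and weights w, plus intercept b."""
    z = X @ w + b
    return sigmoid(z)  # probability of belonging to the positive class

print(sigmoid(0))  # 0.5: the decision boundary sits exactly where z = 0
```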
To conduct logistic regression with scikit-learn, we first create a LogisticRegression object
Then we fit the model to the data
The intercept and coef are the model parameters (weights).
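A minimal sketch of those three steps, using a hypothetical stand-in for the workshop's exam-score data (two scores per student, label 1 = admitted, 0 = rejected):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical exam-score data, not the workshop's actual dataset
X = np.array([[30, 40], [35, 50], [45, 45], [70, 80], [85, 75], [90, 90]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()   # create the LogisticRegression object
model.fit(X, y)                # fit the model to the data

print(model.intercept_)  # the intercept (bias) term
print(model.coef_)       # one weight per input feature
```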
After obtaining the parameters, let's visualize the result by plotting the decision boundary.
Students whose score points lie above the decision boundary will be admitted, while students below it will be rejected.
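With two features, the decision boundary is the line where w₁x₁ + w₂x₂ + b = 0 (predicted probability exactly 0.5). A sketch of plotting it, again on hypothetical exam-score data rather than the workshop's dataset:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in Colab
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Hypothetical exam-score data: label 1 = admitted, 0 = rejected
X = np.array([[30, 40], [35, 50], [45, 45], [70, 80], [85, 75], [90, 90]])
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)

# Solve w1*x1 + w2*x2 + b = 0 for x2 to draw the boundary line
w = model.coef_[0]
b = model.intercept_[0]
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
x2 = -(w[0] * x1 + b) / w[1]

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.plot(x1, x2, "r-", label="decision boundary")
plt.xlabel("Exam 1 score")
plt.ylabel("Exam 2 score")
plt.legend()
plt.savefig("decision_boundary.png")
```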
Now let's use our trained logistic regression model to predict if a student will be accepted or rejected.
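A sketch of making predictions with the fitted model, on the same hypothetical exam-score data as above (the new students' scores are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical exam-score data: label 1 = admitted, 0 = rejected
X = np.array([[30, 40], [35, 50], [45, 45], [70, 80], [85, 75], [90, 90]])
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)

new_students = np.array([[40, 40], [80, 85]])
print(model.predict(new_students))        # predicted class: 0 = rejected, 1 = admitted
print(model.predict_proba(new_students))  # probability of each class per student
```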
Import the image reading module from matplotlib and the K-Means module from sklearn
Read the image
The image is stored as a three-dimensional array of RGB values with shape (700, 1000, 3).
700 is the number of rows.
1000 is the number of columns.
3 corresponds to the R, G and B channels.
Algorithm Intuition (Online Demo)
K-means is one of the most popular unsupervised clustering algorithms.
"K" in K-means refers to k number of clusters.
"Means" refers to finding the means, or centroids of the clusters.
Reshape the image into a two-dimensional array.
To run the K-Means algorithm, we first create a scikit-learn KMeans object with the number of clusters set to 20, which is the number of colors we want in the compressed image. We then fit the model to the data and use the centroids to compress the image.
Reshape X_recovered to have the same dimension as the original image
Now we can plot the original and the compressed image side by side.
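The whole pipeline can be sketched end to end. A small synthetic image stands in for the workshop photo to keep the example self-contained and fast; with the real image the steps are identical:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in Colab
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Small synthetic RGB image standing in for the workshop's photo
rng = np.random.default_rng(0)
img = rng.random((60, 80, 3))

# Reshape to 2D: one row per pixel, one column per color channel
X = img.reshape(-1, 3)

# 20 clusters = 20 colors in the compressed image
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X)

# Replace every pixel with its cluster centroid, then restore the shape
X_recovered = kmeans.cluster_centers_[kmeans.labels_]
img_compressed = X_recovered.reshape(img.shape)

# Plot the original and the compressed image side by side
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(img)
ax1.set_title("Original")
ax2.imshow(img_compressed)
ax2.set_title("Compressed (20 colors)")
for ax in (ax1, ax2):
    ax.axis("off")
fig.savefig("compressed.png")
```

Indexing `cluster_centers_` with `labels_` is the compression step: every pixel's color is replaced by the centroid of its cluster, so the image uses at most 20 distinct colors.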
Congratulations on completing the Machine Learning Odyssey!
In this workshop we have learned how to use machine learning algorithms to solve some simple real-world problems.
In the next workshop, which is also the last in the NTUOSS Data Science workshop series, we will introduce deep learning, a subfield of machine learning that is even more interesting!
An approachable book if you want to learn more: A Course in Machine Learning.