| date | tags |
| --- | --- |
| 2021-06-19 | paper, deep-learning, semi-supervised |

# Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks

Link to the paper

Dong-Hyun Lee

International Conference on Machine Learning (ICML 2013)

Year: 2013

This paper presents a simple yet powerful semi-supervised technique to improve generalization by combining supervised and unsupervised training.

The author proposes a training method that uses labeled and unlabeled data simultaneously. For the unlabeled data, pseudo-labels are generated by picking the class with the highest predicted probability (given by the model being trained), and these pseudo-labels are then used as if they were true labels. The loss of this training procedure is shown below.
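In the paper's notation, the overall loss is the supervised term on the labeled mini-batch plus the same loss on the unlabeled mini-batch computed against the pseudo-labels, weighted by a coefficient $\alpha(t)$:

$$
L \;=\; \frac{1}{n}\sum_{m=1}^{n}\sum_{i=1}^{C} L\!\left(y_i^{m}, f_i^{m}\right)
\;+\; \alpha(t)\,\frac{1}{n'}\sum_{m=1}^{n'}\sum_{i=1}^{C} L\!\left({y'}_i^{m}, {f'}_i^{m}\right)
$$

where $n$ and $n'$ are the labeled and unlabeled mini-batch sizes, $C$ the number of classes, $f$ the network output, $y$ the true labels and $y'$ the pseudo-labels.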

The author highlights the importance of properly scheduling $\alpha(t)$: if it is too small, the unlabeled term has almost no effect and brings no improvement, while if it is too large the noisy pseudo-labels disturb training even on the labeled data. The paper therefore ramps $\alpha(t)$ up slowly during training:
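$$
\alpha(t) \;=\;
\begin{cases}
0 & t < T_1 \\[4pt]
\dfrac{t - T_1}{T_2 - T_1}\,\alpha_f & T_1 \le t < T_2 \\[4pt]
\alpha_f & T_2 \le t
\end{cases}
$$

with $\alpha_f = 3$, $T_1 = 100$ and $T_2 = 600$ (epochs) in the paper's experiments. As a rough illustration only, below is a minimal PyTorch-style sketch of the schedule and of one combined loss computation; the names (`model`, batch variables) and the use of cross-entropy for both terms are assumptions of this sketch, not code from the paper:

```python
import torch
import torch.nn.functional as F

def alpha_schedule(epoch, alpha_f=3.0, t1=100, t2=600):
    # Ramp alpha(t) linearly from 0 up to alpha_f between epochs t1 and t2.
    if epoch < t1:
        return 0.0
    if epoch < t2:
        return alpha_f * (epoch - t1) / (t2 - t1)
    return alpha_f

def pseudo_label_loss(model, x_labeled, y_labeled, x_unlabeled, alpha):
    # Supervised term: standard cross-entropy on the labeled batch.
    logits_l = model(x_labeled)
    loss_sup = F.cross_entropy(logits_l, y_labeled)

    # Pseudo-labels: class with the highest predicted probability on the
    # unlabeled batch, treated as if it were the true label.
    logits_u = model(x_unlabeled)
    with torch.no_grad():
        pseudo_targets = logits_u.argmax(dim=1)
    loss_unsup = F.cross_entropy(logits_u, pseudo_targets)

    # Combined objective, weighted by the schedule alpha(t).
    return loss_sup + alpha * loss_unsup
```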

This method is motivated by Entropy Regularization. Minimizing the entropy of the class predictions on unlabeled data reduces the overlap between the class probability distributions, which favors a low-density separation between classes; training with hard pseudo-labels has, in effect, the same entropy-minimizing behavior.
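For reference, the entropy-regularization objective the paper builds on (Grandvalet & Bengio) maximizes the labeled log-likelihood while penalizing the conditional entropy of the predictions on unlabeled data:

$$
C(\theta, \lambda) \;=\; \sum_{m=1}^{n} \log P\!\left(y^{m} \mid x^{m}; \theta\right)
\;-\; \lambda \sum_{m=1}^{n'} H\!\left(y \mid x'^{m}; \theta\right),
\qquad
H\!\left(y \mid x'^{m}; \theta\right) = -\sum_{i=1}^{C} P\!\left(y_i = 1 \mid x'^{m}\right) \log P\!\left(y_i = 1 \mid x'^{m}\right)
$$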

Because neighbors of a data sample have similar activations, points in the same high-density region are likely to share the same label. The method thus encourages the network output to be less sensitive to variations along the directions of the data manifold.

Example of class density

Results on MNIST for different numbers of labeled samples

## Interesting references in the paper

- Entropy Regularization: technique consisting of minimizing the conditional entropy of the class predictions on unlabeled data, reducing class overlap and hence also the density near the decision boundaries.
- Manifold Tangent Classifier: encourages the network output to be insensitive to variations along the directions of the low-dimensional manifold. It assumes that neighbors of a data sample have activations similar to the sample's.