This repository has been archived by the owner on Sep 13, 2022. It is now read-only.

BlancRay/PUAdapter

Two PU-learning tools on Spark

PUAdapter

A set of machine learning tools and algorithms for learning from positive and unlabeled datasets.

This algorithm relies on the assumption that the labeled positive examples are selected completely at random from all positive examples. Under that assumption, "a classifier trained on positive and unlabeled examples predicts probabilities that differ by only a constant factor from the true conditional probabilities of being positive."

Original paper:

Elkan, Charles, and Keith Noto. "Learning classifiers from only positive and unlabeled data." Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008.
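
The constant-factor property quoted above can be illustrated with a minimal sketch. It is written in plain Python (the language of the original pu-learning code) and is not the Scala code in this repository. It assumes a classifier has already been trained to separate labeled examples from unlabeled ones, and that its scores on a held-out set of labeled positives are available. The function names and the sample scores are hypothetical, chosen only for illustration.

```python
from statistics import mean


def estimate_label_frequency(scores_on_held_out_positives):
    """Estimate the constant c = P(labeled | positive).

    Under the selected-completely-at-random assumption, c can be
    approximated by the mean classifier score over held-out examples
    that are known (labeled) positives.
    """
    if not scores_on_held_out_positives:
        raise ValueError("need at least one held-out labeled positive")
    return mean(scores_on_held_out_positives)


def adjust_scores(raw_scores, c):
    """Turn g(x) = P(labeled | x) into an estimate of P(positive | x).

    The two differ by the constant factor c, so we divide by c and clip
    at 1.0 because a noisy estimate of c can push the ratio above 1.
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must be in (0, 1]")
    return [min(1.0, g / c) for g in raw_scores]


# Hypothetical scores (assumption: produced by some classifier trained
# on labeled-vs-unlabeled data; they are not output of this repository).
held_out_positive_scores = [0.25, 0.5, 0.75]
unlabeled_scores = [0.1, 0.25, 0.6]

c = estimate_label_frequency(held_out_positive_scores)
adjusted = adjust_scores(unlabeled_scores, c)
print("estimated c:", c)
print("adjusted probabilities:", adjusted)
```

Dividing by the estimated constant is what lets a classifier that never saw a confirmed negative still produce scores on the scale of the true positive probability. The clipping step is a practical safeguard added in this sketch, not something the quoted statement requires.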

The code comes from pu-learning, which is written in Python; I ported it to Scala so that it can run on Spark.

PU4Spark

A library for Positive-Unlabeled Learning for Apache Spark MLlib (ml package)

Based on pu4spark; see that project for reference.

Implemented algorithms

Traditional PU

The original Positive-Unlabeled learning algorithm, first proposed in:

Liu, B., Dai, Y., Li, X. L., Lee, W. S., & Yu, P. S. (2002). Partially supervised classification of text documents. In ICML 2002, Proceedings of the Nineteenth International Conference on Machine Learning (pp. 387–394).

Gradual Reduction PU (aka PU-LEA)

A modified Positive-Unlabeled learning algorithm; the main idea is to gradually refine the set of positive examples. The pseudocode was taken from:

Fusilier, D. H., Montes-y-Gómez, M., Rosso, P., & Cabrera, R. G. (2015). Detecting positive and negative deceptive opinions using PU-learning. Information Processing & Management, 51(4), 433-443.
