These IPython notebooks cover some introductory experiments on the Apache Spark big data platform, using map-reduce style programming in Python. The notebooks detail every stage of the experiments for:
a) Spam detection. Highlights word-to-vector feature extraction using simple hashing, plus the deployment and testing steps for a logistic regression based model.
b) Document classification. The same preprocessing applied to the 20 Newsgroups dataset, with Naive Bayes and a Spark pipeline implementation.
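As a rough illustration of the "simple hashing" step mentioned in (a), here is a minimal plain-Python sketch of turning a message into a fixed-length term-frequency vector. It is not taken from the notebooks: the function name `hashed_term_frequencies`, the `num_features` parameter, the whitespace tokenizer and the choice of CRC32 as the hash are all assumptions made for the example. Spark provides a `HashingTF` transformer built on the same idea, which the notebooks may or may not use.

```python
import zlib


def hashed_term_frequencies(text, num_features=16):
    """Map a document to a fixed-length term-frequency vector via feature hashing.

    Hypothetical helper for illustration only; not the notebooks' code.
    """
    vector = [0] * num_features
    # Assumption: lowercase and split on whitespace as a stand-in tokenizer.
    for token in text.lower().split():
        # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
        index = zlib.crc32(token.encode("utf-8")) % num_features
        vector[index] += 1
    return vector


spam_vector = hashed_term_frequencies("win money now win")
ham_vector = hashed_term_frequencies("meeting at noon")
```

Each vector has exactly `num_features` entries regardless of vocabulary size, so it can be fed directly to a classifier such as logistic regression. Different words can collide in the same slot; a larger `num_features` makes that less likely.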
:)