Big Data Analysis with NLP

These IPython notebooks cover some introductory experiments on the Apache Spark big data platform, using map-reduce style programming in Python. The notebooks detail every stage of the experiments for:

a) Spam detection. Highlights word-to-vector conversion using simple hashing, plus the deployment and testing steps for a logistic regression model.

b) Document classification. The same preprocessing applied to the 20 Newsgroups dataset, with Naive Bayes and a Spark pipeline implementation.
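
As a rough illustration of the hashing step mentioned in (a), here is a minimal sketch in plain Python written in a map-reduce style. It is not taken from the notebooks and does not use Spark: the names NUM_FEATURES, tokenize, hash_token, add_count and hashed_features are hypothetical, and zlib.crc32 is an assumed choice of hash function, picked only because it gives the same result on every run.

```python
from functools import reduce
import zlib

# Hypothetical vector size; a real experiment would use far more buckets.
NUM_FEATURES = 16


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def hash_token(token, num_features=NUM_FEATURES):
    """Map step: turn a token into a (bucket index, count) pair."""
    index = zlib.crc32(token.encode("utf-8")) % num_features
    return (index, 1)


def add_count(vector, pair):
    """Reduce step: add one pair's count into a copy of the vector."""
    index, count = pair
    updated = list(vector)
    updated[index] += count
    return updated


def hashed_features(text, num_features=NUM_FEATURES):
    """Build a fixed-length term-count vector using the hashing trick."""
    pairs = map(lambda token: hash_token(token, num_features), tokenize(text))
    return reduce(add_count, pairs, [0] * num_features)


vec = hashed_features("win cash now win prizes")
print(vec)
```

Every message becomes a vector of the same length regardless of vocabulary size, which is what lets a classifier such as logistic regression or Naive Bayes train on it. Different words can collide in the same bucket, so the vector size trades memory against accuracy.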