Big Data Analysis with NLP

These IPython notebooks cover introductory experiments on the Apache Spark big data platform, using map-reduce style programming in Python. The notebooks detail every stage of the experiments for:

a) Spam detection. Highlights word2vec techniques using simple hashing, followed by the deployment and testing steps for a logistic regression model.

b) Document classification. The same preprocessing applied to the 20 Newsgroups dataset, with a Naive Bayes classifier and a Spark pipeline implementation.
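
To give a feel for the "simple hashing" step mentioned above, here is a minimal pure-Python sketch of turning text into a fixed-length count vector in map-reduce style. It is not taken from the notebooks: the function names, the `NUM_FEATURES` bucket count, and the choice of `zlib.crc32` as the hash function are all assumptions made for illustration, and the notebooks themselves run on Spark rather than plain Python.

```python
import zlib
from functools import reduce

# Hypothetical bucket count, kept tiny for readability; real jobs use far more.
NUM_FEATURES = 16


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def hash_token(token, num_features=NUM_FEATURES):
    """Map a token to a bucket index.

    crc32 is used because it is deterministic across runs, unlike
    Python's built-in hash() for strings.
    """
    return zlib.crc32(token.encode("utf-8")) % num_features


def add_count(vector, bucket):
    """Reduce step: return a new vector with one more count in `bucket`."""
    updated = list(vector)
    updated[bucket] += 1
    return updated


def hashed_vector(text, num_features=NUM_FEATURES):
    """Map each token to a bucket, then reduce the buckets into counts."""
    buckets = map(lambda tok: hash_token(tok, num_features), tokenize(text))
    return reduce(add_count, buckets, [0] * num_features)


if __name__ == "__main__":
    print(hashed_vector("free money free offer"))
```

Every message ends up as a vector of the same length regardless of vocabulary size, which is what lets a classifier such as logistic regression or Naive Bayes train on it. Distinct words can collide in the same bucket; that is the usual trade-off of this approach.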

:)