Hoax classification models with tuned, cross-validated parameters

kmyafi/Hoax-Classification

Analysis & Classification of Hoax in MAFINDO Dataset

Credits:

  1. Yudistira Dwi Cahya
  2. Wulan Akhsah
  3. Kamal Muftie Yafi
  4. Rachel Thyffani Margaretha S
  5. Vesya Padmadewi

About The Data

The dataset has two label values: "1" for hoax and "0" for not hoax. It contains 4,701 records in total, split unevenly between the two classes: 3,850 are labeled hoax and 851 are labeled not hoax.

Label        Hoax     Not Hoax
Total Data   3,850    851
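
Because the classes are imbalanced, it can help to know what accuracy a trivial "always predict hoax" model would reach before judging any classifier. The following is a minimal sketch of that arithmetic, using only the counts stated above; the variable names are illustrative and not taken from the repository.

```python
# Class counts as reported for the MAFINDO dataset in this README.
counts = {"hoax": 3850, "not_hoax": 851}

total = sum(counts.values())

# Share of each class in the whole dataset.
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a baseline that always predicts the majority class.
majority_baseline = max(counts.values()) / total

# How many hoax records exist per not-hoax record.
imbalance_ratio = counts["hoax"] / counts["not_hoax"]

print(f"total records: {total}")
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
print(f"imbalance ratio (hoax : not hoax): {imbalance_ratio:.2f} : 1")
```

With these counts the baseline is roughly 81.9%, so plain accuracy alone says little about how well a model detects the minority class; per-class precision and recall are usually more informative in this setting.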

Algorithms included

  • Text cleaning/preprocessing
  • Non-standard word replacement
  • Feature extraction: BoW, TF-IDF
  • Classification: Naive Bayes, SVM, Logistic Regression, Decision Tree, kNN, ANN
  • Hyperparameter tuning with cross-validation: Grid Search, Random Search
  • Post analysis: topicwizard, Voyant Tools, WordCloud
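
The first three steps in the list above (cleaning, non-standard word replacement, and BoW/TF-IDF feature extraction) can be sketched with the Python standard library alone. This is an illustrative sketch, not the repository's implementation: the `NORMALIZATION` dictionary, the function names, the sample sentences, and the specific TF-IDF variant (relative term frequency times unsmoothed `log(N / df)`) are all assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical slang-to-standard mapping; the repository's real dictionary is not shown.
NORMALIZATION = {"yg": "yang", "gak": "tidak", "bgt": "banget"}


def clean(text):
    """Lowercase, strip URLs and non-letter characters, then split into tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def normalize(tokens):
    """Replace non-standard words with their standard form when one is known."""
    return [NORMALIZATION.get(token, token) for token in tokens]


def bag_of_words(tokens):
    """Bag-of-words representation: raw term counts for one document."""
    return Counter(tokens)


def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Returns one dict per document, mapping term -> tf * log(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = bag_of_words(doc)
        vectors.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Made-up sample texts, for illustration only.
texts = [
    "Berita ini HOAX bgt! http://example.com",
    "Berita yg benar, tidak hoax",
]
docs = [normalize(clean(text)) for text in texts]
vectors = tfidf(docs)
print(docs[0])
print(vectors[0])
```

Terms that appear in every document (here "berita" and "hoax") receive a weight of zero under this unsmoothed variant, which is why library implementations often add smoothing. In practice the vectorization, classification, and search steps are commonly delegated to a library such as scikit-learn (for example `TfidfVectorizer`, `GridSearchCV`, and `RandomizedSearchCV`); whether the repository does so is not stated here.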