This repository has been archived by the owner on Oct 25, 2020. It is now read-only.

sayhitosandy/Rumour_Stance_Classification


Rumour Stance Classification using Improved Feature Extraction

Determining the veracity and authenticity of social media content has been of recent interest to the NLP community. False claims and rumours affect people's perceptions of events and their behaviour, sometimes in harmful ways. Here, the task is to classify each tweet in a thread according to the stance it takes, which falls into one of four categories: supporting (S), denying (D), querying (Q), or commenting (C), i.e., SDQC.

By improving feature extraction and incorporating tweet-dependent (and other textual) features such as hashtags and link content, we achieved accuracies close to the state of the art with most of our models. Our highest reported accuracy is 77.6%, which is comparable to that of the state-of-the-art model (78.4%). We also tried to improve recall on the Deny and Query classes by augmenting the training dataset, and separately by using ensemble methods and bagging/boosting techniques.
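
As an illustration of what tweet-dependent features can look like, here is a minimal sketch that derives a few surface features from the raw text of one tweet. It is not the feature set used in `Data.py`: the function name `tweet_features`, the regular expressions, and the chosen features are all assumptions made for this example, and it only counts links rather than fetching link content.

```python
import re

# Hypothetical patterns for this sketch; Data.py may extract features differently.
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def tweet_features(text):
    """Return a small dict of surface features for one tweet (illustrative only)."""
    urls = URL_RE.findall(text)
    # Strip URLs first so URL fragments are not mistaken for hashtags or words.
    stripped = URL_RE.sub("", text)
    return {
        "num_hashtags": len(HASHTAG_RE.findall(stripped)),
        "num_urls": len(urls),
        "num_mentions": len(MENTION_RE.findall(stripped)),
        "has_question_mark": "?" in stripped,
        "num_words": len(stripped.split()),
    }


# Made-up tweet, not taken from the dataset.
example = "Is this confirmed? @bbc #breaking https://example.com/story"
features = tweet_features(example)
```

Features such as `has_question_mark` are the kind of signal that might help separate querying tweets from comments, though whether they help in practice depends on the data and the classifier.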

Links

  1. SemEval-2017 Task 8 Dataset
  2. Report
Dataset Structure:

Steps to Run the Code:

Python version: 2.7

  1. First run the Data.py file. This Python script reads the dataset, preprocesses it, and extracts the features of the tweets. These features are used by the different classifiers and are stored as pickle files.
  2. To get the results for SVM and Logistic Regression, run the SVM_LRClassifier.py file.
  3. nn.py is a Python script for the neural network model, which uses the Keras library. Run the nn.py file to get the results for the neural network model.
  4. Run the NBClassifier.py file for the Naive Bayes results.
  5. The Jupyter notebook ngram_models.ipynb (Python version: 3.6) includes code and output for:
  • loading the rumour eval training and test data
  • preprocessing the data
  • vectorizing the data using CountVectorizer and TF-IDF, for unigrams, bigrams and trigrams
  • running different classifiers on it, including
    • MultinomialNB
    • SVM
    • Logistic Regression
    • RandomForest
    • XGBoost
  • an attempt at LSTM
  • loading and collecting the newly collected data
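
The vectorizing and classification steps listed for the notebook can be sketched roughly as follows, using scikit-learn. This is a minimal sketch under stated assumptions, not the notebook's code: the toy tweets and labels are made up, `build_model` is a hypothetical helper, Logistic Regression stands in for the full list of classifiers, and `ngram_range=(1, n)` (n-grams up to length n) is one possible reading of "unigram, bigram and trigram".

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, made-up tweets with SDQC labels; not taken from the RumourEval data.
train_texts = [
    "yes this is confirmed by police",
    "confirmed, the report is accurate",
    "this is false, stop spreading lies",
    "not true, it has been debunked",
    "is this true? any source?",
    "can anyone confirm this report?",
    "thoughts with everyone involved",
    "what a terrible day for the city",
]
train_labels = ["S", "S", "D", "D", "Q", "Q", "C", "C"]


def build_model(vectorizer_cls, max_n):
    """Hypothetical helper: vectorizer with n-grams up to max_n, then a classifier."""
    return make_pipeline(
        vectorizer_cls(ngram_range=(1, max_n)),
        LogisticRegression(max_iter=1000),
    )


# Fit one model per (vectorizer, n-gram order) combination.
models = {}
for name, cls in [("count", CountVectorizer), ("tfidf", TfidfVectorizer)]:
    for max_n in (1, 2, 3):
        model = build_model(cls, max_n)
        model.fit(train_texts, train_labels)
        models[(name, max_n)] = model

predictions = models[("tfidf", 2)].predict(["is this report true?"])
```

Swapping `LogisticRegression` for another scikit-learn estimator such as `MultinomialNB` or `RandomForestClassifier` only requires changing the last pipeline step, which makes it straightforward to compare classifiers on the same features.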

References

  1. Derczynski et al., SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours.
  2. Kochkina et al., Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM.
  3. Dataset Link, SemEval-2017 Task 8 Dataset.
  4. Bahuleyan et al., UWaterloo at SemEval-2017 Task 8: Detecting Stance towards Rumours with Topic Independent Features.