Fake News Classification Task

CS 579: Online Social Network Analysis

Project II - Fake News Classification

Group 45

Vivekanand Reddy Malipatel (A20524871), Mohammed Shoaib (A20512491)

Steps to run the program:

Run readdata.py to read the data from test_original.csv and train_original.csv, preprocess it, and store the results in test.csv and train.csv.
Next, run each of the machine learning models, which are stored in separate Python files, to generate classification metrics on the validation data and predict labels for the test data. The test predictions are stored in a separate CSV file for each model.
Finally, run submission.py to generate submission.csv from the output of the model that classifies best.
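
The run order above can be sketched as a small Python driver. This is only an illustration: it assumes the scripts sit in the current directory and take no command-line arguments, and the helper names `build_commands` and `run_pipeline` are hypothetical, not part of the repository.

```python
import subprocess
import sys

# Script names as they appear in the repository; order follows the steps above.
PREPROCESS = "readdata.py"
MODELS = [
    "NaiveBayesClassification.py",
    "DecisionTreeClassifier.py",
    "RandomForestClassifier.py",
    "SVMClassifier.py",
]
SUBMIT = "submission.py"


def build_commands(python=sys.executable):
    """Return the commands in run order: preprocess, each model, then submission."""
    scripts = [PREPROCESS, *MODELS, SUBMIT]
    return [[python, script] for script in scripts]


def run_pipeline():
    """Run every step in sequence, stopping at the first failing script."""
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if a script exits with an error.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the repository root would execute all six scripts in order; the models are independent of each other, so their relative order is a free choice.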

Libraries required:

pandas, nltk, scikit-learn (sklearn), numpy

Fake News Classification Task

Overview

In the era of digital news consumption, the proliferation of fake news on social media has become a major issue. The goal of this project is to classify pairs of news article titles by their relationship to each other: whether the second agrees with the first, disagrees with it, or is unrelated to it. This is crucial for maintaining the authenticity and balance of the news ecosystem and for preventing the spread of misinformation.

Task Definition

Given the title of a fake news article (A) and the title of an incoming news article (B), classify B into one of three categories:

agreed: B discusses the same fake news as A.
disagreed: B refutes the fake news in A.
unrelated: B is unrelated to A.

File Descriptions

Within the provided dataset, there are three CSV files:

train.csv: Contains training data with labels.
test.csv: Contains test data without labels.
sample_submission.csv: Demonstrates the expected submission format.

Data Columns

Both training and testing data will include the following columns:

id: The ID of each news pair.
tid1: The ID of fake news title 1.
tid2: The ID of news title 2.
title1_en: The English title of fake news 1.
title2_en: The English title of news 2.
label: (Training data only) Indicates the relationship between the news pair (agreed/disagreed/unrelated).
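
As a quick sanity check on the columns listed above, a loader can confirm that a file has the documented header and that every label is one of the three categories. The sketch below uses only the standard library; the function name `read_pairs` and the sample rows are made up for illustration and are not taken from the dataset or the repository.

```python
import csv
import io

EXPECTED_COLUMNS = ["id", "tid1", "tid2", "title1_en", "title2_en"]
VALID_LABELS = {"agreed", "disagreed", "unrelated"}


def read_pairs(handle, labeled):
    """Read news pairs from an open CSV file and check the documented columns."""
    reader = csv.DictReader(handle)
    required = EXPECTED_COLUMNS + (["label"] if labeled else [])
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        # Only training data carries a label, so only check it when asked to.
        if labeled and row["label"] not in VALID_LABELS:
            raise ValueError(f"unexpected label {row['label']!r} for id {row['id']}")
        rows.append(row)
    return rows


# Made-up rows for illustration only.
sample = io.StringIO(
    "id,tid1,tid2,title1_en,title2_en,label\n"
    "1,10,11,Example claim A,Example claim A repeated,agreed\n"
    "2,10,12,Example claim A,Officials deny example claim A,disagreed\n"
)
pairs = read_pairs(sample, labeled=True)
print(len(pairs), pairs[1]["label"])
```

For the real files, the same function could be called with `open("train.csv", newline="", encoding="utf-8")` and `labeled=True`, or with test.csv and `labeled=False`.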

Instructions

Data Preparation: Split the train.csv file to create a validation set for model evaluation.
Model Training: Use the training data to train your classifier.
Evaluation: Assess the performance of your model using the validation set.
Prediction: Apply your trained model to the test.csv file to predict the labels.
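
The four steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the repository's actual method: it assumes the two titles are simply concatenated, TF-IDF features are used, and the classifier is Multinomial Naive Bayes. The toy pairs are invented so the example runs without the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented (title1, title2, label) pairs; real data would come from train.csv.
toy_pairs = [
    ("tea cures flu", "study confirms tea cures flu", "agreed"),
    ("tea cures flu", "doctors repeat tea cures flu", "agreed"),
    ("moon is hollow", "blog claims moon is hollow", "agreed"),
    ("moon is hollow", "video says moon is hollow", "agreed"),
    ("tea cures flu", "rumor refuted tea does not cure flu", "disagreed"),
    ("tea cures flu", "experts deny rumor about tea", "disagreed"),
    ("moon is hollow", "scientists refute hollow moon rumor", "disagreed"),
    ("moon is hollow", "agency denies hollow moon rumor", "disagreed"),
    ("tea cures flu", "city opens new subway line", "unrelated"),
    ("tea cures flu", "football team wins final", "unrelated"),
    ("moon is hollow", "stock market closes higher", "unrelated"),
    ("moon is hollow", "new phone released today", "unrelated"),
]
texts = [f"{a} {b}" for a, b, _ in toy_pairs]  # assumption: plain concatenation
labels = [label for _, _, label in toy_pairs]

# Data preparation: hold out a stratified validation set.
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Model training: TF-IDF features feeding a Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# Evaluation: per-class precision, recall, and F1 on the validation set.
print(classification_report(y_val, model.predict(X_val), zero_division=0))

# Prediction: label unseen pairs (these would come from test.csv).
test_texts = ["tea cures flu officials deny tea rumor"]
test_pred = model.predict(test_texts)
print(test_pred)
```

Stratifying the split keeps all three labels represented in the validation set, which matters if one class is much rarer than the others in the real data.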

Repository Files

.gitattributes
DTpredictedtest.csv
DecisionTreeClassifier.py
FinalReport.pdf
NBpredictedtest.csv
NaiveBayesClassification.py
ProjectPresentation.pdf
README.md
RFpredictedtest.csv
RandomForestClassifier.py
SVMClassifier.py
SVMpredictedtest.csv
readdata.py
submission.csv
submission.py
test.csv
test.py
test_original.csv
train.csv
train_original.csv
