Thesis project on the classification of true and fake political and gossip world news using machine learning and deep learning techniques

stavIatrop/Thesis-Fake-News-Detection

Abstract

In recent years, fake and fabricated content has proliferated across online platforms, distorting and often manipulating public opinion on political, social and other everyday issues. As a result, there have been many efforts to develop fake news detection systems that help unveil and debunk deceptive news.

In this thesis we work with three different fake news datasets, trying to identify the linguistic differences between fake and truthful articles and providing a visualization of the results. The aim of the project is to compare five different machine learning classifiers, and to develop an ensemble method from different combinations of classification models, in order to investigate which gives the best overall results across all three data sources. Afterwards, a simple long short-term memory (LSTM) neural network was developed to examine how a deep learning method performs in comparison to statistical machine learning models.
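As a rough illustration of what an ensemble of classification models can look like in sklearn, the sketch below combines three classifiers by majority vote. The choice of classifiers, the hard-voting scheme, the TF-IDF features and the toy corpus are all assumptions made for this example; the abstract does not state which five classifiers or which combination rule the thesis uses.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus invented for illustration only; 1 = fake, 0 = truthful.
texts = [
    "shocking secret cure doctors hate revealed",
    "celebrity clone spotted in secret bunker",
    "aliens endorse candidate in shocking twist",
    "miracle pill melts fat overnight say insiders",
    "parliament votes on the annual budget bill",
    "central bank keeps interest rates unchanged",
    "city council approves new public transport plan",
    "minister presents report on school funding",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Assumed member models; each one casts a single vote per article.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
        ("svm", LinearSVC()),
    ],
    voting="hard",  # majority vote over the predicted labels
)

# TF-IDF turns raw text into the numeric features the classifiers need.
model = make_pipeline(TfidfVectorizer(), ensemble)
model.fit(texts, labels)
predictions = model.predict(texts)
```

Trying different combinations then amounts to changing the `estimators` list and comparing scores on held-out data from each dataset.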

The methodology followed in this project starts with applying natural language processing and feature extraction techniques to prepare the data to be “fed” into each classification model; each classifier is then trained and its parameters are tuned. The results are presented and compared using bar plots, confusion matrices and precision–recall curves.
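The steps described above (feature extraction, training, parameter tuning, evaluation) could be sketched in sklearn roughly as follows. This is a minimal sketch under assumptions: the TF-IDF features, the logistic regression classifier, the parameter grid and the toy corpus are illustrative choices, not the thesis's actual configuration, and the metrics are computed on the training texts only to show the calls.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_recall_curve
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Toy corpus invented for illustration only; 1 = fake, 0 = truthful.
texts = [
    "shocking secret cure doctors hate revealed",
    "celebrity clone spotted in secret bunker",
    "aliens endorse candidate in shocking twist",
    "miracle pill melts fat overnight say insiders",
    "parliament votes on the annual budget bill",
    "central bank keeps interest rates unchanged",
    "city council approves new public transport plan",
    "minister presents report on school funding",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Feature extraction and classifier chained into one estimator.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Tune the regularization strength C with 2-fold cross-validation
# (the grid values are assumptions chosen for the example).
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.1, 1.0, 10.0]},
    cv=2,
)
search.fit(texts, labels)

# Evaluation: rows of the confusion matrix are true labels,
# columns are predicted labels.
predicted = search.predict(texts)
cm = confusion_matrix(labels, predicted)

# Precision-recall curve from the predicted probability of class 1.
scores = search.predict_proba(texts)[:, 1]
precision, recall, thresholds = precision_recall_curve(labels, scores)
```

In a real experiment the confusion matrix and precision–recall curve would be computed on a held-out test split rather than on the data used for fitting.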

Implementation tools

The project is implemented in Python 3.6.9 together with various Python libraries. For the web crawling part, BeautifulSoup (a Python library for pulling data out of HTML and XML files) was used. During the preprocessing of the datasets, the re, string and gensim libraries were used. As far as algorithms are concerned, sklearn (a machine learning library for Python) was mainly used, along with the PyTorch framework for the neural network model. For the data visualization stage, wordcloud (a Python library for generating word clouds), matplotlib (a plotting library for Python), scikitplot (a visualization library, or “the result of an unartistic data scientist’s dreadful realization that visualization is one of the most crucial components in the data science process, not just a mere afterthought.” [6]) and seaborn (a Python data visualization library based on matplotlib) were used to construct different plots, tables and diagrams.
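To give a sense of the kind of preprocessing that `re` and `string` support, here is a small sketch using only the standard library. The function name `clean_text` and the specific cleaning steps (lowercasing, URL removal, punctuation stripping, whitespace tokenization) are assumptions for illustration; the thesis's actual preprocessing may differ.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Return a list of lowercase tokens with URLs and punctuation removed.

    Hypothetical helper written for this example.
    """
    text = text.lower()
    # Replace URLs with a space so neighbouring words stay separated.
    text = re.sub(r"https?://\S+", " ", text)
    # Strip ASCII punctuation (non-ASCII punctuation is left untouched).
    text = text.translate(_PUNCT_TABLE)
    # split() with no arguments also collapses repeated whitespace.
    return text.split()


tokens = clean_text("BREAKING: Visit https://example.com NOW!!!")
print(tokens)  # ['breaking', 'visit', 'now']
```

Token lists of this shape can then be passed on to a feature extraction step.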

The work for this thesis was conducted between October 2019 and June 2020 at the Department of Informatics and Telecommunications of the National and Kapodistrian University of Athens and was supervised by Assistant Professor Alexandros Ntoulas.
