Toxic-Comment-Analysis-for-Online-Learning

Third-year B.Tech project in Computer Science and Engineering (Intelligent Systems) at MIT School of Engineering, MIT ADT University.

Publication at IEEE ACCESS'21: https://ieeexplore.ieee.org/document/9563344

Abstract:

Due to the pandemic, online platforms have become essential for communication in many sectors. At the same time, a great deal of negativity and toxic commenting has surfaced on them, resulting in degradation and online abuse. Educational systems and institutions rely heavily on such platforms for e-learning, exposing teachers and students to unrestricted attacks of toxic and negative comments. This work aims to reduce such constant bullying and online abuse. Comments are classified into six categories, namely toxic, severely toxic, obscene, threat, insult, and identity hate, according to the parameters from our self-prepared dataset combined with Kaggle's toxic comment dataset. Machine learning algorithms such as Logistic Regression, Random Forest, and Multinomial Naive Bayes are used, and the results are evaluated with ROC curves and the Hamming score. The output is shown as the rate of each category, as a percentage and in graphical format. By helping reduce the online bullying and harassment faced by teachers and students, this work supports a non-toxic learning environment in which the main focus stays on studying rather than on demotivating, hateful comments, and in which posting toxic comments is discouraged.
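As a minimal sketch of the six-label setup described above (the CSV path, column names, and the use of scikit-learn's OneVsRestClassifier are assumptions for illustration, not details taken from the paper), each category can be treated as an independent binary label, with per-category probabilities reported as percentages:

```python
# Sketch of the multi-label setup from the abstract.
# The file name, column names, and classifier wrapper are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

df = pd.read_csv("train.csv")                       # hypothetical combined dataset
X = TfidfVectorizer(max_features=10000).fit_transform(df["comment_text"])
y = df[LABELS].values                               # one binary column per category

# One binary classifier per label, as in a standard multi-label pipeline.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Per-category probabilities, reported as percentages per the abstract.
probs = clf.predict_proba(X[:1])[0]
for label, p in zip(LABELS, probs):
    print(f"{label}: {p * 100:.1f}%")
```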

Algorithms Used:

  1. Logistic Regression
  2. Random Forest
  3. Multinomial Naive Bayes
  4. NLP preprocessing techniques: lemmatization, lexicon normalization, and the TF-IDF algorithm (see the sketch after this list)
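
A minimal sketch of the preprocessing step, assuming NLTK's WordNetLemmatizer for lemmatization and scikit-learn's TfidfVectorizer for TF-IDF; the regex-based casefolding and punctuation stripping stand in for lexicon normalization:

```python
# Preprocessing sketch: lexicon normalization + lemmatization + TF-IDF.
import re
import nltk
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("wordnet", quiet=True)    # lemmatizer corpora
nltk.download("omw-1.4", quiet=True)
lemmatizer = WordNetLemmatizer()

def normalize(text: str) -> str:
    text = text.lower()                      # casefold
    text = re.sub(r"[^a-z\s]", " ", text)    # strip punctuation and digits
    return " ".join(lemmatizer.lemmatize(w) for w in text.split())

comments = ["These classes are AWFUL!!", "Great lecture, thanks"]
tfidf = TfidfVectorizer()
features = tfidf.fit_transform(normalize(c) for c in comments)
print(features.shape)  # (number of comments, vocabulary size)
```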

Flowchart:

[Flowchart image]

Results:

[Results image]

ROC Curves:

[Three ROC curve images]

Conclusion:

In this paper, three models for toxic comment classification are proposed: Logistic Regression, Multinomial Naive Bayes, and Random Forest. The models classify toxic comments as toxic, severely toxic, insult, threat, obscene, and identity hate. After data collection and preprocessing with lemmatization, lexicon normalization, and the TF-IDF algorithm, we train and test the models and evaluate them using ROC curves and the Hamming score. Based on the obtained results, we conclude that the Logistic Regression model is the most effective, providing the best accuracy of 0.92 when tested on the training dataset. As future scope, a better and faster model could replace Random Forest, and deep learning models could be implemented to obtain a much higher accuracy rate.
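
A hedged sketch of the evaluation mentioned above, assuming scikit-learn metrics: roc_auc_score summarizes each label's ROC curve, and the Hamming score is taken here as 1 minus scikit-learn's hamming_loss (the paper may define it differently). The toy arrays stand in for a held-out test split:

```python
# Evaluation sketch: per-label ROC AUC and Hamming score on toy data.
import numpy as np
from sklearn.metrics import roc_auc_score, hamming_loss

y_true = np.array([[1, 0, 1, 0, 0, 1],      # toy ground-truth labels,
                   [0, 1, 0, 1, 0, 0],      # one column per category
                   [1, 1, 1, 0, 1, 0]])
y_score = np.array([[0.9, 0.2, 0.8, 0.1, 0.3, 0.7],
                    [0.2, 0.8, 0.1, 0.6, 0.2, 0.1],
                    [0.8, 0.7, 0.9, 0.3, 0.6, 0.2]])

# Per-label ROC AUC (the ROC curve figures summarize these per model).
print(roc_auc_score(y_true, y_score, average=None))

# Hamming score, taken as 1 - Hamming loss over all label slots.
y_pred = (y_score >= 0.5).astype(int)
print(1 - hamming_loss(y_true, y_pred))
```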
