Machine learning model that classifies a tweet as positive, negative or neutral.

connect-midhunr/sentiment-analysis-covid19-related-tweets

In this project, I have attempted to analyze the Covid-19 related tweets dataset and build a machine learning model to classify a tweet as positive, negative or neutral.

💾 Project Files Description

This project contains an executable IPython notebook, a presentation and source data, as follows:

Executable Files:

  • Sentiment_Analysis_of_Covid_19_related_Tweets.ipynb - Google Colab notebook containing data summary, exploration, visualisations, text processing, modelling and performance evaluation.

Source Directory:

  • Coronavirus Tweets.csv - Includes Covid-19 related tweets data.

📖 Problem Statement

Since its outbreak, the coronavirus has affected more than 180 countries, causing massive losses to the economy and jobs globally and confining about 58% of the global population. Research on people's feelings is essential for maintaining mental health and keeping people informed about Covid-19. The challenge is to build a classification model that predicts the sentiment of Covid-19 tweets.

📖 Approach

  1. Understanding the business task.
  2. Reading data from files given.
  3. Data pre-processing.
  4. Data visualization.
  5. Text processing.
  6. Modelling data.
  7. Conclusion.

📖 Text Processing

  • Lemmatization is used for text normalization, since preserving the meaning of words matters more than merely reducing them to base forms when determining which class the text belongs to.
  • TF-IDF is used for feature extraction from text, since the importance of each word, not just its occurrence, needs to be considered.
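The two text-processing steps above can be sketched in plain Python. This is only an illustration, not the notebook's code: `TOY_LEMMAS` is a hypothetical lookup table standing in for a real lemmatizer, the sample tweets are made up, and the smoothed IDF formula is one common variant chosen here as an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a real lemmatizer; a real one covers far more
# inflected forms than this toy table.
TOY_LEMMAS = {"stores": "store", "prices": "price", "rising": "rise", "closed": "close"}


def lemmatize_tokens(text):
    """Lowercase the text, keep alphabetic tokens, map each to its toy lemma."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [TOY_LEMMAS.get(tok, tok) for tok in tokens]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    weight = term frequency * smoothed inverse document frequency,
    with idf = ln((1 + N) / (1 + df)) + 1 (an assumed, commonly used variant).
    """
    tokenized = [lemmatize_tokens(doc) for doc in docs]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weights


# Made-up sample tweets, for illustration only.
tweets = [
    "Stores closed and prices rising",
    "Prices are stable at the store",
    "Stay home and stay safe",
]
vectors = tfidf(tweets)
```

Within a single tweet, a term that appears in fewer tweets gets a higher weight than an equally frequent term that appears in many, which is the "importance" that a raw word count misses.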
📖 Modelling

Four different algorithms were tried out to find out which one performs the best.

  1. Logistic Regression
  2. Random Forest
  3. Naive Bayes
  4. Support Vector Machine
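A comparison of these four algorithms could look roughly like the sketch below, using scikit-learn. The notebook's actual settings are not shown in this README, so everything specific here is an assumption: the hyperparameters, the choice of `LinearSVC` as the SVM variant and `MultinomialNB` as the Naive Bayes variant, and the toy tweets, which are made up and not taken from Coronavirus Tweets.csv.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy data, for illustration only.
train_texts = [
    "great news the shelves are full again",
    "thankful for the delivery drivers",
    "panic buying is terrible and selfish",
    "prices are awful and stores are empty",
    "the supermarket opens at nine",
    "new store hours announced today",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
test_texts = ["thankful the shelves are full", "empty stores are terrible"]
test_labels = ["positive", "negative"]

# Assumed hyperparameters; the notebook may use different ones.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Naive Bayes": MultinomialNB(),
    "Support Vector Machine": LinearSVC(),
}


def compare_models(classifiers, x_train, y_train, x_test, y_test):
    """Fit a TF-IDF + classifier pipeline per candidate and return test accuracies."""
    scores = {}
    for name, clf in classifiers.items():
        model = make_pipeline(TfidfVectorizer(), clf)
        model.fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(x_test))
    return scores


scores = compare_models(candidates, train_texts, train_labels, test_texts, test_labels)
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary fitted on the training texts only, so the test tweets do not leak into feature extraction.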

📘 Conclusion

Result

The model built using the logistic regression algorithm has the highest accuracy, followed by the one using SVM. Therefore, the logistic regression model can be used for sentiment analysis.
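For reference, accuracy is the fraction of tweets whose predicted label matches the true label. The sketch below shows how models could be ranked by that metric; the labels and the names `model_a` and `model_b` are hypothetical placeholders, not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_by_accuracy(y_true, predictions):
    """Return (model name, accuracy) pairs sorted from best to worst."""
    scored = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical labels and predictions, for illustration only.
y_true = ["positive", "negative", "neutral", "positive"]
predictions = {
    "model_a": ["positive", "negative", "neutral", "negative"],
    "model_b": ["positive", "neutral", "neutral", "negative"],
}
ranking = rank_by_accuracy(y_true, predictions)
```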

📜 Credits

Midhun R | Avid Learner | Data Analyst | Data Scientist | Machine Learning Enthusiast

Contact me for Data Science Project Collaborations
