
Trained on over 60,000 IMDB ratings to categorize reviews as positive or negative


safwanshamsir99/Sentiment-Analysis


python spyder GitHub NumPy Pandas scikit-learn tf

model_loss

Predictive classification model using Natural Language Processing (NLP) for IMDB movie ratings.

A deep learning model trained on a dataset of over 49,000 IMDB ratings to categorize each review as either positive or negative.

Description

  1. The project's objective is to categorize IMDB movie reviews as positive or negative.
  2. The IMDB movie reviews provide a large amount of data, which can be used to predict whether a review is negative or positive.
  3. The dataset contains anomalies such as HTML tags (removed using RegEx), mixed lowercase/uppercase text, and duplicate entries.
  4. The methods used for the deep learning model are word embedding, LSTM, and a Bidirectional layer.
  5. Several methods could be used to improve the model, such as lemmatization, stemming, CNN, n-grams, etc.
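
The cleaning steps in item 3 could look roughly like the sketch below. It is not the repository's code: the function names `clean_review` and `clean_corpus`, the tag pattern, and the sample reviews are hypothetical. It also assumes duplicates are dropped after cleaning, which may differ from the original order of operations.

```python
import re

# Assumed pattern: anything between "<" and ">" is treated as an HTML tag.
TAG_RE = re.compile(r"<[^>]+>")


def clean_review(text: str) -> str:
    """Strip HTML tags, lowercase the text, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)   # replace tags such as <br /> with a space
    text = text.lower()            # normalize lowercase/uppercase
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(reviews):
    """Clean every review and drop exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for raw in reviews:
        review = clean_review(raw)
        if review not in seen:
            seen.add(review)
            cleaned.append(review)
    return cleaned


# Made-up sample reviews, for illustration only.
sample = [
    "Great film!<br /><br />Loved it.",
    "GREAT film! Loved it.",
    "Dull and far too long.",
]
print(clean_corpus(sample))
```

With pandas, the same idea is usually expressed with `Series.str.replace(..., regex=True)`, `Series.str.lower()`, and `DataFrame.drop_duplicates()`.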

Deep learning model images

model_architecture
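
For readers without access to the image, an architecture of the kind described (word embedding, LSTM, Bidirectional) could be sketched in Keras as follows. This is an assumption-laden sketch, not the repository's actual model: the function name `build_model`, the vocabulary size, embedding width, LSTM units, dropout rate, and the single sigmoid output are all illustrative choices.

```python
import numpy as np
import tensorflow as tf


def build_model(vocab_size=10000, embed_dim=64, lstm_units=32, dropout_rate=0.3):
    """Embedding -> Bidirectional LSTM -> Dropout -> sigmoid output (all sizes assumed)."""
    model = tf.keras.Sequential([
        # Maps integer token ids to dense word vectors.
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        # Reads the sequence in both directions and concatenates the final states.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm_units)),
        # Randomly zeroes activations during training to reduce overfitting.
        tf.keras.layers.Dropout(dropout_rate),
        # One probability per review: positive vs. negative.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


model = build_model()
# Two dummy reviews of 20 token ids each, just to check the output shape.
dummy_batch = np.zeros((2, 20), dtype="int32")
print(tuple(model(dummy_batch).shape))
```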

Results

Training loss & Validation loss:

model_loss

Training accuracy & Validation accuracy:

model_accuracy

Model score:

model_score

Discussion

  1. The model achieved 84% accuracy during training.
  2. Recall and F1 score are both 85%.
  3. However, the model starts to overfit after the 2nd epoch. Early stopping can be used to prevent overfitting, and the dropout rate can be increased to control it.
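
The early-stopping idea in item 3 can be illustrated with a small, framework-free sketch of the patience rule. The function name `early_stopping_epoch` and the validation-loss values are made up for illustration and are not results from this project.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training ran for all epochs.
    return len(val_losses), best_epoch


# Hypothetical curve where validation loss bottoms out at epoch 2.
illustrative_val_losses = [0.45, 0.38, 0.41, 0.47, 0.55]
print(early_stopping_epoch(illustrative_val_losses, patience=2))
```

In Keras, the equivalent is the `tf.keras.callbacks.EarlyStopping` callback (for example with `monitor="val_loss"`, a `patience` value, and `restore_best_weights=True`) passed to `model.fit` through its `callbacks` argument.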

Credits:

Shout out to @Ankit152 for the IMDB Dataset. Check out the dataset by clicking the link below. 😄

Dataset link

IMDB-Sentiment-Analysis
