Trained on over 2,000 BBC News articles to categorize unseen articles into 5 categories: Sport, Tech, Business, Entertainment and Politics.

safwanshamsir99/BBC-News-LSTM-NLP

python spyder GitHub NumPy Pandas scikit-learn tf

Predictive classification model for BBC News articles, built with a bidirectional LSTM (Bi-LSTM) deep learning model for Natural Language Processing (NLP).

Description

  1. The project's objective is to categorize BBC News articles into 5 categories: Sport, Tech, Business, Entertainment and Politics.
  2. The articles contain a large amount of text, which can be used to determine each article's category.
  3. The dataset contains anomalies such as single letters (removed using RegEx), numbers, and 99 duplicate records.
  4. The deep learning model uses word embedding, LSTM, and Bidirectional layers.
  5. Several methods could be used to improve the model, such as lemmatization, stemming, CNN, n-grams, etc.
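The cleaning described in point 3 could look roughly like the sketch below. It is not the repository's actual code: the function names `clean_text` and `drop_duplicates`, the lowercasing step, and the sample sentences are assumptions made for illustration. Only the standard library is used.

```python
import re

def clean_text(text: str) -> str:
    """Remove numbers and stray single letters, then tidy whitespace."""
    text = text.lower()                          # assumption: case is not needed
    text = re.sub(r"\d+", " ", text)             # drop numbers
    text = re.sub(r"\b[a-z]\b", " ", text)       # drop single-letter tokens
    return re.sub(r"\s+", " ", text).strip()     # collapse repeated spaces

def drop_duplicates(articles):
    """Remove exact duplicate articles while keeping the original order."""
    return list(dict.fromkeys(articles))

# Made-up sample data, including one duplicate record
raw = [
    "Shares rose 5 points in Q 3 trading",
    "Shares rose 5 points in Q 3 trading",
    "The striker scored 2 goals",
]

cleaned = [clean_text(t) for t in drop_duplicates(raw)]
print(cleaned)
```

Duplicates are dropped before cleaning here; the order of the two steps is a design choice and the original project may have done it differently.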

Deep learning model images

model_architecture
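For readers who cannot view the image, a model combining word embedding, LSTM and Bidirectional layers can be sketched in Keras as follows. The vocabulary size, sequence length, layer widths and dropout rate are all assumed values, not the settings used in this repository.

```python
import tensorflow as tf

# Assumed hyperparameters, chosen only for illustration
VOCAB_SIZE = 10000   # number of distinct tokens kept by the tokenizer
MAX_LEN = 300        # padded length of each article in tokens
NUM_CLASSES = 5      # Sport, Tech, Business, Entertainment, Politics

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    # Map each token id to a dense 64-dimensional vector
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=64),
    # Read the sequence forwards and backwards with an LSTM
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dropout(0.3),
    # One probability per news category
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",  # assumes one-hot encoded labels
    metrics=["accuracy"],
)
model.summary()
```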

Results

Training loss & Validation loss:

model_loss

Training accuracy & Validation accuracy:

model_accuracy

Model score:

model_score

Discussion

  1. The model achieved an accuracy of 92.8% during model evaluation.
  2. Recall and F1-score were also high, ranging from 0.85 to 0.97 and from 0.88 to 0.96 respectively.
  3. However, the model started to overfit after the 2nd epoch, based on the graph displayed in TensorBoard.
  4. To address this, early stopping can be introduced, and increasing the dropout rate can also help keep the model from overfitting.

tensorboard
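The early stopping idea from point 4 can be illustrated without any framework. The sketch below is a simplified version of the logic, assuming a patience-based rule on validation loss; the function name `early_stopping_epoch` and the loss values are invented for the example and are not taken from the training run above.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once the validation loss has failed to improve
    by more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran for all epochs
    return len(val_losses), best_epoch

# Made-up validation losses that start rising after epoch 2
val_losses = [0.60, 0.35, 0.38, 0.42, 0.47]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch, best_epoch)
```

In a Keras training loop the same behaviour is available through the `tf.keras.callbacks.EarlyStopping` callback, whose `monitor`, `patience` and `restore_best_weights` arguments correspond to the choices made in this sketch.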

Credits:

The dataset was obtained from GitHub user @susanli2016. Check out the dataset by clicking the link below. 😄

Dataset link

BBC News Text
