
This project focuses on detecting fake content generated by language models. The domain for this project is scientist biographies, where real biographies were sourced from Wikipedia, and fake ones were generated using a fine-tuned GPT-2 model.


aanchal898/Fake-Scientists


Fake-Content-Detection

This project focuses on detecting fake content generated by language models. The domain is scientist biographies: real biographies were sourced from Wikipedia, and fake ones were generated using a fine-tuned GPT-2 model. Three model architectures were implemented and evaluated: a Feed-Forward Neural Network (FFNN), a Long Short-Term Memory (LSTM) network, and a Transformer-based model (BERT).

Components:

Feed-Forward Neural Network (FFNN):

Implemented a basic FFNN for binary classification of real and fake biographies. Trained the model on labeled data and evaluated its performance with a confusion matrix. Learning curves for training and test perplexity were generated.
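The README does not say how the perplexity curves were computed. As a minimal sketch, assuming perplexity is taken as the exponential of the mean cross-entropy loss (a common definition), one point on such a curve could be computed as below. The function names and the label encoding (0 = real, 1 = fake) are illustrative assumptions, not taken from the repository.

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true labels.

    probs  -- predicted probability that each biography is fake
    labels -- assumed encoding: 0 = real, 1 = fake
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # Clip to avoid log(0) on fully confident predictions.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def perplexity(probs, labels):
    """Perplexity as exp(mean cross-entropy); 1.0 is the best possible value."""
    return math.exp(binary_cross_entropy(probs, labels))

# A model that always outputs 0.5 is maximally uncertain on a binary task.
uninformed = perplexity([0.5, 0.5, 0.5, 0.5], [0, 1, 1, 0])
# A model that is confidently correct approaches a perplexity of 1.
confident = perplexity([0.01, 0.99, 0.99, 0.01], [0, 1, 1, 0])
```

Evaluating this once per epoch on the training and test sets would give the two curves; lower values mean the model assigns higher probability to the correct labels.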

Long Short-Term Memory (LSTM):

Developed an LSTM network using PyTorch's LSTM module. Trained the model on the provided datasets and evaluated its performance. Presented learning curves and a confusion matrix for the test set.
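A minimal sketch of what an LSTM classifier built on PyTorch's `nn.LSTM` might look like. The class name, layer sizes, vocabulary size, and the choice to classify from the final hidden state are all assumptions made for illustration; the repository's actual architecture may differ.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Hypothetical real-vs-fake classifier: embedding -> LSTM -> linear."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        # Index 0 is reserved for padding (an assumption of this sketch).
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)      # h_n: (num_layers, batch, hidden_dim)
        return self.classifier(h_n[-1])        # (batch, num_classes)

torch.manual_seed(0)
model = LSTMClassifier(vocab_size=1000)

# Dummy batch: 4 token-id sequences of length 20, with assumed labels 0 = real, 1 = fake.
batch = torch.randint(1, 1000, (4, 20))
labels = torch.tensor([0, 1, 0, 1])

logits = model(batch)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()  # gradients are now available for an optimizer step
```

Using the last hidden state as the sequence summary is one simple option; pooling over all time steps is a common alternative.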

Transformer-based Model (BERT):

Fine-tuned a pre-trained BERT model for the binary classification task. Evaluated the model's performance and discussed its advantages and limitations. Achieved the highest accuracy among the three models.

Results:

  • The FFNN achieved an accuracy of 56.6%.
  • The LSTM model improved the accuracy to 79.4%.
  • The BERT-based Transformer model achieved the highest accuracy of 82.8%.
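Accuracy figures like those above follow directly from a confusion matrix. The sketch below shows the relationship on made-up labels; the data and the encoding (0 = real, 1 = fake) are assumptions for illustration and are not the project's results.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (assumed 0 = real, 1 = fake)."""
    matrix = [[0, 0], [0, 0]]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1
    return matrix

def accuracy(matrix):
    """Fraction of examples on the diagonal (correctly classified)."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy example, not taken from the repository.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
```

Here `cm` is `[[2, 1], [1, 2]]`: four of the six toy examples lie on the diagonal, so the accuracy is 4/6.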

Files Included:

  • Code: Python scripts for FFNN, LSTM, and Transformer models.
  • Models: Saved model parameters and checkpoints.
  • Results: Confusion matrices, learning curves, and performance metrics.

This project demonstrates the application of different neural network architectures in the task of fake content detection, highlighting the strengths and weaknesses of each approach. The code and results can be found in the repository.

For more details or instructions, see the PDF file in the repository.
