Skip to content

Pytorch BERT implementation for additional experiment

Notifications You must be signed in to change notification settings

jhnlee/pytorch-bert-korean

Repository files navigation

Pytorch BERT Pretrain / Finetuning

pytorch BERT Trainer using HuggingFace transformers

Requirements

  • python 3.6
  • pytorch 1.12
  • cuda 10.0
  • tensorflow 1.14 (for tensorboard)
  • pytorch_transformers
  • gluonnlp >= 0.6.0
  • apex (for mixed precision training)
  • flask (for using api)

Pretrained Korean Bert Model (ETRI or SKT)
Make directory pretrained_model and make sub directory like below

pretrained_model
├── etri
│   ├── bert_config.json
│   ├── pytorch_model.bin
│   ├── tokenization.py
│   └── vocab.korean.rawtext.list
└── skt
    ├── bert_config.json
    ├── pytorch_model.bin
    ├── tokenizer.model
    └── vocab.json

Datasets

Datasets should be in csv format which has two columns named 'Sentence' and 'Emotion'.
Or you can modify a few codes below in datasets.py to fit your own datasets

...
# line 50 - 58
def get_data(self, file_path):
    data = pd.read_csv(file_path)
    corpus = data['Sentence']
    label = None
    try:
        label = [self.label2idx[l] for l in data['Emotion']]
    except:
        pass
    return corpus, label
...

Usage

For maksed language model pretrain

$ python train_mlm.py\
        --pretrained_type="etri"

For text classification

$ python train_classification.py\
        --pretrained_type="etri"

Classification after further MLM pretrain

$ python train_classification.py\
        --pretrained_model_path=".../best_model.bin"

Use fp16 argument for mixed precision training

$ python train_classification.py\
        --fp16\
        --fp16_opt_level="O1"

Inference

$ python test.py\
    --pretrained_model_path="./data/korean_single_test.csv" 

After inference, result file saved to /result folder.

  • /result/test_result.csv : predicted label for test data
  • /result/test_result.png : confusion matrix for test data

Result

Overall

Test Set(3,859)
Accuracy 57.69%
Macro F1 56.84%

F1 score for each Emotion

Emotion F1
공포 60.00%
놀람 57.49%
분노 54.60%
슬픔 62.64%
중립 44.21%
행복 81.88%
혐오 37.04%

Confusion matrix

Simple Web Application with Flask

$ python app.py
Sad case Happy case

About

Pytorch BERT implementation for additional experiment

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published