# Pretrained BigBird Model for Korean

What is BigBird | How to Use | Pretraining | Evaluation Result | Docs | Citation

한국어 | English


## What is BigBird?

BigBird: Transformers for Longer Sequences is a sparse-attention-based model that can handle longer sequences than conventional BERT.

🦅 Longer Sequence - handles up to 4096 tokens, 8 times the 512-token limit of BERT

⏱️ Computational Efficiency - reduces attention complexity from O(n²) to O(n) by using sparse attention instead of full attention

## How to Use

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")  # BigBirdModel
tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")  # BertTokenizer
```
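Continuing from the snippet above, a minimal usage sketch for encoding a long document and running a forward pass. The sample text is a placeholder; any Korean text up to 4096 tokens works, and the tokenizer truncates longer inputs.

```python
import torch

# Placeholder long input; block-sparse attention (the default) handles it.
text = "한국어 위키백과 문서 예시입니다. " * 500

inputs = tokenizer(text, max_length=4096, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```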

## Pretraining

For more information, see [Pretraining BigBird]

|                     | Hardware | Max len | LR   | Batch | Train Step | Warmup Step |
| ------------------- | -------- | ------- | ---- | ----- | ---------- | ----------- |
| KoBigBird-BERT-Base | TPU v3-8 | 4096    | 1e-4 | 32    | 2M         | 20k         |

- Trained on various data such as Everyone's Corpus, Korean Wiki, Common Crawl, and news data
- Used the ITC (Internal Transformer Construction) model for pretraining (ITC vs ETC); a rough sketch of the hyperparameters appears below
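The actual pretraining code is linked above and ran on TPU; purely as an illustration of how the hyperparameters in the table would map onto a Hugging Face `Trainer` masked-language-modeling run, here is a rough sketch. The toy `train_dataset`, the 15% masking rate, and everything beyond the table's values are assumptions, not the repository's setup.

```python
from transformers import (
    AutoTokenizer,
    BigBirdForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
model = BigBirdForMaskedLM.from_pretrained("monologg/kobigbird-bert-base")

# Toy stand-in for a real tokenized corpus.
texts = ["한국어 말뭉치 문서 예시입니다."] * 64
train_dataset = [tokenizer(t, max_length=4096, truncation=True) for t in texts]

args = TrainingArguments(
    output_dir="kobigbird-mlm",
    learning_rate=1e-4,              # LR from the table
    per_device_train_batch_size=32,  # batch size from the table
    max_steps=2_000_000,             # 2M train steps
    warmup_steps=20_000,             # 20k warmup steps
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    # Standard 15% masking (assumption; BERT-style MLM default).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```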

## Evaluation Result

### 1. Short Sequence (<=512)

For more information, see [Finetune on Short Sequence Dataset]

|                     | NSMC (acc) | KLUE-NLI (acc) | KLUE-STS (pearsonr) | Korquad 1.0 (em/f1) | KLUE MRC (em/rouge-w) |
| ------------------- | ---------- | -------------- | ------------------- | ------------------- | --------------------- |
| KoELECTRA-Base-v3   | 91.13      | 86.87          | 93.14               | 85.66 / 93.94       | 59.54 / 65.64         |
| KLUE-RoBERTa-Base   | 91.16      | 86.30          | 92.91               | 85.35 / 94.53       | 69.56 / 74.64         |
| KoBigBird-BERT-Base | 91.18      | 87.17          | 92.61               | 87.08 / 94.71       | 70.33 / 75.34         |
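Inputs for these tasks fit in 512 tokens, where block-sparse attention brings no benefit. As a hedged sketch of a short-sequence classification setup (the `attention_type` override and `num_labels=2` for NSMC-style binary sentiment are illustrative, not the repository's exact fine-tuning code):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
# Full attention is typically faster than block-sparse for short inputs.
model = AutoModelForSequenceClassification.from_pretrained(
    "monologg/kobigbird-bert-base",
    num_labels=2,                    # e.g. NSMC binary sentiment
    attention_type="original_full",  # disable block-sparse attention
)

inputs = tokenizer("이 영화 정말 재미있다!", max_length=512, truncation=True, return_tensors="pt")
logits = model(**inputs).logits  # head is untrained: fine-tune before use
```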

### 2. Long Sequence (>=1024)

For more information, see [Finetune on Long Sequence Dataset]

|                     | TyDi QA (em/f1) | Korquad 2.1 (em/f1) | Fake News (f1) | Modu Sentiment (f1-macro) |
| ------------------- | --------------- | ------------------- | -------------- | ------------------------- |
| KLUE-RoBERTa-Base   | 76.80 / 78.58   | 55.44 / 73.02       | 95.20          | 42.61                     |
| KoBigBird-BERT-Base | 79.13 / 81.30   | 67.77 / 82.03       | 98.85          | 45.42                     |
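A corresponding long-sequence sketch for extractive QA in the Korquad 2.1 style. The question and context below are placeholders, and a real run needs a fine-tuned checkpoint; the default block-sparse attention covers the full 4096-token window.

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
model = AutoModelForQuestionAnswering.from_pretrained("monologg/kobigbird-bert-base")

question = "빅버드가 처리할 수 있는 최대 토큰 수는?"                  # placeholder question
context = "빅버드는 최대 4096개의 토큰을 처리할 수 있다. " * 200      # placeholder long document

inputs = tokenizer(
    question, context,
    max_length=4096, truncation="only_second", return_tensors="pt",
)
with torch.no_grad():
    out = model(**inputs)

# Decode the highest-scoring answer span.
start = out.start_logits.argmax()
end = out.end_logits.argmax()
print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
```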

## Docs

## Citation

If you use KoBigBird in your projects or research, please cite it as below.

```bibtex
@software{jangwon_park_2021_5654154,
  author       = {Jangwon Park and Donggyu Kim},
  title        = {KoBigBird: Pretrained BigBird Model for Korean},
  month        = nov,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {1.0.0},
  doi          = {10.5281/zenodo.5654154},
  url          = {https://doi.org/10.5281/zenodo.5654154}
}
```

## Contributors

Jangwon Park and Donggyu Kim

## Acknowledgements

KoBigBird was built with Cloud TPU support from the TensorFlow Research Cloud (TFRC) program.

Also, thanks to Seyun Ahn for the nice logo :)