
Training on HuggingFace OpenWebText for GPT2 comparison #23

Open

nathanneuro opened this issue Dec 19, 2022 · 1 comment

Comments

@nathanneuro

On my fork I'm attempting to add support for HuggingFace's OpenWebText dataset and their GPT2 tokenizer so that I can compare against HF's GPT2-small. If you're willing, I'd love advice on setting up model params for self-supervised autoregressive NLP. Thanks!
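For the data side, I'm planning something along these lines with the HuggingFace `datasets` and `transformers` libraries (a minimal sketch; packing the token stream into fixed-length sequences for this repo's input pipeline is omitted and would still need to be adapted):

```python
from datasets import load_dataset
from transformers import GPT2TokenizerFast

# GPT-2's BPE tokenizer (50257-token vocabulary), same as HF's GPT2-small.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# OpenWebText on the HF Hub has a single "train" split with a "text" column.
dataset = load_dataset("openwebtext", split="train")

def tokenize(batch):
    # Tokenize raw document text; no special tokens are added.
    return tokenizer(batch["text"])

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
print(tokenized[0]["input_ids"][:10])
```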

@cghawthorne
Contributor

Hi @nathanneuro. You can probably use pretty normal settings. In the paper, we used some special hparams for PG-19 and Wikitext-103 to avoid overfitting, but OpenWebText is probably large enough that you don't need to worry about that. For text, rotary position encoding seems to work well. Maybe start with 8192 context and 1024 latents?
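Sketching those suggestions as a config (the field names below are illustrative, not the exact names used in this repo's config files):

```python
# Rough starting hyperparameters for OpenWebText, per the suggestion above.
config = dict(
    vocab_size=50257,            # GPT-2 BPE vocabulary
    max_context_length=8192,     # input/context length in tokens
    num_latents=1024,            # number of latent positions
    position_encoding="rotary",  # rotary position encoding works well for text
)
```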
