The codebase generates a synthetic pretraining dataset from an HMM. The downstream task is a linear function of the posterior distribution over the first HMM hidden state in the sequence.
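The sketch below is a minimal illustration of this data-generating process, not the repository's actual pipeline: it samples a sequence from a small random HMM, computes the posterior over the first hidden state with a backward pass, and derives a downstream label from a linear function of that posterior. All sizes and the weight matrix `w` are illustrative values, not the repo's defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_vocab, seq_len = 5, 20, 10   # illustrative sizes

# Random HMM parameters: initial distribution, transitions, emissions.
pi = rng.dirichlet(np.ones(n_hidden))                 # p(h_1)
A = rng.dirichlet(np.ones(n_hidden), size=n_hidden)   # A[h, h'] = p(h_{t+1}=h' | h_t=h)
B = rng.dirichlet(np.ones(n_vocab), size=n_hidden)    # B[h, x]  = p(x_t=x | h_t=h)

def sample_sequence():
    """Sample one pretraining sequence x_1..x_T from the HMM."""
    h, xs = rng.choice(n_hidden, p=pi), []
    for _ in range(seq_len):
        xs.append(rng.choice(n_vocab, p=B[h]))
        h = rng.choice(n_hidden, p=A[h])
    return np.array(xs)

def posterior_first_state(xs):
    """p(h_1 | x_1..x_T), computed with a backward (beta) recursion."""
    beta = np.ones(n_hidden)
    for t in range(len(xs) - 1, 0, -1):
        # fold in the emission at the later position and step back through A
        beta = A @ (B[:, xs[t]] * beta)
    unnorm = pi * B[:, xs[0]] * beta      # proportional to p(h_1, x_1..x_T)
    return unnorm / unnorm.sum()

# Downstream label: one illustrative way to turn a linear function of the
# posterior over h_1 into a class label (the exact construction is in the paper).
w = rng.normal(size=(2, n_hidden))        # hypothetical weights for 2 classes
xs = sample_sequence()
label = int(np.argmax(w @ posterior_first_state(xs)))
```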
We provide code for two experiments:

- Head tuning vs. prompt tuning: We generate data from a vanilla HMM and train a Transformer masked language model (MLM). We compare the downstream accuracy of head tuning (a linear probe on the CLS token) against prompt tuning (head tuning plus optimized continuous input vectors prepended to the input); a sketch of both tuning modes follows this list. Run all seeds and hidden state sizes with `scripts/exp1_local_runner.sh` locally, or `scripts/exp1_slurm_runner.sh` on Slurm.
- Memory-augmented HMMs: We generate data from a memory-augmented HMM, whose memory structure relaxes the conditions under which pretraining learns representations useful for the downstream task. Run all seeds and memory slot sizes/hidden state sizes with `scripts/exp2_local_runner.sh` locally, or `scripts/exp2_slurm_runner.sh` on Slurm.
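For intuition, here is a minimal PyTorch sketch of the two tuning modes in the first experiment. It is not the repository's training code: the encoder below is a randomly initialized stand-in for the pretrained MLM, the sizes are illustrative, and position 0 of each input is treated as the CLS token.

```python
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, n_prompt, n_classes = 20, 32, 10, 4, 2  # illustrative

# Stand-in for the pretrained MLM encoder; frozen during downstream tuning.
embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
for p in list(embed.parameters()) + list(encoder.parameters()):
    p.requires_grad_(False)

head = nn.Linear(d_model, n_classes)                           # trained in both modes
prompt = nn.Parameter(torch.randn(n_prompt, d_model) * 0.02)   # trained only in prompt tuning

def head_tuning_logits(tokens):
    """Head tuning: a linear probe on the encoder output at the CLS position."""
    h = encoder(embed(tokens))                   # (batch, seq, d_model)
    return head(h[:, 0])                         # tokens[:, 0] plays the role of CLS

def prompt_tuning_logits(tokens):
    """Prompt tuning: head tuning plus learned continuous vectors prepended to the input."""
    e = embed(tokens)
    e = torch.cat([prompt.expand(e.size(0), -1, -1), e], dim=1)
    h = encoder(e)
    return head(h[:, n_prompt])                  # CLS output, shifted by the prompt length

# Only the probe (and, for prompt tuning, the prompt vectors) receive gradient updates.
opt = torch.optim.Adam(list(head.parameters()) + [prompt], lr=1e-3)
tokens = torch.randint(vocab_size, (8, seq_len))
labels = torch.randint(n_classes, (8,))
loss = nn.functional.cross_entropy(prompt_tuning_logits(tokens), labels)
loss.backward()
opt.step()
```

In both modes the pretrained encoder stays frozen; prompt tuning differs only in the extra trainable vectors prepended at the embedding layer (whether they sit before or after the CLS slot is an implementation detail in this sketch).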
To get started, run the following:
```bash
virtualenv .env
source .env/bin/activate
pip install -r requirements.txt
# editable install of the local transformers directory
pip install -e transformers
# build apex with its C++ and CUDA extensions
cd apex
pip install --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
Please see the paper for more details.
```
@article{wei2021why,
  author  = {Colin Wei and Sang Michael Xie and Tengyu Ma},
  journal = {Neural Information Processing Systems (NeurIPS)},
  title   = {Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning},
  year    = {2021},
}
```