# Colloquial Finnish wav2vec2

Scripts for training colloquial Finnish wav2vec 2.0 models.

## Pre-trained and fine-tuned models

| Model | Labeled data, h | Dev WER, % | Test WER, % |
|-------|-----------------|------------|-------------|
| Wav2Vec 2.0 Base VP-Finnish | N/A | N/A | N/A |
| Wav2Vec 2.0 Base VP-Finnish | 100 | 29.35 | 31.90 |
| Wav2Vec 2.0 Base VP-Finnish | 1500 | 22.18 | 24.43 |
| Wav2Vec 2.0 Base LP (PT from scratch) | N/A | N/A | N/A |
| Wav2Vec 2.0 Base LP (PT from scratch) | 100 | 26.40 | 28.92 |
| Wav2Vec 2.0 Base LP (PT from scratch) | 1500 | 21.61 | 24.35 |
| Wav2Vec 2.0 Base LP (continued PT) | N/A | N/A | N/A |
| Wav2Vec 2.0 Base LP (continued PT) | 100 | 22.49 | 24.95 |
| Wav2Vec 2.0 Base LP (continued PT) | 1500 | 17.38 | 19.65 |
| Wav2Vec 2.0 Large VP-Uralic | N/A | N/A | N/A |
| Wav2Vec 2.0 Large VP-Uralic | 100 | 21.02 | 22.98 |
| Wav2Vec 2.0 Large VP-Uralic | 1500 | 19.14 | 20.49 |
| Wav2Vec 2.0 Large LP (PT from scratch) | N/A | N/A | N/A |
| Wav2Vec 2.0 Large LP (PT from scratch) | 100 | 21.66 | 23.85 |
| Wav2Vec 2.0 Large LP (PT from scratch) | 1500 | 17.54 | 19.26 |
| Wav2Vec 2.0 Large LP (continued PT) | N/A | N/A | N/A |
| Wav2Vec 2.0 Large LP (continued PT) | 100 | 22.49 | 24.95 |
| Wav2Vec 2.0 Large LP (continued PT) | 1500 | 16.24 | 18.04 |

More details on the models are available in the paper. The models are also available on the Hugging Face Hub.

## Pre-training the models

The scripts shared in this repository are adapted to the AMD hardware of the LUMI supercomputer. To pre-train a wav2vec 2.0 Base model, run

    sbatch scripts/pretraining/fairseq_train_multinode_w2v2_B_512gpus.sh

Note: you can simulate 512 GPUs with k physical GPUs by adding the command-line parameters (before `--config-dir`) `distributed_training.distributed_world_size=k +optimization.update_freq='[x]'`, where x = 512/k; see the sketch below.
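As a concrete illustration, here is a minimal sketch of the kind of `fairseq-hydra-train` call the sbatch script wraps, simulating 512 GPUs on 64 physical GPUs (update_freq = 512/64 = 8). The data path, config directory, and config name are placeholders, not the repository's actual values; check the sbatch script for those.

```bash
# Minimal sketch (placeholder paths and config name; see
# scripts/pretraining/fairseq_train_multinode_w2v2_B_512gpus.sh for the real values).
# 64 physical GPUs x update_freq 8 = effective 512-GPU batch.
fairseq-hydra-train \
  task.data=/path/to/pretraining/manifests \
  distributed_training.distributed_world_size=64 \
  +optimization.update_freq='[8]' \
  --config-dir /path/to/pretraining/configs \
  --config-name wav2vec2_base
```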

## Fine-tuning the models with CTC

To fine-tune a wav2vec 2.0 Base model using Fairseq, run

    sbatch scripts/finetuning/fairseq_finetune_multinode_w2v2_B_128gpus_full.sh

Note: you can simulate 128 GPUs with k physical GPUs by adding the command-line parameters (before `--config-dir`) `distributed_training.distributed_world_size=k +optimization.update_freq='[x]'`, where x = 128/k; see the sketch below.
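As a rough illustration, a `fairseq-hydra-train` call simulating 128 GPUs on 16 physical GPUs (update_freq = 128/16 = 8) might look as follows. The manifest path, checkpoint path, config directory, and config name are placeholders; the actual values are set in the sbatch script.

```bash
# Minimal sketch (placeholder paths and config name; see
# scripts/finetuning/fairseq_finetune_multinode_w2v2_B_128gpus_full.sh for the real values).
# 16 physical GPUs x update_freq 8 = effective 128-GPU batch.
fairseq-hydra-train \
  task.data=/path/to/labeled/manifests \
  model.w2v_path=/path/to/pretrained/checkpoint_best.pt \
  distributed_training.distributed_world_size=16 \
  +optimization.update_freq='[8]' \
  --config-dir /path/to/finetuning/configs \
  --config-name base_ctc
```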

## Fine-tuning the models with CTC using 🤗 Transformers

To fine-tune a wav2vec 2.0 Base model using Hugging Face Transformers, run

    sbatch scripts/finetuning/huggingface_finetune_multinode_w2v2_B_8gpus_full.sh
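For orientation, a single-node Hugging Face launch typically follows the pattern below. This is only a sketch: the Python entry point `finetune_wav2vec2_ctc.py` and its arguments are hypothetical placeholders, not the repository's actual script; the real entry point and parameters are defined inside scripts/finetuning/huggingface_finetune_multinode_w2v2_B_8gpus_full.sh.

```bash
# Hypothetical sketch only: the training script name and arguments are placeholders,
# not the repository's actual entry point (see the sbatch script above for that).
torchrun --nproc_per_node=8 finetune_wav2vec2_ctc.py \
  --model_name_or_path /path/to/pretrained/wav2vec2-base \
  --output_dir /path/to/finetuned/model \
  --per_device_train_batch_size 8 \
  --num_train_epochs 10 \
  --do_train --do_eval
```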

## Citation

If you use our models or scripts, please cite our article as:

    @inproceedings{getman24_interspeech,
      title     = {What happens in continued pre-training? Analysis of self-supervised speech models with continued pre-training for colloquial Finnish ASR},
      author    = {Yaroslav Getman and Tamas Grosz and Mikko Kurimo},
      year      = {2024},
      booktitle = {Interspeech 2024},
      pages     = {5043--5047},
      doi       = {10.21437/Interspeech.2024-476},
    }
