Zhang-Nian/Whisper_Irish

Whisper

[Blog] [Paper] [Model card] [Colab example]

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

Approach

A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. All of these tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing for a single model to replace many different stages of a traditional speech processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
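As a rough illustration of the multitask format described above, the sketch below assembles the special-token prefix that conditions the decoder. The token strings follow the published Whisper format; the helper name `build_decoder_prompt` and its arguments are hypothetical and are not part of this repository.

```python
from typing import List

# Special tokens from the Whisper multitask format.
SOT = "<|startoftranscript|>"
TASKS = {"transcribe": "<|transcribe|>", "translate": "<|translate|>"}
NO_TIMESTAMPS = "<|notimestamps|>"


def build_decoder_prompt(language: str, task: str = "transcribe",
                         timestamps: bool = False) -> List[str]:
    """Return the special-token prefix for one decoding request.

    `language` is a language code such as "en"; the language token acts as
    both a task specifier and a language-identification target.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    prompt = [SOT, f"<|{language}|>", TASKS[task]]
    if not timestamps:
        # Ask the decoder to emit text only, without timestamp tokens.
        prompt.append(NO_TIMESTAMPS)
    return prompt


print("".join(build_decoder_prompt("en")))
```

Switching `task` to `"translate"` changes only one token in the prefix, which is how a single model can serve several tasks.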

Setup

We used Python 3.8 and PyTorch (GPU build) to fine-tune the models.

pip install datasets
pip install transformers
pip install librosa
pip install evaluate
pip install jiwer
pip install gradio

Finetune

[Key link1] [Key link2] [Key link3]

Fine-tuning plan

Step 1: Get familiar with the Whisper fine-tuning workflow using the Common Voice corpus.

Step 2: Rewrite the entire fine-tuning module, mainly the pre-processing of the Irish corpus and the code that feeds it into PyTorch by building Dataset subclasses.

Step 3: Build a mini dataset and split it into train and test sets to check that the whole fine-tuning process works correctly.

Step 4: Work out how to fine-tune the tokenizer, since the existing Whisper tokenizer does not support Irish.

    [[link1]](https://github.com/facebookresearch/fairseq/tree/main/examples/mbart)

Step 5: Find the most suitable training parameters for Irish speech.

Step 6: Decide the order of fine-tuning: which parameter should be adjusted first, and can the parameters be adjusted separately?

Step 7: Consider whether to use Adam or Adafactor as the optimizer in training.

Step 8: Use data augmentation to improve results.

Step 9: Analyze the misrecognized sentences and count the different types of errors before considering other ways to improve the model.
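The error counting in Step 9 could be sketched as follows. This is a minimal, standard-library example that aligns each reference and hypothesis at the word level with edit distance and tallies substitutions, deletions, and insertions. The function names and the sentence pair are made up for illustration and are not taken from this repository; a library such as jiwer could be used instead.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_word_errors(reference: str, hypothesis: str) -> Counter:
    """Count correct, substituted, deleted, and inserted words for one pair."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    # Walk back through the table to label each edit.
    counts: Counter = Counter()
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            counts["correct"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            counts["substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            counts["deletion"] += 1
            i -= 1
        else:
            counts["insertion"] += 1
            j -= 1
    return counts


def tally_errors(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Sum the per-sentence counts over (reference, hypothesis) pairs."""
    total: Counter = Counter()
    for reference, hypothesis in pairs:
        total += count_word_errors(reference, hypothesis)
    return total


# Made-up example pair, for illustration only.
print(tally_errors([("tá an aimsir go maith", "tá aimsir go breá inniu")]))
```

The word error rate for a test set follows from the totals as (substitutions + deletions + insertions) divided by the number of reference words.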

About

Fine-tuning the Whisper model on an Irish corpus
