[ASR] Support Hubert, finetuned on the librispeech dataset #3088

Zth9730 · 2023-03-24T07:55:05Z

PR types

New features

PR changes

Models

Describe

support ASR Hubert

paddle-bot · 2023-03-24T07:55:10Z

Thanks for your contribution!

mergify · 2023-03-24T07:55:39Z

This pull request is now in conflict :(

zxcd · 2023-03-27T06:57:51Z

paddlespeech/s2t/models/wav2vec2/wav2vec2_ASR.py

@@ -55,6 +58,8 @@ def __init__(self, config: dict):
                           reduction='mean')

    def forward(self, wav, wavs_lens_rate, target, target_lens):
+        # import pdb
+        # pdb.set_trace()


These commented-out lines can be removed.

zxcd · 2023-03-27T07:03:22Z

examples/librispeech/asr3/run.sh

@@ -19,6 +20,7 @@ audio_file=data/demo_002_en.wav

 avg_ckpt=avg_${avg_num}
 ckpt=$(basename ${conf_path} | awk -F'.' '{print $1}')
+ckpt=test6


zxcd · 2023-03-27T07:05:20Z

dataset/librispeech/librispeech.py

@@ -133,7 +133,7 @@ def create_manifest(data_dir, manifest_path):
 def prepare_dataset(url, md5sum, target_dir, manifest_path):
    """Download, unpack and create summmary manifest file.
    """
-    if not os.path.exists(os.path.join(target_dir, "LibriSpeech")):


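Why is this being changed?

The original code doesn't seem to match the directory layout that LibriSpeech unpacks to, which is inconvenient when the LibriSpeech dataset already exists locally.

What do you mean? Doesn't this check skip the download when the data is already there?

OK, I'll keep it as it was.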
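
For reference, the check under discussion can be sketched as follows. This is only an illustration of the removed line's logic: `needs_download` is a hypothetical helper name, and the rest of `prepare_dataset` (download, unpack, manifest creation) is not reproduced here.

```python
import os
import tempfile


def needs_download(target_dir: str, marker: str = "LibriSpeech") -> bool:
    """Return True when the unpacked dataset directory is missing.

    Hypothetical helper: mirrors the `os.path.exists` check in the hunk above.
    """
    return not os.path.exists(os.path.join(target_dir, marker))


with tempfile.TemporaryDirectory() as target_dir:
    before = needs_download(target_dir)  # nothing unpacked yet
    os.makedirs(os.path.join(target_dir, "LibriSpeech"))
    after = needs_download(target_dir)   # directory now present, skip download
```

With the check in place, a machine that already has `LibriSpeech/` under `target_dir` skips the download step.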

zxcd · 2023-03-27T08:06:11Z

examples/librispeech/asr3/conf/hubertASR.yaml

+
+
+task_cfg:
+  sample_rate: 16000


Suggestion: could you add pretrain/finetune labels here?

zxcd · 2023-03-27T08:08:10Z

examples/librispeech/asr3/conf/hubertASR.yaml

+#             Data Augmentation            #
+############################################
+audio_augment:  # for raw audio 
+  sample_rate: 16000


Why are two sample_rate parameters needed?

zxcd · 2023-03-27T08:17:23Z

paddlespeech/s2t/models/hubert/modules/hubert_model.py

+        self.mask_emb = paddle.create_parameter(
+            shape=[cfg.encoder_embed_dim],
+            dtype='float32',
+            default_initializer=paddle.nn.initializer.Uniform(),


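torch and paddle use different default ranges for uniform initialization: torch uses (0, 1) and paddle uses (-1, 1). Please check whether this affects training, or simply pass the low and high parameters explicitly.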
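
To make the difference concrete, here is a rough sketch using only Python's `random` module (not Paddle or PyTorch). It takes the two ranges stated above as given and compares the resulting mean and variance; the seed and sample count are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 10000

# Ranges as stated in the comment above: (0, 1) versus (-1, 1).
torch_like = [rng.uniform(0.0, 1.0) for _ in range(n)]
paddle_like = [rng.uniform(-1.0, 1.0) for _ in range(n)]

# For U(a, b): mean = (a + b) / 2, variance = (b - a) ** 2 / 12.
torch_mean = statistics.fmean(torch_like)        # close to 0.5
paddle_mean = statistics.fmean(paddle_like)      # close to 0.0
torch_var = statistics.pvariance(torch_like)     # close to 1/12
paddle_var = statistics.pvariance(paddle_like)   # close to 1/3
```

So the embedding would start with a different mean and about four times the variance, which is why passing low and high explicitly is the safer option.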

zxcd · 2023-03-27T08:18:14Z

paddlespeech/s2t/models/hubert/modules/hubert_model.py

+            self.label_embs_concat = paddle.create_parameter(
+            shape=[sum(self.num_classes), final_dim],
+            dtype='float32',
+            default_initializer=paddle.nn.initializer.Uniform(),


zxcd · 2023-03-27T09:33:58Z

paddlespeech/s2t/models/hubert/modules/hubert_model.py

+
+        return x, mask_indices
+
+    def compute_nce(x, pos, negs):


zxcd · 2023-03-27T09:37:21Z

paddlespeech/s2t/models/hubert/hubert_ASR.py

+from dataclasses import dataclass, field, is_dataclass
+from copy import deepcopy
+
+from omegaconf import II, MISSING, open_dict


Are these imports actually used?

zxcd · 2023-03-27T09:38:47Z

paddlespeech/s2t/models/hubert/hubert_ASR.py

+
+
+class HubertBase(nn.Layer):
+    """Wav2vec2 model"""


zxcd · 2023-03-30T08:16:15Z

examples/librispeech/asr3/conf/hubertASR.yaml

+  enc_n_units: 1024
+  blank_id: 0
+  dropout_rate: 0.0
+hubert_params_path: "exp/hubert/pd_hubert.pdparams"


Could a download link be provided for this model?

zh794390558

There are a lot of details here. This is a first-pass review; I'll look more closely later.

zh794390558 · 2023-04-06T02:45:55Z

examples/librispeech/asr3/conf/hubertASR.yaml

+  fp16: True
+  label_rate: 50
+  extractor_mode: layer_norm
+  encoder_layers: 24


Is this the Large configuration? Please make that distinction in the config files.

zh794390558 · 2023-04-06T03:32:17Z

examples/librispeech/asr3/path.sh

@@ -10,6 +10,5 @@ export PYTHONPATH=${MAIN_ROOT}:${PYTHONPATH}

 export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib/

-
-MODEL=wav2vec2
+MODEL=$1


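This doesn't need to be hard-coded, but it can't be passed in as an argument. If it shares an asr directory with wav2vec, create a separate directory for it.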

zh794390558 · 2023-04-06T03:33:14Z

examples/librispeech/asr3/run.sh

-stage=0
-stop_stage=0
-conf_path=conf/wav2vec2ASR.yaml
+gpus=2


Remember to change these back to the default values.

zh794390558 · 2023-04-06T03:34:45Z

paddlespeech/s2t/exps/hubert/model.py

+logger = Log(__name__).getlog()
+
+
+def clip_grad_norm_(


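Please replace this with Paddle's API.

I see wav2vec2 currently uses this same function too? Which Paddle API corresponds to it, and is it usable yet?

The dev branch and the recent 2.5 release have this API.

I've added a TODO comment here; let's change it once the Paddle dependency is moved to 2.5.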
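
For readers unfamiliar with what such a helper does, below is a plain-Python sketch of global-norm gradient clipping. It is neither the PR's `clip_grad_norm_` (whose body is not shown in this thread) nor Paddle's API; `clip_by_global_norm` and its list-of-lists gradient format are assumptions made for illustration.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> float:
    """Scale grads in place so their global L2 norm is at most max_norm.

    Returns the norm measured before clipping. Hypothetical helper.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = max_norm / (total_norm + 1e-6)  # small epsilon avoids division by zero
    if scale < 1.0:  # only shrink gradients, never enlarge them
        for grad in grads:
            for i, g in enumerate(grad):
                grad[i] = g * scale
    return total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]  # global L2 norm is 5.0
norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for grad in grads for g in grad))
```

After the call, the combined norm of all gradients is just under `max_norm`, while their direction is unchanged.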

zh794390558 · 2023-04-11T06:29:43Z

paddlespeech/s2t/models/hubert/modules/hubert_model.py

+        self.feat2tar_ratio = cfg.label_rate * feature_ds_rate / task_cfg.sample_rate
+
+        self.post_extract_proj = (
+            nn.Linear(self.embed, cfg.encoder_embed_dim)


This needs to be replaced with align.Linear; all related layers need the same change.
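
As a side note on the `feat2tar_ratio` line in the hunk above, here is a worked example of the arithmetic. `label_rate` and `sample_rate` come from `hubertASR.yaml` in this PR; the conv strides are an assumption (the commonly used HuBERT feature-extractor strides), since the extractor config is not shown in this thread.

```python
import math

# Assumed conv strides of the feature extractor (not shown in this thread).
conv_strides = [5, 2, 2, 2, 2, 2, 2]
feature_ds_rate = math.prod(conv_strides)  # total downsampling factor: 320

label_rate = 50      # from hubertASR.yaml in this PR
sample_rate = 16000  # from hubertASR.yaml in this PR

# Same formula as in the hunk above.
feat2tar_ratio = label_rate * feature_ds_rate / sample_rate
```

Under these assumptions the ratio is 1.0, meaning one encoder frame lines up with one target label.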

zh794390558 · 2023-04-11T09:05:11Z