[ASR] fix asr 0-d tensor. (#3214)
* fix asr infer.py

* add readme.
zxcd committed May 4, 2023
1 parent 12e3e76 commit caca8e2
Showing 5 changed files with 5 additions and 3 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -178,6 +178,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

### Recent Update
- ⚡ 2023.04.28: Fix [0-d tensor](https://github.com/PaddlePaddle/PaddleSpeech/pull/3214), with the upgrade of paddlepaddle==2.5, the problem of modifying 0-d tensor has been solved.
- 👑 2023.04.25: Add [AMP for U2 conformer](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 🔥 2023.04.06: Add [subtitle file (.srt format) generation example](./demos/streaming_asr_server).
- 🔥 2023.03.14: Add SVS(Singing Voice Synthesis) examples with Opencpop dataset, including [DiffSinger](./examples/opencpop/svs1)[PWGAN](./examples/opencpop/voc1) and [HiFiGAN](./examples/opencpop/voc5), the effect is continuously optimized.
1 change: 1 addition & 0 deletions README_cn.md
@@ -183,6 +183,7 @@
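- 🧩 Cascaded model applications: as an extension of traditional speech tasks, we combine them with tasks such as natural language processing and computer vision to build industrial-grade applications that are closer to real-world needs.

### Recent Updates
- ⚡ 2023.04.28: Fixed [0-d tensor](https://github.com/PaddlePaddle/PaddleSpeech/pull/3214); the 0-d tensor problem was fixed to match the PaddlePaddle 2.5 upgrade.
- 👑 2023.04.25: Added [AMP training for U2 conformer](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 👑 2023.04.06: Added [subtitle generation in srt format](./demos/streaming_asr_server)
- 🔥 2023.03.14: Added SVS (singing voice synthesis) examples based on the Opencpop dataset, including [DiffSinger](./examples/opencpop/svs1)[PWGAN](./examples/opencpop/voc1)[HiFiGAN](./examples/opencpop/voc5); results are being continuously improved.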
2 changes: 1 addition & 1 deletion paddlespeech/cli/asr/infer.py
@@ -274,7 +274,7 @@ def preprocess(self, model_type: str, input: Union[str, os.PathLike]):
# fbank
audio = preprocessing(audio, **preprocess_args)

- audio_len = paddle.to_tensor(audio.shape[0])
+ audio_len = paddle.to_tensor(audio.shape[0]).unsqueeze(axis=0)
audio = paddle.to_tensor(audio, dtype='float32').unsqueeze(axis=0)

self._inputs["audio"] = audio
2 changes: 1 addition & 1 deletion paddlespeech/cli/ssl/infer.py
@@ -247,7 +247,7 @@ def preprocess(self, input: Union[str, os.PathLike]):
# fbank
audio = preprocessing(audio, **preprocess_args)

- audio_len = paddle.to_tensor(audio.shape[0])
+ audio_len = paddle.to_tensor(audio.shape[0]).unsqueeze(axis=0)
audio = paddle.to_tensor(audio, dtype='float32').unsqueeze(axis=0)

self._inputs["audio"] = audio
2 changes: 1 addition & 1 deletion paddlespeech/cli/whisper/infer.py
@@ -253,7 +253,7 @@ def preprocess(self, model_type: str, input: Union[str, os.PathLike]):
# fbank
audio = log_mel_spectrogram(audio, resource_path=self.resource_path)

- audio_len = paddle.to_tensor(audio.shape[0])
+ audio_len = paddle.to_tensor(audio.shape[0]).unsqueeze(axis=0)

self._inputs["audio"] = audio
self._inputs["audio_len"] = audio_len
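
The one-line change repeated in the three `infer.py` files turns the length value into a 1-d tensor with one entry per batch item. A minimal sketch of the shape difference follows, assuming (as the README note suggests) that from paddlepaddle 2.5 `paddle.to_tensor` on a Python scalar yields a 0-d tensor. NumPy is used here only as a stand-in so the sketch runs without Paddle installed; the feature shape `(120, 80)` and all variable names are hypothetical.

```python
import numpy as np

# Stand-in for the fbank features produced by preprocessing:
# 120 frames x 80 bins (hypothetical sizes, for illustration only).
frames = np.zeros((120, 80), dtype="float32")

# Wrapping a Python int gives a 0-d array: shape (), no batch axis.
length_0d = np.asarray(frames.shape[0])

# Adding a leading axis gives shape (1,): one length per batch item,
# analogous to `.unsqueeze(axis=0)` in the diff above.
length_1d = np.expand_dims(length_0d, axis=0)

# The features get the same leading batch axis: shape (1, 120, 80).
batch = np.expand_dims(frames, axis=0)

print(length_0d.shape, length_1d.shape, batch.shape)
```

With the leading axis in place, the length array and the feature batch agree on a batch dimension of 1, which is presumably what the downstream model code expects.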