Cherry-pick to r1.4 branch #3798

Merged
125 commits merged on Jun 13, 2024

Changes from all commits
Commits (125)
044f0c8
[TTS]add Diffsinger with opencpop dataset (#3005)
lym0302 Mar 13, 2023
f057fc0
Update requirements.txt
yt605155624 Mar 13, 2023
4b17e83
fix vits reduce_sum's input/output dtype, test=tts (#3028)
yt605155624 Mar 13, 2023
cb79fb9
[TTS] add opencpop PWGAN example (#3031)
lym0302 Mar 14, 2023
75b1ad5
Update textnorm_test_cases.txt
yt605155624 Mar 14, 2023
1ccb3ce
[TTS] add opencpop HIFIGAN example (#3038)
lym0302 Mar 14, 2023
ba47b52
fix dtype diff of last expand_v2 op of VITS (#3041)
yt605155624 Mar 14, 2023
632fbad
[ASR]add squeezeformer model (#2755)
yeyupiaoling Mar 15, 2023
0c2d366
Update README.md
yt605155624 Mar 15, 2023
7914fa2
Update README_cn.md
yt605155624 Mar 15, 2023
b3fe0d4
Update README.md
yt605155624 Mar 15, 2023
dcfd07b
Update README_cn.md
yt605155624 Mar 15, 2023
2844112
Update README.md
yt605155624 Mar 15, 2023
0e820fc
fix input dtype of elementwise_mul op from bool to int64 (#3054)
yt605155624 Mar 16, 2023
a7a556a
[TTS] add svs frontend (#3062)
lym0302 Mar 21, 2023
cca95e1
[TTS]clean starganv2 vc model code and add docstring (#2987)
yt605155624 Mar 21, 2023
5b875fe
[Doc] change define asr server config to chunk asr config, test=doc (…
zxcd Mar 21, 2023
609e537
get music score, test=doc (#3070)
lym0302 Mar 21, 2023
db85b1e
[TTS]fix elementwise_floordiv's fill_constant (#3075)
yt605155624 Mar 22, 2023
195f29b
fix paddle2onnx's install version, install the newest paddle2onnx in …
yt605155624 Mar 23, 2023
7e8793f
[TTS] update svs_music_score.md (#3085)
lym0302 Mar 24, 2023
acb76a4
rm unused dep, test=tts (#3097)
lym0302 Mar 27, 2023
7b98930
Update bug-report-tts.md (#3120)
yt605155624 Mar 31, 2023
a26ca1f
[TTS]Fix VITS lite infer (#3098)
yt605155624 Apr 6, 2023
ecd7b87
[TTS]add starganv2 vc trainer (#3143)
yt605155624 Apr 10, 2023
bf3e659
[TTS]【Hackathon + No.190】Model reproduction: iSTFTNet (#3006)
longRookie Apr 10, 2023
c1a1528
add function for generating srt file (#3123)
twoDogy Apr 11, 2023
5657d3e
fix example/aishell local/train.sh if condition bug, test=asr (#3146)
lemondy Apr 11, 2023
894a332
fix some preprocess bugs (#3155)
yt605155624 Apr 13, 2023
fdf53f1
add amp for U2 conformer.
zxcd Apr 17, 2023
6056f45
fix scaler save
zxcd Apr 17, 2023
20f5cfe
fix scaler save and load.
zxcd Apr 17, 2023
92f4213
mv scaler.unscale_ blow grad_clip.
zxcd Apr 17, 2023
97ca0da
[TTS]add StarGANv2VC preprocess (#3163)
yt605155624 Apr 18, 2023
c6c9ba5
[TTS] [黑客松]Add JETS (#3109)
ljhzxc Apr 19, 2023
cf8727b
Update quick_start.md (#3175)
46319943 Apr 20, 2023
6e63998
[BUG] Fix progress bar unit. (#3177)
46319943 Apr 20, 2023
6bd7d14
Update quick_start_cn.md (#3176)
46319943 Apr 20, 2023
135d19e
[TTS]StarGANv2 VC fix some trainer bugs, add add reset_parameters (#3…
yt605155624 Apr 20, 2023
3a7ec9c
VITS learning rate revised, test=tts
WongLaw Apr 20, 2023
3ff22b7
VITS learning rate revised, test=tts
WongLaw Apr 20, 2023
aff54e0
[s2t] mv dataset into paddlespeech.dataset (#3183)
zh794390558 Apr 21, 2023
2591f17
Fix some typos. (#3178)
Yulv-git Apr 21, 2023
a8a2acd
[s2t] move s2t data preprocess into paddlespeech.dataset (#3189)
zh794390558 Apr 23, 2023
d911d5a
Update pretrained model in README (#3193)
ljhzxc Apr 23, 2023
14d4c89
[TTS]Fix losses of StarGAN v2 VC (#3184)
yt605155624 Apr 24, 2023
e41ef51
VITS learning rate revised, test=tts
WongLaw Apr 24, 2023
64763f7
VITS learning rate revised, test=tts
WongLaw Apr 24, 2023
d98604e
add new aishell model for better CER.
zxcd Apr 25, 2023
193d41f
add readme
zxcd Apr 25, 2023
ca18545
[s2t] fix cli args to config (#3194)
zh794390558 Apr 25, 2023
aed1336
Update README.md
zh794390558 Apr 25, 2023
556d775
[ASR] Support Hubert, fintuned on the librispeech dataset (#3088)
Zth9730 May 4, 2023
ee4e9ff
[ASR] fix asr 0-d tensor. (#3214)
zxcd May 4, 2023
f5eb740
Update README.md
zh794390558 May 4, 2023
8db1574
Update README.md
zh794390558 May 4, 2023
ca5341f
fix: 🐛 fix server-side python ASREngine failing to use the conformer_talcs model (#3230)
Gsonovb May 15, 2023
5351a73
Adding WavLM implementation
jiamingkong May 15, 2023
625fbc1
fix model m5s
zxcd May 22, 2023
7a8528f
Code clean up according to comments in https://github.com/PaddlePaddl…
jiamingkong May 22, 2023
d588892
fix error in tts/st
zoooo0820 May 22, 2023
71538c9
Changed the path for the uploaded weight
jiamingkong May 22, 2023
f737258
Update phonecode.py
shuishu May 24, 2023
3e9f141
Adapted wavlmASR model to pretrained weights and CLI
jiamingkong May 24, 2023
f6b67fc
Changed the MD5 of the pretrained tar file due to bug fixes
jiamingkong May 25, 2023
bc3ff9f
Deleted examples/librispeech/asr5/format_rsl.py
jiamingkong May 25, 2023
57b24b2
Update released_model.md
zh794390558 May 29, 2023
e5be4ec
Code clean up for CIs
jiamingkong May 30, 2023
c9dd803
Fixed the transpose usages ignored before
jiamingkong May 30, 2023
1af2e1a
Update setup.py
zh794390558 May 31, 2023
ffc3a1b
refactor mfa scripts
zh794390558 May 31, 2023
48fc7b7
Final cleaning; Modified SSL/infer.py and README for wavlm inclusion …
jiamingkong May 31, 2023
65b116d
updating readme and readme_cn
jiamingkong May 31, 2023
dc0b0ae
remove tsinghua pypi
fightfat Jun 1, 2023
e2e5c61
Update setup.py (#3294)
zh794390558 Jun 1, 2023
9ce84d7
Update setup.py
zh794390558 Jun 1, 2023
88fe814
refactor rhy
zh794390558 Jun 1, 2023
c27f113
fix ckpt
zh794390558 Jun 1, 2023
032ce7c
add dtype param for arange API. (#3302)
zxcd Jun 2, 2023
27b66c5
add scripts for tts code switch
zh794390558 Jun 2, 2023
27bb521
add t2s assets
zh794390558 Jun 2, 2023
3d1e94f
more comment on tts frontend
zh794390558 Jun 7, 2023
ca86799
fix librosa==0.8.1 numpy==1.23.5 for paddleaudio align with this version
zh794390558 Jun 7, 2023
5dc346c
move ssl into t2s.frontend; fix spk_id for 0-D tensor;
zh794390558 Jun 7, 2023
50bc297
add ssml unit test
zh794390558 Jun 7, 2023
a4a9961
add en_frontend file
zh794390558 Jun 7, 2023
20c6736
add mix frontend test
zh794390558 Jun 8, 2023
e687648
fix long text oom using ssml; filter comma; update polyphonic
zh794390558 Jun 8, 2023
ce43e06
remove print
zh794390558 Jun 8, 2023
4beadc5
hotfix english G2P
zh794390558 Jun 8, 2023
4eafbf4
en frontend unit text
zh794390558 Jun 8, 2023
7acf073
fix profiler (#3323)
mmglove Jun 12, 2023
9fbaebd
old grad clip has 0d tensor problem, fix it (#3334)
zh794390558 Jun 13, 2023
a10ee04
update to py3.8
zh794390558 Jun 13, 2023
461394e
remove fluid.
zxcd Jun 28, 2023
329264f
add roformer
zh794390558 Jul 12, 2023
1bcca9f
fix bugs
zh794390558 Jul 12, 2023
7af1550
add roformer result
zh794390558 Jul 12, 2023
9aabb76
support position interpolation for langer attention context windown l…
zh794390558 Jul 13, 2023
85e087d
RoPE with position interpolation
zh794390558 Jul 14, 2023
fd74313
rope for streaming decoding
zh794390558 Jul 14, 2023
19a45a9
update result
zh794390558 Jul 17, 2023
76f276b
fix rotary embeding
zh794390558 Jul 17, 2023
0ac4b16
Update README.md
zh794390558 Jul 21, 2023
50204ec
fix weight decay
zh794390558 Jul 25, 2023
a299420
fix develop view confict with model's
wanghuancoder Jul 28, 2023
1c2a538
Add XPU support for SpeedySpeech (#3502)
USTCKAY Sep 6, 2023
43ffb6f
Add XPU support for FastSpeech2 (#3514)
USTCKAY Sep 12, 2023
e7c632f
Update ge2e_clone.py (#3517)
skyboooox Sep 19, 2023
7994125
Fix Readme. (#3527)
zxcd Sep 19, 2023
7922115
FIX: Added missing imports
fazledyn-or Oct 3, 2023
c4fe47b
FIX: Fixed the implementation of a special method
fazledyn-or Oct 3, 2023
44f1626
【benchmark】add max_mem_reserved for benchmark (#3604)
mmglove Nov 22, 2023
e2d1e9d
fix develop bug function:view to reshape (#3633)
luyao-cv Dec 4, 2023
6254a9b
【benchmark】fix gpu_mem unit (#3634)
mmglove Dec 5, 2023
cb6fb7f
Add file encoding detection when reading files (#3606)
Coloryr Jan 16, 2024
30b2d98
bugfix: audio_len should be 1D, no 0D, which will raise list index ou…
JeffLu Feb 26, 2024
14796bb
Update README.md (#3532)
satani99 Feb 26, 2024
11decf9
fixed version for paddlepaddle. (#3701)
zxcd May 23, 2024
27b874f
【Fix Speech Issue No.5】issue 3444 transformation import error (#3779)
kk-2000 Jun 4, 2024
c3a439d
【Fix Speech Issue No.8】issue 3652 merge_yi function has a bug (#3786)
mattheliu Jun 5, 2024
0505b9a
【test】add cli test readme (#3784)
zxcd Jun 5, 2024
f4cb31b
【test】fix test cli bug (#3793)
zxcd Jun 6, 2024
2b7334e
Update setup.py (#3795)
zxcd Jun 7, 2024
bfeb8a0
adapt view behavior change, fix KeyError. (#3794)
zxcd Jun 11, 2024
2 changes: 1 addition & 1 deletion .github/CONTRIBUTING.md
@@ -27,4 +27,4 @@ git commit -m "xxxxxx, test=doc"
1. 虽然跳过了 CI,但是还要先排队排到才能跳过,所以非自己方向看到 pending 不要着急 🤣
2. 在 `git commit --amend` 的时候才加 `test=xxx` 可能不太有效
3. 一个 pr 多次提交 commit 注意每次都要加 `test=xxx`,因为每个 commit 都会触发 CI
4. 删除 python 环境中已经安装好的的 paddlespeech,否则可能会影响 import paddlespeech 的顺序</div>
4. 删除 python 环境中已经安装好的 paddlespeech,否则可能会影响 import paddlespeech 的顺序</div>
1 change: 0 additions & 1 deletion .github/ISSUE_TEMPLATE/bug-report-tts.md
@@ -3,7 +3,6 @@ name: "\U0001F41B TTS Bug Report"
about: Create a report to help us improve
title: "[TTS]XXXX"
labels: Bug, T2S
assignees: yt605155624

---

61 changes: 36 additions & 25 deletions README.md
@@ -178,6 +178,13 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

### Recent Update
- 👑 2023.05.31: Add [WavLM ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr5), WavLM fine-tuning for ASR on LibriSpeech.
- 👑 2023.05.04: Add [HuBERT ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr4), HuBERT fine-tuning for ASR on LibriSpeech.
- ⚡ 2023.04.28: Fix [0-d tensor](https://github.com/PaddlePaddle/PaddleSpeech/pull/3214), with the upgrade of paddlepaddle==2.5, the problem of modifying 0-d tensor has been solved.
- 👑 2023.04.25: Add [AMP for U2 conformer](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 🔥 2023.04.06: Add [subtitle file (.srt format) generation example](./demos/streaming_asr_server).
- 👑 2023.04.25: Add [AMP for U2 conformer](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 🔥 2023.03.14: Add SVS(Singing Voice Synthesis) examples with Opencpop dataset, including [DiffSinger](./examples/opencpop/svs1)、[PWGAN](./examples/opencpop/voc1) and [HiFiGAN](./examples/opencpop/voc5), the effect is continuously optimized.
- 👑 2023.03.09: Add [Wav2vec2ASR-zh](./examples/aishell/asr3).
- 🎉 2023.03.07: Add [TTS ARM Linux C++ Demo](./demos/TTSArmLinux).
- 🔥 2023.03.03 Add Voice Conversion [StarGANv2-VC synthesize pipeline](./examples/vctk/vc3).
@@ -221,13 +228,13 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision

## Installation

We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7* and *paddlepaddle>=2.4.1*.
We strongly recommend that users install PaddleSpeech on **Linux** with *python>=3.8* and *paddlepaddle<=2.5.1*. Newer Paddle releases have not yet been adapted in PaddleSpeech, so only versions 2.5.1 and earlier are currently supported.

### **Dependency Introduction**

+ gcc >= 4.8.5
+ paddlepaddle >= 2.4.1
+ python >= 3.7
+ paddlepaddle <= 2.5.1
+ python >= 3.8
+ OS support: Linux(recommend), Windows, Mac OSX

PaddleSpeech depends on paddlepaddle. For installation, please refer to the official website of [paddlepaddle](https://www.paddlepaddle.org.cn/en) and choose according to your own machine. Here is an example of the cpu version.
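
As a rough, hedged sketch of the CPU-only install path the paragraph above refers to (the package names are real PyPI packages, but the exact version pin and command form are assumptions, not quoted from the README):

```bash
# Hedged sketch, not the repository's canonical instructions:
# install a CPU build of paddlepaddle inside the supported range,
# then install PaddleSpeech itself. The pin follows the requirements
# listed above (python >= 3.8, paddlepaddle <= 2.5.1).
python -m pip install "paddlepaddle<=2.5.1"
python -m pip install paddlespeech
```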
@@ -577,14 +584,14 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
</thead>
<tbody>
<tr>
<td> Text Frontend </td>
<td colspan="2"> &emsp; </td>
<td>
<a href = "./examples/other/tn">tn</a> / <a href = "./examples/other/g2p">g2p</a>
</td>
<td> Text Frontend </td>
<td colspan="2"> &emsp; </td>
<td>
<a href = "./examples/other/tn">tn</a> / <a href = "./examples/other/g2p">g2p</a>
</td>
</tr>
<tr>
<td rowspan="5">Acoustic Model</td>
<td rowspan="6">Acoustic Model</td>
<td>Tacotron2</td>
<td>LJSpeech / CSMSC</td>
<td>
Expand Down Expand Up @@ -619,6 +626,13 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
<a href = "./examples/vctk/ernie_sat">ERNIE-SAT-vctk</a> / <a href = "./examples/aishell3/ernie_sat">ERNIE-SAT-aishell3</a> / <a href = "./examples/aishell3_vctk/ernie_sat">ERNIE-SAT-zh_en</a>
</td>
</tr>
<tr>
<td>DiffSinger</td>
<td>Opencpop</td>
<td>
<a href = "./examples/opencpop/svs1">DiffSinger-opencpop</a>
</td>
</tr>
<tr>
<td rowspan="6">Vocoder</td>
<td >WaveFlow</td>
@@ -629,9 +643,9 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
</tr>
<tr>
<td >Parallel WaveGAN</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a> / <a href = "./examples/opencpop/voc1">PWGAN-opencpop</a>
</td>
</tr>
<tr>
@@ -650,9 +664,9 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
</tr>
<tr>
<td>HiFiGAN</td>
<td>LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td>LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a> / <a href = "./examples/opencpop/voc5">HiFiGAN-opencpop</a>
</td>
</tr>
<tr>
@@ -880,15 +894,20 @@ The Text-to-Speech module is originally called [Parakeet](https://github.com/Pad

- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.**

<div align="center">
<img src="https://raw.githubusercontent.com/jerryuhoo/VTuberTalk/main/gui/gui.png" width = "500px" />
</div>


## Citation

To cite PaddleSpeech for research, please use the following format.

```text
@inproceedings{zhang2022paddlespeech,
title = {PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit},
author = {Hui Zhang, Tian Yuan, Junkun Chen, Xintong Li, Renjie Zheng, Yuxin Huang, Xiaojie Chen, Enlei Gong, Zeyu Chen, Xiaoguang Hu, dianhai yu, Yanjun Ma, Liang Huang},
booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations},
year = {2022},
publisher = {Association for Computational Linguistics},
}

@InProceedings{pmlr-v162-bai22d,
title = {{A}$^3${T}: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing},
author = {Bai, He and Zheng, Renjie and Chen, Junkun and Ma, Mingbo and Li, Xintong and Huang, Liang},
@@ -903,14 +922,6 @@ To cite PaddleSpeech for research, please use the following format.
url = {https://proceedings.mlr.press/v162/bai22d.html},
}

@inproceedings{zhang2022paddlespeech,
title = {PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit},
author = {Hui Zhang, Tian Yuan, Junkun Chen, Xintong Li, Renjie Zheng, Yuxin Huang, Xiaojie Chen, Enlei Gong, Zeyu Chen, Xiaoguang Hu, dianhai yu, Yanjun Ma, Liang Huang},
booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations},
year = {2022},
publisher = {Association for Computational Linguistics},
}

@inproceedings{zheng2021fused,
title={Fused acoustic and text encoding for multimodal bilingual pretraining and speech translation},
author={Zheng, Renjie and Chen, Junkun and Ma, Mingbo and Huang, Liang},
55 changes: 35 additions & 20 deletions README_cn.md
@@ -8,7 +8,7 @@
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-red.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleSpeech?color=ffa"></a>
<a href="support os"><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.8+-aff.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSpeech?color=9ea"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleSpeech?color=3af"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSpeech?color=9cc"></a>
@@ -183,6 +183,13 @@
- 🧩 级联模型应用: 作为传统语音任务的扩展,我们结合了自然语言处理、计算机视觉等任务,实现更接近实际需求的产业级应用。

### 近期更新
- 👑 2023.05.31: 新增 [WavLM ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr5), 基于WavLM的英语识别微调,使用LibriSpeech数据集
- 👑 2023.05.04: 新增 [HuBERT ASR-en](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/librispeech/asr4), 基于HuBERT的英语识别微调,使用LibriSpeech数据集
- ⚡ 2023.04.28: 修正 [0-d tensor](https://github.com/PaddlePaddle/PaddleSpeech/pull/3214), 配合PaddlePaddle2.5升级修改了0-d tensor的问题。
- 👑 2023.04.25: 新增 [U2 conformer 的 AMP 训练](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 👑 2023.04.06: 新增 [srt格式字幕生成功能](./demos/streaming_asr_server)。
- 👑 2023.04.25: 新增 [U2 conformer 的 AMP 训练](https://github.com/PaddlePaddle/PaddleSpeech/pull/3167).
- 🔥 2023.03.14: 新增基于 Opencpop 数据集的 SVS (歌唱合成) 示例,包含 [DiffSinger](./examples/opencpop/svs1)、[PWGAN](./examples/opencpop/voc1) 和 [HiFiGAN](./examples/opencpop/voc5),效果持续优化中。
- 👑 2023.03.09: 新增 [Wav2vec2ASR-zh](./examples/aishell/asr3)。
- 🎉 2023.03.07: 新增 [TTS ARM Linux C++ 部署示例](./demos/TTSArmLinux)。
- 🔥 2023.03.03: 新增声音转换模型 [StarGANv2-VC 合成流程](./examples/vctk/vc3)。
@@ -231,12 +238,12 @@
<a name="安装"></a>
## 安装

我们强烈建议用户在 **Linux** 环境下,*3.7* 以上版本的 *python* 上安装 PaddleSpeech。
我们强烈建议用户在 **Linux** 环境下,*3.8* 以上版本的 *python* 上安装 PaddleSpeech。同时,有一些Paddle新版本的内容没有在做适配的支持,因此目前只能使用2.5.1及之前的版本

### 相关依赖
+ gcc >= 4.8.5
+ paddlepaddle >= 2.4.1
+ python >= 3.7
+ paddlepaddle <= 2.5.1
+ python >= 3.8
+ linux(推荐), mac, windows

PaddleSpeech 依赖于 paddlepaddle,安装可以参考[ paddlepaddle 官网](https://www.paddlepaddle.org.cn/),根据自己机器的情况进行选择。这里给出 cpu 版本示例,其它版本大家可以根据自己机器的情况进行安装。
@@ -576,43 +583,50 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
<td>
<a href = "./examples/other/tn">tn</a> / <a href = "./examples/other/g2p">g2p</a>
</td>
</tr>
<tr>
<td rowspan="5">声学模型</td>
</tr>
<tr>
<td rowspan="6">声学模型</td>
<td>Tacotron2</td>
<td>LJSpeech / CSMSC</td>
<td>
<a href = "./examples/ljspeech/tts0">tacotron2-ljspeech</a> / <a href = "./examples/csmsc/tts0">tacotron2-csmsc</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td>Transformer TTS</td>
<td>LJSpeech</td>
<td>
<a href = "./examples/ljspeech/tts1">transformer-ljspeech</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td>SpeedySpeech</td>
<td>CSMSC</td>
<td >
<a href = "./examples/csmsc/tts2">speedyspeech-csmsc</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td>FastSpeech2</td>
<td>LJSpeech / VCTK / CSMSC / AISHELL-3 / ZH_EN / finetune</td>
<td>
<a href = "./examples/ljspeech/tts3">fastspeech2-ljspeech</a> / <a href = "./examples/vctk/tts3">fastspeech2-vctk</a> / <a href = "./examples/csmsc/tts3">fastspeech2-csmsc</a> / <a href = "./examples/aishell3/tts3">fastspeech2-aishell3</a> / <a href = "./examples/zh_en_tts/tts3">fastspeech2-zh_en</a> / <a href = "./examples/other/tts_finetune/tts3">fastspeech2-finetune</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td><a href = "https://arxiv.org/abs/2211.03545">ERNIE-SAT</a></td>
<td>VCTK / AISHELL-3 / ZH_EN</td>
<td>
<a href = "./examples/vctk/ernie_sat">ERNIE-SAT-vctk</a> / <a href = "./examples/aishell3/ernie_sat">ERNIE-SAT-aishell3</a> / <a href = "./examples/aishell3_vctk/ernie_sat">ERNIE-SAT-zh_en</a>
</td>
</tr>
</tr>
<tr>
<td>DiffSinger</td>
<td>Opencpop</td>
<td>
<a href = "./examples/opencpop/svs1">DiffSinger-opencpop</a>
</td>
</tr>
<tr>
<td rowspan="6">声码器</td>
<td >WaveFlow</td>
@@ -623,9 +637,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tr>
<tr>
<td >Parallel WaveGAN</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a> / <a href = "./examples/opencpop/voc1">PWGAN-opencpop</a>
</td>
</tr>
<tr>
@@ -644,9 +658,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tr>
<tr>
<td >HiFiGAN</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a> / <a href = "./examples/opencpop/voc5">HiFiGAN-opencpop</a>
</td>
</tr>
<tr>
@@ -703,6 +717,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tbody>
</table>


<a name="声音分类模型"></a>
**声音分类**

2 changes: 1 addition & 1 deletion audio/paddleaudio/backends/soundfile_backend.py
@@ -191,7 +191,7 @@ def soundfile_save(y: np.ndarray, sr: int, file: os.PathLike) -> None:

if sr <= 0:
raise ParameterError(
f'Sample rate should be larger than 0, recieved sr = {sr}')
f'Sample rate should be larger than 0, received sr = {sr}')

if y.dtype not in ['int16', 'int8']:
warnings.warn(
6 changes: 4 additions & 2 deletions audio/setup.py
@@ -34,12 +34,14 @@

ROOT_DIR = Path(__file__).parent.resolve()

VERSION = '1.1.0'
VERSION = '1.2.0'
COMMITID = 'none'

base = [
"kaldiio",
# paddleaudio align with librosa==0.8.1, which need numpy==1.23.x
"librosa==0.8.1",
"numpy==1.23.5",
"kaldiio",
"pathos",
"pybind11",
"parameterized",
2 changes: 1 addition & 1 deletion audio/tests/features/base.py
@@ -37,7 +37,7 @@ def initWavInput(self, url=wav_url):
self.waveform, self.sr = load(os.path.abspath(os.path.basename(url)))
self.waveform = self.waveform.astype(
np.float32
) # paddlespeech.s2t.transform.spectrogram only supports float32
) # paddlespeech.audio.transform.spectrogram only supports float32
dim = len(self.waveform.shape)

assert dim in [1, 2]
4 changes: 2 additions & 2 deletions audio/tests/features/test_istft.py
@@ -18,8 +18,8 @@
from paddleaudio.functional.window import get_window

from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import IStft
from paddlespeech.s2t.transform.spectrogram import Stft
from paddlespeech.audio.transform.spectrogram import IStft
from paddlespeech.audio.transform.spectrogram import Stft


class TestIstft(FeatTest):
2 changes: 1 addition & 1 deletion audio/tests/features/test_log_melspectrogram.py
@@ -18,7 +18,7 @@
import paddleaudio

from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import LogMelSpectrogram
from paddlespeech.audio.transform.spectrogram import LogMelSpectrogram


class TestLogMelSpectrogram(FeatTest):
2 changes: 1 addition & 1 deletion audio/tests/features/test_spectrogram.py
@@ -18,7 +18,7 @@
import paddleaudio

from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import Spectrogram
from paddlespeech.audio.transform.spectrogram import Spectrogram


class TestSpectrogram(FeatTest):
2 changes: 1 addition & 1 deletion audio/tests/features/test_stft.py
@@ -18,7 +18,7 @@
from paddleaudio.functional.window import get_window

from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import Stft
from paddlespeech.audio.transform.spectrogram import Stft


class TestStft(FeatTest):