[TTS] Consecutive dots in the input text cause a bug #3339

arhcer · 2023-06-14T06:20:55Z

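I am using the r1.4 branch.
When I input the text “一边...一边...”写出脱离险境的劳累, an error is raised, and the traceback points to line 89 of tone_sandhi.py.
Debugging showed that the failure originates at line 235 of zh_frontend.py. The deeper cause is that the words come from seg_cut = psg.lcut(seg) while the pinyin comes from pinyins = self.g2pW_model(seg)[0], and each element of pinyins may be either the pinyin of a single character or a run of consecutive punctuation such as '...'. The loop then indexes pinyins as if it held one element per character, so the word/pinyin alignment breaks and the later code raises an error.
The fix is to change lines 232 to 235 of zh_frontend.py to the following: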
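The misalignment can be reproduced in isolation with a small toy model. The data below is hypothetical, not real jieba or g2pW output: it assumes, as the report implies, that psg.lcut emits each '.' as a separate token while g2pW returns '...' as a single element. naive_slices is an illustrative helper (not a PaddleSpeech function) that mimics the unpatched loop, advancing one pinyin element per character.

```python
# Hypothetical toy data, not real jieba/g2pW output: the segmenter is assumed
# to emit each '.' as its own token, while the g2pW pinyin list is assumed to
# hold the whole '...' run as a single element.
SEG_CUT = [["一边", "d"], [".", "x"], [".", "x"], [".", "x"], ["一边", "d"]]
PINYINS = ["yi1", "bian1", "...", "yi1", "bian1"]


def naive_slices(seg_cut, pinyins):
    """Slice pinyins the way the unpatched loop does: one element per character."""
    slices = []
    pre_word_length = 0
    for word, _pos in seg_cut:
        now_word_length = pre_word_length + len(word)
        slices.append((word, pinyins[pre_word_length:now_word_length]))
        pre_word_length = now_word_length
    return slices


for word, word_pinyins in naive_slices(SEG_CUT, PINYINS):
    print(word, word_pinyins)
```

Here the three '.' tokens consume '...', 'yi1' and 'bian1', and the second 一边 receives an empty list, which is plausibly the kind of inconsistent input that later trips up tone_sandhi.py.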

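```python
for i, (word, pos) in enumerate(seg_cut):
    sub_initials = []
    sub_finals = []
    if not '\u4e00' <= word[0] <= '\u9fff':
        # Non-CJK token (e.g. punctuation). If the previous token was also
        # non-CJK, this token belongs to a run that g2pW already returned as
        # one element, so skip it.
        if i > 0 and not '\u4e00' <= seg_cut[i-1][0][0] <= '\u9fff':
            continue
        else:
            # The first token of the run occupies a single element in pinyins.
            now_word_length = pre_word_length + 1
    else:
        # CJK word: one pinyin element per character.
        now_word_length = pre_word_length + len(word)
```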
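To see how the patched branch realigns the slices, here is a self-contained sketch on the same hypothetical toy data. It assumes that the rest of the original loop body is unchanged, i.e. it still slices pinyins[pre_word_length:now_word_length] and then sets pre_word_length = now_word_length; is_cjk and patched_slices are illustrative helper names, not PaddleSpeech APIs.

```python
# Same hypothetical toy data: '.' tokens from the segmenter, '...' as one
# element in the pinyin list.
SEG_CUT = [["一边", "d"], [".", "x"], [".", "x"], [".", "x"], ["一边", "d"]]
PINYINS = ["yi1", "bian1", "...", "yi1", "bian1"]


def is_cjk(ch):
    """True for characters in the CJK Unified Ideographs block U+4E00..U+9FFF."""
    return "\u4e00" <= ch <= "\u9fff"


def patched_slices(seg_cut, pinyins):
    """Slice pinyins following the patch: a run of non-CJK tokens takes one element."""
    slices = []
    pre_word_length = 0
    for i, (word, _pos) in enumerate(seg_cut):
        if not is_cjk(word[0]):
            if i > 0 and not is_cjk(seg_cut[i - 1][0][0]):
                continue  # later token of the same run: already accounted for
            now_word_length = pre_word_length + 1  # whole run = one element
        else:
            now_word_length = pre_word_length + len(word)
        slices.append((word, pinyins[pre_word_length:now_word_length]))
        pre_word_length = now_word_length
    return slices


for word, word_pinyins in patched_slices(SEG_CUT, PINYINS):
    print(word, word_pinyins)
```

Only the first '.' of the run is kept and it is paired with '...', so both occurrences of 一边 get ['yi1', 'bian1']. The design relies on g2pW grouping every non-CJK run into exactly one element; if that assumption does not hold for some input, the offsets would drift again.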


arhcer · 2023-06-14T10:38:59Z

Alternatively, setting the char_split parameter at line 225 of zh_frontend.py to True also fixes this problem, with no other code changes needed.

arhcer · 2023-06-15T02:21:28Z

Around line 215 of onnx_api.py, we also need to ensure that len(sent) == len(pypinyin_result).
The code to add is:

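```python
# Split punctuation runs that pypinyin returns as one element (e.g. ['...'])
# into one single-character element each, so indices line up with sent.
tmp = []
for x in pypinyin_result:
    if x[0].isalnum():
        tmp.append(x)
    else:
        tmp.extend([ch] for ch in x[0])
pypinyin_result = tmp  # reassign once, after the loop has finished
assert len(sent) == len(pypinyin_result)
```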
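As a quick check of the onnx_api.py normalization, the same logic can be wrapped in a function and run on a hand-written input. The input is hypothetical but shaped like pypinyin output (a list of single-item lists) with '...' returned as one element; split_non_pinyin_runs is an illustrative name, not an existing function.

```python
def split_non_pinyin_runs(pypinyin_result):
    """Expand non-alphanumeric runs (e.g. ['...']) into one element per character."""
    tmp = []
    for x in pypinyin_result:
        if x[0].isalnum():
            tmp.append(x)
        else:
            tmp.extend([ch] for ch in x[0])
    return tmp


# Hypothetical pypinyin-style result for the sentence below: the punctuation
# run '...' appears as a single element.
SENT = "一边...一边"
RESULT = [["yi1"], ["bian1"], ["..."], ["yi1"], ["bian1"]]

fixed = split_non_pinyin_runs(RESULT)
assert len(SENT) == len(fixed)
print(fixed)
```

Because the check is isalnum(), an alphanumeric non-Chinese run (for example Latin letters such as 'abc', if pypinyin returns them as one element) would still be kept whole, so the length assertion could still fail for such inputs; the sketch mirrors the fix above and does not address that case.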

arhcer added Bug T2S labels Jun 14, 2023

kk-2000 mentioned this issue May 23, 2024 in [Issue resolution] Fix historical PaddleSpeech issues and bugs #3771 (Open)