[TTS] Consecutive periods in the text cause a bug #3339

Open
arhcer opened this issue Jun 14, 2023 · 2 comments


arhcer commented Jun 14, 2023

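I am using the r1.4 branch.
When I input the text “一边...一边...”写出脱离险境的劳累, an error is raised at line 89 of tone_sandhi.py.
After investigating, I found that the problem is at line 235 of zh_frontend.py. The underlying cause is that the words come from seg_cut = psg.lcut(seg) while the pinyin comes from pinyins = self.g2pW_model(seg)[0]. Each element of pinyins may be either the pinyin of a single character or a run of consecutive punctuation such as '...', so counting len(word) per word can fall out of step with pinyins, and a later step then fails.
The fix is to change lines 232 to 235 of zh_frontend.py to the following: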

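for i, (word, pos) in enumerate(seg_cut):
    sub_initials = []
    sub_finals = []
    # A word that does not start with a CJK character is treated as punctuation.
    if not '\u4e00' <= word[0] <= '\u9fff':
        # Skip it if the previous word was also non-CJK (same punctuation run).
        if i > 0 and not '\u4e00' <= seg_cut[i-1][0][0] <= '\u9fff':
            continue
        else:
            # A whole punctuation run takes exactly one element of pinyins.
            now_word_length = pre_word_length + 1
    else:
        now_word_length = pre_word_length + len(word)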
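
To illustrate the alignment the patch above is after, here is a minimal standalone sketch. It does not call jieba or g2pW. The helper names is_cjk and word_spans, the sample seg_cut and pinyins values, and the slicing step pinyins[pre:now] are all assumptions made for illustration, not code from zh_frontend.py.

```python
def is_cjk(ch: str) -> bool:
    """Return True if ch is in the CJK Unified Ideographs range used by the patch."""
    return '\u4e00' <= ch <= '\u9fff'


def word_spans(seg_cut, pinyins):
    """Pair each segmented word with its slice of pinyins.

    Assumes (hypothetically) that a run of punctuation occupies exactly
    one element of pinyins, however many characters or words it spans.
    """
    spans = []
    pre = 0  # plays the role of pre_word_length
    for i, (word, pos) in enumerate(seg_cut):
        if not is_cjk(word[0]):
            # A second non-CJK word in a row belongs to a run already counted.
            if i > 0 and not is_cjk(seg_cut[i - 1][0][0]):
                continue
            now = pre + 1
        else:
            now = pre + len(word)
        spans.append((word, pinyins[pre:now]))
        pre = now
    return spans


# Hypothetical inputs shaped like the issue's description.
pinyins = ["yi1", "bian1", "...", "yi1", "bian1"]
as_one_word = [("一边", "d"), ("...", "x"), ("一边", "d")]
as_three_words = [("一边", "d"), (".", "x"), (".", "x"), (".", "x"), ("一边", "d")]

print(word_spans(as_one_word, pinyins))
print(word_spans(as_three_words, pinyins))
```

With either segmentation the punctuation run consumes a single element of pinyins, so the following word still receives its own two syllables.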

arhcer commented Jun 14, 2023

Alternatively, changing the char_split parameter to True at line 225 of zh_frontend.py also solves this problem, with no other code changes needed.


arhcer commented Jun 15, 2023

Near line 215 of onnx_api.py, len(sent) == len(pypinyin_result) must be guaranteed.
The code to add is:

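tmp = []
for x in pypinyin_result:
    if x[0].isalnum():
        # A normal pinyin entry: keep it unchanged.
        tmp.append(x)
    else:
        # A punctuation run such as '...': split it into single characters.
        tmp.extend(list(x[0]))
pypinyin_result = tmp
assert len(sent) == len(pypinyin_result)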
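
The same idea can be tried outside the library with a small self-contained sketch. The function name split_punctuation_runs and the sample raw list are hypothetical; the list only imitates the nested-list shape described above and is not real pypinyin output. Unlike the snippet above, this sketch wraps each split character in its own one-element list so every entry keeps the same shape, which is a choice made here and not something the issue prescribes.

```python
def split_punctuation_runs(pypinyin_result):
    """Expand multi-character punctuation entries into one entry per character.

    Each entry is assumed to be a one-element list such as ['yi1'] or ['...'].
    """
    out = []
    for entry in pypinyin_result:
        token = entry[0]
        if token.isalnum():
            # Pinyin syllable such as 'yi1': keep the entry as it is.
            out.append(entry)
        else:
            # Punctuation run: emit one single-character entry per character.
            out.extend([ch] for ch in token)
    return out


sent = "一边...一边"
raw = [["yi1"], ["bian1"], ["..."], ["yi1"], ["bian1"]]  # hypothetical sample
fixed = split_punctuation_runs(raw)

# After splitting, there is one entry per character of the sentence.
assert len(fixed) == len(sent)
print(fixed)
```

Before the split, raw has 5 entries for a 7-character sentence; after it, the counts match.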
