Improve the handling of punctuation in Undetermined #151

Open
joanise opened this issue Apr 12, 2022 · 0 comments

Collaborator

joanise commented Apr 12, 2022

Right now, the Undetermined (und) mapping raises an error if it tries to process a word with a colon or certain other punctuation in it. The comma and apostrophe have a sound mapping, but the colon, semicolon and period do not. Punctuation should normally be tokenized out of the words before g2p, but sometimes it is not, e.g., when we're using an und fallback for text in a language that uses that punctuation symbol as a letter.

When a punctuation symbol is passed to und for g2p mapping, we should do something reasonable. Colon could logically map to the length marker, and other punctuation marks could logically just be stripped out.
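
To make the proposal concrete, here is a minimal sketch of that behaviour as a standalone pre-processing step. It is not the g2p library's API or how the und mapping is actually implemented: the function name `normalize_und_punctuation`, the table name `PUNCTUATION_MAP`, and the choice of the IPA length mark U+02D0 as "the length marker" are all assumptions made for illustration. It only touches the three symbols named above as unmapped, and leaves the comma and apostrophe alone since they already have a sound mapping.

```python
# Hypothetical helper, for illustration only; not part of the g2p library.

# Assumption: "the length marker" means the IPA length mark (U+02D0).
LENGTH_MARKER = "\u02D0"

# Assumption: only the three symbols the issue lists as unmapped are handled.
# A value of None tells str.translate() to delete the character.
PUNCTUATION_MAP = str.maketrans({
    ":": LENGTH_MARKER,  # colon -> length marker
    ";": None,           # semicolon -> stripped
    ".": None,           # period -> stripped
})


def normalize_und_punctuation(word: str) -> str:
    """Map a colon to the length marker and strip semicolons and periods."""
    return word.translate(PUNCTUATION_MAP)
```

For example, `normalize_und_punctuation("ta:k.")` gives `"taːk"`, while a word such as `"a,b'c"` comes back unchanged. Whether this belongs in a pre-processing step or in the mapping table itself is left open here.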
