PR #190 patches one narrow problem in the tokenizer, making it handle alternations correctly when tokenizing, but the fix is not general. For example, ^ should be stripped from rules unless it's a letter in the language (see #190 (comment)).
A better solution would probably be to use a character inventory for the language, instead of parsing the input field of each g2p rule. The inventory would enumerate all characters that the Unicode standard does not consider letters but that are actually used as letters in the language.
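As a rough illustration of the inventory idea, here is a minimal sketch, not the existing tokenizer code. The names `is_word_char`, `tokenize` and `inventory` are hypothetical, and it assumes the inventory holds single characters only and that "letter according to Unicode" means general category L* or M*.

```python
import unicodedata
from itertools import groupby


def is_word_char(ch, inventory):
    """True if ch is a Unicode letter/mark, or is listed in the language's inventory."""
    # Assumption: categories L* (letters) and M* (combining marks) count as letters.
    return ch in inventory or unicodedata.category(ch)[0] in ("L", "M")


def tokenize(text, inventory=frozenset()):
    """Split text into (chunk, is_word) pairs by grouping consecutive characters."""
    return [
        ("".join(group), is_word)
        for is_word, group in groupby(text, key=lambda ch: is_word_char(ch, inventory))
    ]


# Without an inventory, "^" (Unicode category Sk) splits the word:
print(tokenize("a^b c"))
# [('a', True), ('^', False), ('b', True), (' ', False), ('c', True)]

# With "^" declared as a letter of the language, it stays inside the word:
print(tokenize("a^b c", inventory=frozenset("^")))
# [('a^b', True), (' ', False), ('c', True)]
```

With something along these lines, whether ^ is a letter would be declared once per language rather than inferred from each rule's input field.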