context_after="\s|$" does not always work correctly for end of word #134

Open
joanise opened this issue Nov 1, 2021 · 1 comment

joanise commented Nov 1, 2021

In some mappings, we use context_after = \s|$ to apply a rule only at the end of a word.

Examples:

French:

  • fra_to_ipa.csv had rules like `` to delete the silent word-final "s" (changed to \b on 2021-11-01)
  • g2p convert "tests, tests tests" outputs tʌsts, tʌst tʌst, showing that the rule works before a space and at the end of the string, but not before a comma.
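
The comma case can be reproduced outside g2p with Python's `re` module. This is a minimal sketch, assuming the rule boils down to deleting "s" when a lookahead on `\s|$` succeeds; `delete_final_s` is a hypothetical name, and this is not g2p's actual rule-compilation code.

```python
import re


def delete_final_s(text: str) -> str:
    """Delete "s" only when it is followed by whitespace or end of string."""
    # Hypothetical stand-in for a rule with context_after = \s|$
    return re.sub(r"s(?=\s|$)", "", text)


print(delete_final_s("tests, tests tests"))  # tests, test test
```

The first word keeps its final "s" because the next character is a comma, which is neither whitespace nor the end of the string.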

Mi'kmaq:

  • mic_to_ipa.json uses $ to match word-final.
  • g2p convert "tt" mic mic-ipa outputs tət
  • g2p convert "tt, tt tt" mic mic-ipa outputs ətt, tt tət

Several other mappings use $ one way or another.

Not sure what the best solution is. \b is not always right either (e.g., it's incompatible with prevent_feeding). It does fix French, in any case.
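
For comparison, here is a rough sketch of the `\b` variant in plain Python `re`, again with a hypothetical helper name and not g2p's own code. It shows `\b` handling the comma case, and also one generic caveat of `\b`: it fires before an apostrophe inside a word. That caveat is a property of the regex engine and is separate from the prevent_feeding incompatibility, which this sketch does not reproduce.

```python
import re


def delete_before_boundary(letter: str, text: str) -> str:
    """Delete `letter` wherever it is immediately followed by a \\b boundary."""
    # Hypothetical stand-in for a rule with context_after = \b
    return re.sub(re.escape(letter) + r"\b", "", text)


# The comma no longer blocks the rule:
print(delete_before_boundary("s", "tests, tests tests"))  # test, test test

# But \b also matches before an apostrophe in the middle of a word:
print(delete_before_boundary("i", "Mi'kmaq"))  # M'kmaq
```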

joanise commented Aug 21, 2023

@roedoejet not sure if #277 fixes this or not. I'm pretty sure it will make rule writing easier, but I expect there are still corner cases where "end of word" will remain difficult to define reliably.
