context_after="\s|$" does not always work correctly for end of word #134

joanise · 2021-11-01T17:56:58Z

In some mappings, we use context_after = \s|$ to do some processing on the end of a word.

Examples:

French:

fra_to_ipa.csv had rules like `` to delete the silent word-final "s" (changed to \b on 2021-11-01)
g2p convert "tests, tests tests" outputs tʌsts, tʌst tʌst, showing that the rule works before a space and at the end of the string, but not before a comma.

Mi'kmaq:

mic_to_ipa.json uses $ to match word-final.
g2p convert "tt" mic mic-ipa outputs tət
g2p convert "tt, tt tt" mic mic-ipa outputs ətt, tt tət

Several other mappings use $ one way or another.

Not sure what the best solution is. \b is also not always right (e.g., it's incompatible with prevent_feeding), but it does fix French in any case.
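To illustrate the difference outside of g2p, here is a minimal sketch using Python's plain `re` module. It is not g2p's rule engine: the two patterns are hypothetical stand-ins for a rule that deletes word-final "s", one with `\s|$` as its following context (written as a lookahead) and one with `\b`.

```python
import re

text = "tests, tests tests"

# Hypothetical stand-in for a rule with context_after = \s|$:
# delete "s" only when followed by whitespace or end of string.
space_or_end = re.compile(r"s(?=\s|$)")

# The same deletion using a word boundary instead.
word_boundary = re.compile(r"s\b")

with_space_or_end = space_or_end.sub("", text)
with_word_boundary = word_boundary.sub("", text)

print(with_space_or_end)   # tests, test test
print(with_word_boundary)  # test, test test
```

The first pattern misses the "s" before the comma because a comma is neither whitespace nor end of string, which matches the French behaviour described above. Whether `\b` is appropriate in a given mapping still depends on the other caveats mentioned, such as prevent_feeding.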


joanise · 2023-08-21T14:47:40Z

@roedoejet not sure if #277 fixes this or not. I'm pretty sure it will make rule writing easier, but I expect there are still corner cases where "end of word" will remain difficult to define reliably.

roedoejet mentioned this issue on Aug 15, 2023, in "A variety of bug fixes from working with AM" #277 (merged).
