Performance problem with /k/i and /s/i #97

Closed
k-takata opened this issue Nov 21, 2017 · 0 comments · Fixed by #113

Comments

@k-takata
Owner

Originally reported at kkos/oniguruma#71.

If a pattern is case-insensitive and contains the letter "k" or "s", matching slows down when the encoding is UTF-8.
Onigmo uses different optimization methods for fixed strings: Sunday's quick search, which supports case-insensitive search, instead of the case-sensitive Boyer-Moore search. However, the case-insensitive search has a problem. /s/i also matches ſ (U+017F, LATIN SMALL LETTER LONG S), and /k/i also matches K (U+212A, KELVIN SIGN). These characters take 2 and 3 bytes in UTF-8 respectively, so their byte lengths differ from those of the original one-byte characters. The optimization is therefore turned off.
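
The length mismatch can be checked quickly. The following is a minimal sketch that relies on Python's Unicode tables rather than Onigmo's own case-folding code; the helper name `utf8_len` is made up for the illustration.

```python
# Illustration only: uses Python's Unicode data, not Onigmo's case-fold tables.
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S
KELVIN = "\u212a"  # KELVIN SIGN


def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8 (hypothetical helper)."""
    return len(ch.encode("utf-8"))


for ascii_ch, variant in (("s", LONG_S), ("k", KELVIN)):
    # Both characters case-fold to the same value, so /s/i and /k/i must match both.
    folds_together = variant.casefold() == ascii_ch
    print(
        f"{ascii_ch!r}: {utf8_len(ascii_ch)} byte, "
        f"U+{ord(variant):04X}: {utf8_len(variant)} bytes, "
        f"same case fold: {folds_together}"
    )
```

A byte-oriented fixed-string search that assumes every candidate has the same byte length as the pattern cannot cover both spellings, which is why these two letters are special.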

The actual problem is that for a pattern such as /----k/i, the first 4 characters (----) could still be used for the optimization, but currently Onigmo turns the optimization off entirely.
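
The idea of keeping the leading part of the literal can be sketched as follows. This is not Onigmo's implementation (which is written in C). The names `EXTRA_VARIANTS`, `case_variants`, `has_fixed_byte_length` and `optimizable_prefix` are hypothetical, and the variant table only lists the two characters named in this issue.

```python
# Sketch under stated assumptions: only the two extra case variants mentioned
# in this issue are listed; a real regex engine has a complete case-fold table.
EXTRA_VARIANTS = {
    "s": ("\u017f",),  # LATIN SMALL LETTER LONG S
    "k": ("\u212a",),  # KELVIN SIGN
}


def case_variants(ch):
    """All spellings that a case-insensitive match of `ch` should accept."""
    variants = {ch, ch.lower(), ch.upper()}
    variants.update(EXTRA_VARIANTS.get(ch.lower(), ()))
    return variants


def has_fixed_byte_length(ch):
    """True if every case variant of `ch` has the same UTF-8 byte length."""
    return len({len(v.encode("utf-8")) for v in case_variants(ch)}) == 1


def optimizable_prefix(literal):
    """Longest prefix of `literal` whose characters all have a fixed byte length."""
    prefix = []
    for ch in literal:
        if not has_fixed_byte_length(ch):
            break  # stop at the first character with variable-length variants
        prefix.append(ch)
    return "".join(prefix)


print(optimizable_prefix("----k"))
```

With this approach the pattern /----k/i would still contribute `----` to the fixed-string search, and only the part from `k` onward would be left to the regular matcher.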

I'm preparing a fix for this.

k-takata added a commit that referenced this issue Jan 24, 2019
E.g.
For the pattern `/----k/i`, optimization was totally turned off.
Make it possible to use the characters before `k` (i.e. `----`) for
optimization.
k-takata added a commit that referenced this issue Jan 24, 2019
Fix performance problem with /k/i and /s/i (Close #97)