[\p{Emoji}] also matches numerical characters #147

Open
hachi8833 opened this issue Mar 6, 2020 · 1 comment
Comments

@hachi8833

This behavior looks like a bug:
https://rubular.com/r/XH1SVLPKgG7eM7

As far as I can see, among numeric characters [\p{Emoji}] matches only the ASCII digits 0123456789, and does not match other characters in the N (Number) general categories such as ¼〇ⅲ៥

Could you please check this?

@747

747 commented Jun 13, 2021

Just dropping by, but this seems to be spec-conforming behavior, as described in this document:
https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-data.txt

Not all emoji were added as new characters; some are just refashionings of existing characters (or sequences of them).

"0".codepoints.map{ |c| sprintf("%04X", c) }
# => ["0030"]

"0️⃣".codepoints.map{ |c| sprintf("%04X", c) }
# => ["0030", "FE0F", "20E3"] (<- "0" + VS16 (emoji) + keycap)

\p{Emoji} matches any character that could be part of an emoji, which includes the ASCII digits because they are the base of keycap sequences. To match only characters that are unambiguously emoji, \p{Emoji_Presentation} is a better choice.
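To make the difference concrete, here is a minimal sketch in Ruby. It assumes a Ruby version whose regex engine knows both properties; the sample strings and variable names are only illustrative and are not taken from the original report.

```ruby
# Sample from the report: ASCII digits followed by other Number characters.
numbers = "0123456789\u00BC\u3007\u2172\u17E5" # ¼ 〇 ⅲ ៥

# \p{Emoji} picks out only the ASCII digits, since they carry Emoji=Yes
# in emoji-data.txt (as bases of keycap sequences).
matched = numbers.scan(/\p{Emoji}/)
puts matched.join # prints 0123456789

digit = "0"          # U+0030: Emoji=Yes, Emoji_Presentation=No
grin  = "\u{1F600}"  # U+1F600: Emoji=Yes, Emoji_Presentation=Yes

digit_is_emoji        = digit.match?(/\p{Emoji}/)              # true
digit_is_presentation = digit.match?(/\p{Emoji_Presentation}/) # false
grin_is_presentation  = grin.match?(/\p{Emoji_Presentation}/)  # true

puts digit_is_emoji, digit_is_presentation, grin_is_presentation
```

So a pattern meant to find "real" emoji in text that also contains plain digits would likely want \p{Emoji_Presentation} rather than \p{Emoji}.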
