Is your enhancement related to a problem? Please describe.
Currently YLS compiles regular expressions without Unicode support.
This means that patterns using Unicode character groups don't work. For example:
$ node
Welcome to Node.js v12.21.0.
Type ".help" for more information.
> var pattern = '^[\\p{Letter}]+$';
undefined
> var rb = new RegExp(pattern);
undefined
> var ru = new RegExp(pattern, 'u');
undefined
> rb.test('test');
false
> ru.test('test');
true
Historically the standard left Unicode support in regular expressions undefined (implementations were free to choose), but draft 2020-12 of the standard clarifies this: regexes should support Unicode.
Technically YLS targets draft-07 of JSON Schema, and is thus conformant. However, users reasonably expect to be able to throw more recent schema versions at it and have it largely work, and it does, because the standard has remained broadly compatible.
However, the specific experience of using schemas with Unicode character groups is poor, because they fail to validate as expected.
Describe the solution you would like
YLS should build the regular expressions for the pattern and patternProperties keywords with Unicode support (using the 'u' flag, as in new RegExp(pattern, 'u')).
This is allowed by the targeted draft-07 standard.
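As a rough illustration of the proposal (not YLS's actual code), the change could look something like the sketch below. The helper names compilePattern and matchesPattern are hypothetical, and the fallback to a non-Unicode regex is my own assumption about how to stay lenient with patterns that are only valid without the 'u' flag; the forthcoming PR may well do it differently.

```javascript
// Hypothetical helper (not YLS's real API): compile a JSON Schema
// `pattern` string with the 'u' flag so \p{...} classes are recognised.
function compilePattern(pattern) {
  try {
    return new RegExp(pattern, 'u');
  } catch (e) {
    // Assumption: fall back to legacy mode for patterns that are a
    // syntax error under 'u' (for example "\-" outside a character class).
    return new RegExp(pattern);
  }
}

// Hypothetical helper: test a string value against a schema pattern.
function matchesPattern(pattern, value) {
  return compilePattern(pattern).test(value);
}

console.log(matchesPattern('^[\\p{Letter}]+$', 'test'));  // true
console.log(matchesPattern('^[\\p{Letter}]+$', 'test1')); // false
console.log(matchesPattern('a\\-b', 'a-b'));              // true (via fallback)
```

Whether a fallback like this is desirable is a judgment call: it keeps old patterns loading, but it also means a typo in a Unicode pattern silently gets legacy semantics instead of an error.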
Describe alternatives you have considered
Unfortunately there aren't any, aside from "don't use Unicode groups in patterns", which is pretty silly in 2021.
The downside risk is this breaks pre-existing patterns which explicitly expect to handle Unicode as UTF-8 codepoints in a binary string. However:
- Today, such patterns are in turn broken by other JSON schema implementations which do support Unicode.
- They're also broken by string encodings other than UTF-8.
- They're going to get more broken as implementations shift to Unicode as per the updated standard.
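To make the risk concrete, here is one way this kind of breakage can show up in Node. Note the assumption: JavaScript strings are sequences of UTF-16 code units, so this sketch shows a pattern written against code units rather than against UTF-8 bytes. The variable names are purely illustrative.

```javascript
// U+1F600 is a single code point but two UTF-16 code units in a JS string.
const emoji = '\u{1F600}';

const results = {
  length: emoji.length,                  // 2
  oneDotLegacy: /^.$/.test(emoji),       // false: '.' matches one code unit
  oneDotUnicode: /^.$/u.test(emoji),     // true:  '.' matches one code point
  twoDotsLegacy: /^..$/.test(emoji),     // true:  pattern written for code units
  twoDotsUnicode: /^..$/u.test(emoji),   // false: same pattern fails with 'u'
};

console.log(results);
```

A schema that relied on the twoDotsLegacy behaviour would stop validating once the 'u' flag is applied, which is the trade-off being accepted here.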
Additional context
I have a PR forthcoming.