Compiler should error when encountering invalid regular expressions #3432

DanielRosenwasser · 2015-06-09T01:20:50Z

In tests/cases/conformance/parser/ecmascript5/RegressionTests/parser579071.ts we have an invalid regex:

var x = /fo(o/;

This is not valid JavaScript, but we don't give any errors. It would be helpful to let users know when their regular expressions are invalid.

pygy · 2022-05-04T07:12:05Z

I had missed this by only searching for RegExp...

I'm working on a spec-based RegExp tokenizer written in JS; hopefully you'll be able to make use of it.

graphemecluster · 2023-06-23T15:17:04Z

Moved from #54744:

Having previously worked on #51837, I found that TypeScript reports almost no syntax errors for regular expressions. I would like to file a PR for it.

Here is what I would like to do:

Check for duplicated or unknown flags
Check for unbalanced parentheses, which is the most common mistake people make
Check for invalid escapes
But this should be done only for RegExps with the u or v flag, i.e. in UnicodeMode. That means if we encounter a u or v flag we will need to rescan the whole RegExp again (!!) (i.e. redo what the current reScanSlashToken method already does).
And to check for invalid DecimalEscapes and \k<GroupName>s we will also need to count the number of capture groups and record the names of all named capture groups along the way.
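To make the last point concrete, here is a minimal sketch of counting capture groups and recording group names, then checking backreferences against them. The names `collectGroups` and `checkReferences` are hypothetical and are not taken from the TypeScript scanner. The sketch assumes UnicodeMode, where a dangling `\2` or `\k<name>` is an error, and it does not treat escapes inside character classes specially.

```typescript
// Hypothetical helpers for illustration; not TypeScript's actual scanner code.
interface GroupInfo {
    count: number;        // number of capturing groups in the pattern
    names: Set<string>;   // names of the named capturing groups
}

function collectGroups(pattern: string): GroupInfo {
    const names = new Set<string>();
    let count = 0;
    let inClass = false;
    for (let i = 0; i < pattern.length; i++) {
        const ch = pattern.charAt(i);
        if (ch === "\\") { i++; continue; }                    // skip the escaped character
        if (inClass) { if (ch === "]") inClass = false; continue; }
        if (ch === "[") { inClass = true; continue; }
        if (ch !== "(") continue;
        if (pattern.charAt(i + 1) !== "?") { count++; continue; }  // plain capturing group
        // "(?<name>" is a named capture; "(?<=" and "(?<!" are lookbehinds.
        const next = pattern.charAt(i + 3);
        if (pattern.charAt(i + 2) === "<" && next !== "=" && next !== "!") {
            const end = pattern.indexOf(">", i + 3);
            if (end !== -1) { count++; names.add(pattern.slice(i + 3, end)); }
        }
    }
    return { count, names };
}

function checkReferences(pattern: string): string[] {
    const { count, names } = collectGroups(pattern);
    const errors: string[] = [];
    // Matches \N, \k<name>, or any other escaped pair (so "\\1" is not misread as \1).
    const escapes = /\\(?:([1-9]\d*)|k<([^>]*)>|[\s\S])/g;
    let m: RegExpExecArray | null;
    while ((m = escapes.exec(pattern)) !== null) {
        if (m[1] !== undefined && Number(m[1]) > count) {
            errors.push(`\\${m[1]} at offset ${m.index}: only ${count} capture group(s) exist`);
        } else if (m[2] !== undefined && !names.has(m[2])) {
            errors.push(`\\k<${m[2]}> at offset ${m.index}: no group with that name`);
        }
    }
    return errors;
}
```

With this sketch, `checkReferences("(a)\\2\\k<b>")` reports two problems, while `checkReferences("(a)(?<year>\\d+)\\2\\k<year>")` reports none.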

Am I doing too much or too little? I know doing too much may cause serious performance regressions (well, luckily regular expression literals are not that common compared with string literals). It should still be better than doing nothing, though.

RyanCavanaugh · 2023-06-23T16:22:25Z

Imbalanced parens / brackets seem like where 99% of the value is.

Duplicate flags seems like an error no one's ever made before; usually it's /g or /m, /gmi, etc. I can't imagine writing /mgm unless a cat walked on my keyboard.

Erroring on escapes that are invalid regardless of flags seems like a fine compromise.

IMO it's really actually fine if once a year your program unconditionally crashes on startup in the cases where you made an extremely rare mistake. The value is in flagging errors that are made every day.

graphemecluster · 2023-06-25T16:20:34Z

Yup, duplicate flag checking is valueless 😅, but checking for unknown flags seems worth a bit (for example, if someone accidentally typed the Cyrillic у instead of the Latin y with the "Editor > Unicode Highlight: Ambiguous Characters" setting turned off, or with one of the "Editor > Unicode Highlight: Allowed Locales" permitting the character). (Plus, there is no reason not to show errors if the flag part isn't really flags, like foo_bar.) So far, checking flag availability against the target language version seems to be the most worthwhile part of the flag check.
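A rough sketch of such a flag check might look like the following. The function name `checkFlags` and the `targetYear` parameter are assumptions made for illustration. The table records the ECMAScript edition in which each flag was standardized, with 0 standing in for "available in every target".

```typescript
// Hypothetical sketch; the names and the error wording are illustrative only.
// Edition in which each flag was standardized (0 = available in every target).
const FLAG_SINCE = new Map<string, number>([
    ["g", 0], ["i", 0], ["m", 0],
    ["y", 2015], ["u", 2015],
    ["s", 2018],
    ["d", 2022],
    ["v", 2024],
]);

function checkFlags(flags: string, targetYear: number): string[] {
    const errors: string[] = [];
    const seen = new Set<string>();
    for (let i = 0; i < flags.length; i++) {
        const flag = flags.charAt(i);
        const since = FLAG_SINCE.get(flag);
        if (since === undefined) {
            // Catches look-alikes such as Cyrillic "у" as well as junk like "foo_bar".
            errors.push(`Unknown flag '${flag}' at offset ${i}`);
        } else if (seen.has(flag)) {
            errors.push(`Duplicate flag '${flag}' at offset ${i}`);
        } else if (since > targetYear) {
            errors.push(`Flag '${flag}' is only available when targeting ES${since} or later`);
        }
        seen.add(flag);
    }
    return errors;
}
```

For example, `checkFlags("gmi", 2015)` returns no errors, while `checkFlags("s", 2015)` returns one, because the s flag was added in ES2018.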

I am still wondering if a full parse should be done. Doing that does affect performance, but it would be helpful to further TypeScript extensions like #41160. Of course, this requires sub-nodes to be added under RegularExpressionLiteral. The actual implementation is not that tough, as we can take engine262 as a reference. (If we go with this, where should the class be put? scanner.ts? parser.ts? Or a new file?)

graphemecluster · 2023-06-27T15:15:40Z

Pinging @RyanCavanaugh and @DanielRosenwasser for opinions.

RyanCavanaugh · 2023-06-27T15:35:57Z

I feel like we should be able to do a parse-only pass (i.e. just scan and descend in order to validate) of the regex without creating nodes as you go, since there's no consumer of that output, just the production of errors as a side effect
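As an illustration of that idea, here is a minimal sketch of a validate-only pass that checks paren and bracket balance and reports errors through a callback instead of building nodes. `validateGrouping` and `ReportError` are hypothetical names, and the sketch ignores everything except grouping and character classes.

```typescript
// Hypothetical sketch of a validate-only pass; no syntax nodes are allocated.
type ReportError = (message: string, pos: number, length: number) => void;

function validateGrouping(pattern: string, report: ReportError): void {
    const openParens: number[] = [];  // offsets of "(" that are still unmatched
    let classStart = -1;              // offset of the open "[", or -1 when outside a class
    for (let pos = 0; pos < pattern.length; pos++) {
        const ch = pattern.charAt(pos);
        if (ch === "\\") { pos++; continue; }   // an escaped character never opens or closes anything
        if (classStart >= 0) {
            if (ch === "]") classStart = -1;    // parens inside [...] are literal
            continue;
        }
        if (ch === "[") {
            classStart = pos;
        } else if (ch === "(") {
            openParens.push(pos);
        } else if (ch === ")" && openParens.pop() === undefined) {
            report("Unexpected ')'", pos, 1);
        }
    }
    if (classStart >= 0) report("Unterminated character class", classStart, 1);
    for (const open of openParens) report("'(' has no matching ')'", open, 1);
}
```

Run on the `fo(o` pattern from the original report, this sketch reports a single error at offset 2, pointing at the unmatched parenthesis rather than at the whole literal.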

graphemecluster · 2023-06-28T20:04:39Z

The output is useful to TypeScript API users for creating RegExp-related type utilities. We could make use of the parsed result to make methods like String.prototype.replace safer too.
After all, I think we should at least store the number of capture groups and the names of the named capture groups.

RyanCavanaugh · 2023-06-28T20:46:27Z

The downstream tools can re-parse if they really need that data.

We're very sensitive to perf papercuts and not likely to accept the feature if the perf cost is nontrivial, and allocating more objects is something that is likely to incur broad perf hits due to slowing down GC, etc.

zm-cttae · 2023-07-04T10:28:23Z

Surely this is a case of serialising the raw regexp to its string representation and then passing that to the RegExp constructor? If it blows up, the regexp is invalid! That should be fast too.
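A minimal sketch of that approach, using only the built-in RegExp constructor (the wrapper name `regexSyntaxError` is made up for illustration). The returned message comes from whichever JavaScript engine runs the check, so its wording is not fixed.

```typescript
// Hypothetical wrapper around the built-in RegExp constructor.
// Returns the engine's error message, or undefined when the pattern compiles.
function regexSyntaxError(pattern: string, flags: string = ""): string | undefined {
    try {
        new RegExp(pattern, flags);
        return undefined;
    } catch (e) {
        // Engines throw a SyntaxError here; the message text is engine-specific.
        return e instanceof SyntaxError ? e.message : String(e);
    }
}
```

Calling `regexSyntaxError("fo(o")` returns a message string, while `regexSyntaxError("foo", "g")` returns `undefined`.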

zm-cttae · 2023-07-05T06:34:12Z

I see Ryan has suggested the same thing but in CS speak 😛

RyanCavanaugh · 2023-07-06T23:16:03Z

I actually hadn't thought of it in that much simpler way!

graphemecluster · 2023-07-07T14:58:11Z

@zm-cttae @RyanCavanaugh There have already been attempts like #4387 and #35957 using this approach, but they were closed. And this is definitely not a good solution, because:

If there are any errors, the whole RegExp expression is underlined, which is not useful, especially when the expression is long. Plus, the platform-specific error messages are not always clear, and we can’t translate them into other locales.
There might be multiple errors in the RegExp, but only the first error is revealed.
TypeScript is not limited to being executed with Node.js. It may also be run, for example, in a web browser via monaco-editor. That means by using the built-in RegExp constructor the behavior is not guaranteed and may differ due to features implemented in the JS engine. Actually, I plan to include some features that are currently Stage 3 proposals into my implementation in advance.

Luckily a simple parser without node generation has little effect on the performance, and I plan to make my PR available at the end of this month.

Besides, I have a follow-up proposal that makes use of the implementation to further enforce type safety of RegExp-related methods and enhance UX (providing auto-completion), but that would be a separate issue and PR afterwards.

zm-cttae · 2023-07-07T23:54:39Z

To be fair there are only two types of error that can be assessed independently of each other - bad escapes, and invalid grouping

zm-cttae · 2023-07-07T23:55:13Z

I agree however that the error isn't too useful for assessing what broke the regex in the first place.
Combine it with the platform-specificity problem and yes we will be needing a parser

graphemecluster · 2023-07-31T02:31:42Z

I didn’t manage to finish the implementation within this month due to my work but it’s coming along nicely. I’ll be back on it shortly and make the PR available ASAP.

DanielRosenwasser added the Suggestion (An idea for TypeScript) and Help Wanted (You can do this) labels Jun 9, 2015

DanielRosenwasser added this to the Community milestone Jun 9, 2015

mhegazy added the Bug (A bug in TypeScript) label and removed the Suggestion (An idea for TypeScript) label Jun 9, 2015

Schmavery mentioned this issue Aug 21, 2015

Add invalid regular expression literal error #3432 #4387

Closed

RyanCavanaugh modified the milestones: Community, Backlog Mar 7, 2019

JoshuaKGoldberg mentioned this issue Jan 2, 2020

Added validation for regex literals via RegExp constructor #35957

Closed

andrewbranch mentioned this issue May 3, 2022

JS/TS detect all syntax errors in RegExps. #48933

Closed

MartinJohns mentioned this issue Jun 22, 2023

Provide Basic Syntax Check for Regular Expression Literals #54744

Closed

graphemecluster mentioned this issue Sep 1, 2023

Provide Syntax Checking for Regular Expressions #55600

Merged

DanielRosenwasser closed this as completed in #55600 Apr 19, 2024

DanielRosenwasser modified the milestones: Backlog, TypeScript 5.5.0 Apr 19, 2024
