
TextMate grammar regexes able to only match a single line #32

Closed
mjbvz opened this issue Jan 3, 2017 · 4 comments

@mjbvz
Contributor

mjbvz commented Jan 3, 2017

From @tambry on December 31, 2016 16:20

Image of improper multiline syntax highlighting

Everything between the parentheses should be coloured green, but the matching stops after the first line ends.

  • VSCode Version: 1.8.1
  • OS Version: Ubuntu 16.10 64-bit (4.8.0-30-generic)

Steps to Reproduce:

  1. Clone this minimal repro sample.
  2. Open the folder of the sample in VS Code.
  3. Press F5 to run the extension in a new VS Code instance.
  4. In the new instance open the test.matchbug file that's in the repro sample's root directory.
  5. Code is improperly highlighted.

How the regex should match

Copied from original issue: microsoft/vscode#17964

@ghost

ghost commented Jan 3, 2017

As far as I am aware, this has always been the intended behavior for TextMate grammars: Atom, Sublime Text, and TextMate all restrict each regex to matching within a single line in their TextMate grammar implementations.

If the vscode implementation changed this, it would probably lead to grammars being developed that are incompatible with the other editors.
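The single-line behavior can be illustrated with a small Python sketch. It uses Python's `re` module as a stand-in for the Oniguruma engine that TextMate-style tokenizers rely on, along with a parenthesised-argument pattern similar to the one in the repro. The function names are hypothetical. When the pattern is fed one line at a time, as these tokenizers do, it finds nothing. The same pattern applied to the whole buffer matches the entire argument list.

```python
import re

# Stand-in for a grammar "match" rule: a parenthesised argument list.
# [^)]* can match newlines, so the only obstacle is how the text is fed in.
ARGS = re.compile(r"\([^)]*\)")

source = "set(FOO\n    bar\n    baz)\n"


def match_per_line(pattern, text):
    """Apply the pattern to each line separately, like a TextMate tokenizer."""
    hits = []
    for lineno, line in enumerate(text.splitlines()):
        for m in pattern.finditer(line):
            hits.append((lineno, m.group()))
    return hits


def match_whole_text(pattern, text):
    """Apply the pattern to the entire buffer at once (what the issue expected)."""
    return [m.group() for m in pattern.finditer(text)]


print(match_per_line(ARGS, source))    # []
print(match_whole_text(ARGS, source))  # ['(FOO\n    bar\n    baz)']
```

The opening `(` and the closing `)` sit on different lines, so no single line ever contains a complete match.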

@tambry

tambry commented Jan 3, 2017

Uh, that's unfortunate. So I guess making really good TextMate grammars is impossible for certain languages. I encountered this issue when trying to develop a grammar for CMake, where commands are often long and are split onto many lines.

Would it maybe be possible to support a special value or property that allows multiline regexes? Something like multiline: true would enable multiline regex matching in the current scope and its sub-scopes, unless explicitly disabled again.
This would be very useful for languages whose commands can span multiple lines, and for more precise highlighting in general. For example, some languages have special constants that are only accepted by a single command, so you wouldn't want to highlight them anywhere else except where they're passed to that command.

@ghost

ghost commented Jan 3, 2017

> Uh, that's unfortunate. So I guess making really good TextMate grammars is impossible for certain languages. I encountered this issue when trying to develop a grammar for CMake, where commands are often long and are split onto many lines.

It's an unfortunate limitation, but the rationale is that it's one way (albeit a crude one) to prevent grammars from being too inefficient. There are of course much better ways to handle this (e.g., supporting proper incremental parsing in the first place), but I guess it's the price to pay for using TextMate grammars.
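The efficiency rationale can be made concrete with a simplified model. This is an assumption for illustration, not how any particular editor is implemented. The tokenizer records its state at the end of every line. After an edit, it re-tokenizes from the changed line and stops as soon as a line's end state matches the cached one. If a regex were allowed to span lines, this shortcut would break, because a change on one line could alter a match that began many lines earlier. In this toy, the state is just parenthesis depth. Real TextMate tokenizers carry a stack of active rules instead. `tokenize_line` and `LineCache` are hypothetical names.

```python
def tokenize_line(line: str, depth: int) -> tuple[list[str], int]:
    """Toy tokenizer whose only cross-line state is parenthesis depth."""
    tokens = []
    for ch in line:
        if ch == "(":
            depth += 1
            tokens.append("punct.open")
        elif ch == ")":
            depth = max(0, depth - 1)
            tokens.append("punct.close")
        elif not ch.isspace():
            tokens.append("arg" if depth > 0 else "text")
    return tokens, depth


class LineCache:
    """Caches tokens and the end-of-line state for every line."""

    def __init__(self, lines: list[str]) -> None:
        self.lines = list(lines)
        self.tokens: list[list[str]] = []
        self.end_states: list[int] = []
        state = 0
        for line in self.lines:
            toks, state = tokenize_line(line, state)
            self.tokens.append(toks)
            self.end_states.append(state)

    def edit(self, index: int, new_text: str) -> int:
        """Replace one line; return how many lines had to be re-tokenized."""
        self.lines[index] = new_text
        state = self.end_states[index - 1] if index > 0 else 0
        count = 0
        for i in range(index, len(self.lines)):
            toks, new_state = tokenize_line(self.lines[i], state)
            count += 1
            self.tokens[i] = toks
            unchanged = new_state == self.end_states[i]
            self.end_states[i] = new_state
            if unchanged:
                # Later lines start from the same state as before: stop here.
                break
            state = new_state
        return count


cache = LineCache(["set(A", "  b", "  c)", "message(hi)"])
print(cache.edit(1, "  x"))    # 1
print(cache.edit(0, "set A"))  # 3
```

The first edit touches only one line, because the depth at the end of line 1 is unchanged. Removing the `(` on line 0 does change the state, so the tokenizer walks forward until it reaches a line whose end state matches the cache again.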

> Would it maybe be possible to support a special value or property that allows multiline regexes? Something like multiline: true would enable multiline regex matching in the current scope and its sub-scopes, unless explicitly disabled again.
> This would be very useful for languages whose commands can span multiple lines, and for more precise highlighting in general. For example, some languages have special constants that are only accepted by a single command, so you wouldn't want to highlight them anywhere else except where they're passed to that command.

Personally I don't think adding an extension like this is a good idea. If the whole rationale for using TextMate grammars is compatibility, it doesn't make much sense to encourage people to write grammars which end up not working as expected in other editors. Plus, you give up compatibility but don't address any of the other rather serious limitations of the TextMate grammar format.

It would be better to switch to a superior format without these limitations. Monaco (the vscode editor component) already supports a different and generally more flexible format, Monarch, so it might be worth trying to get that supported directly in vscode again.

Having said all of that, it is actually possible to do accurate multiline highlighting with TextMate grammars; it's just tedious. You have to make extensive use of Oniguruma's lookaround features to impose an ordering on the subrules so you can chain them properly. But if you generate the grammars, doing this is not so bad.

Below is an example of accurate highlighting using that approach. You can see how it works here.

highlights
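Here is a rough illustration of carrying context across lines. It uses begin/end-style rules, which are the TextMate mechanism for constructs that continue past the end of a line, together with a lookahead. The grammar is a hypothetical Python dict, not the tmLanguage schema, and Python's `re` again stands in for Oniguruma. This is not the generated grammar linked above. The lookahead `(?=\()` lets `set` start the command only when an argument list follows. `CACHE` is highlighted only while inside that command, even when the arguments span several lines. This is the kind of command-specific constant highlighting asked for earlier in the thread.

```python
import re

# Hypothetical mini-grammar, loosely modelled on TextMate begin/end rules.
SET_COMMAND = {
    # Lookahead: "set" only starts the command when "(" follows.
    "begin": re.compile(r"\bset\s*(?=\()"),
    "end": re.compile(r"\)"),
    # Sub-patterns that only apply between begin and end.
    "patterns": [("constant.language.cache", re.compile(r"\bCACHE\b"))],
}


def tokenize(lines):
    """Return (scope, text) pairs per line, carrying state across lines."""
    inside = False  # the only state kept from one line to the next
    result = []
    for line in lines:
        pos = 0
        scopes = []
        while pos < len(line):
            if not inside:
                m = SET_COMMAND["begin"].search(line, pos)
                if m is None:
                    break
                scopes.append(("keyword.other.command", m.group().strip()))
                inside = True
                pos = m.end()
            else:
                end = SET_COMMAND["end"].search(line, pos)
                stop = end.start() if end else len(line)
                # Apply sub-patterns only within the command's span on this line.
                for scope, pattern in SET_COMMAND["patterns"]:
                    for m in pattern.finditer(line, pos, stop):
                        scopes.append((scope, m.group()))
                if end:
                    inside = False
                    pos = end.end()
                else:
                    pos = len(line)
        result.append(scopes)
    return result


lines = [
    "set(MY_VAR",
    "    value CACHE STRING",
    '    "doc")',
    "message(CACHE)",
]
for scopes in tokenize(lines):
    print(scopes)
```

The `CACHE` on the second line is scoped as a constant because the tokenizer is still inside `set(...)`. The one on the last line stays unscoped because it appears outside that command.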

@tambry

tambry commented Jan 4, 2017

Monarch definitely seems more flexible and usable than TextMate for such grammars, though an actual API for parsing and highlighting would be even better.
There does seem to be a feature request for this in microsoft/vscode#216. I guess I'll have to live with it until someone actually implements a better way to do this.
