Define a protocol for syntax tokens #1063

Open
floitsch opened this issue Aug 7, 2020 · 7 comments
Labels
help wanted (Issues identified as good community contribution opportunities), idea
Milestone

Comments

@floitsch
Contributor

floitsch commented Aug 7, 2020

Is my reading correct, that semantic tokens only add additional information to tokens?
Can we trust that clients will render keywords correctly if they already have a syntax highlighter (frequently regexp-based)?

If that's the case, should there be an option to "unset" a type? Just as an example, if you have #ifdefs, you could think of giving the LSP server the option of removing syntax coloring from inactive code segments by giving them the token type None.

@rcjsuen
Contributor

rcjsuen commented Aug 7, 2020

> Is my reading correct, that semantic tokens only add additional information to tokens?

That is what Visual Studio Code does. I am not sure about other LSP clients.

> Can we trust that clients will render keywords correctly if they already have a syntax highlighter (frequently regexp-based)?
>
> If that's the case, should there be an option to "unset" a type? Just as an example, if you have #ifdefs, you could think of giving the LSP server the option of removing syntax coloring from inactive code segments by giving them the token type None.

Whether to merge the tokens (by applying them on top) or replace them (by discarding the client's grammar information) has been discussed in the past (#18 (comment)), but that discussion seems to have stalled a bit.
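To make the two options concrete, here is a minimal TypeScript sketch. The `Token` shape and the `replaceTokens` / `mergeTokens` helpers are hypothetical and not part of LSP or any client; the sketch also assumes that the semantic tokens on a line do not overlap each other.

```typescript
// Hypothetical token shape: a half-open character range [start, end) on one line.
interface Token {
  start: number;
  end: number;
  type: string;
}

// "Replace": discard the grammar tokens and use only what the server sent.
function replaceTokens(_syntax: Token[], semantic: Token[]): Token[] {
  return [...semantic].sort((a, b) => a.start - b.start);
}

// "Merge": keep the grammar tokens, but let semantic tokens win where they overlap.
function mergeTokens(syntax: Token[], semantic: Token[]): Token[] {
  const sem = [...semantic].sort((a, b) => a.start - b.start);
  const result: Token[] = [];
  for (const tok of syntax) {
    let cursor = tok.start;
    for (const s of sem) {
      if (s.end <= cursor || s.start >= tok.end) continue; // no overlap with the rest of tok
      if (s.start > cursor) {
        // Part of the grammar token before the semantic token stays as it was.
        result.push({ start: cursor, end: s.start, type: tok.type });
      }
      cursor = Math.max(cursor, s.end);
    }
    if (cursor < tok.end) {
      // Remainder of the grammar token after the last overlapping semantic token.
      result.push({ start: cursor, end: tok.end, type: tok.type });
    }
  }
  return [...result, ...sem].sort((a, b) => a.start - b.start);
}
```

With grammar tokens `keyword [0,3)` and `keyword [4,7)` and a single semantic token `variable [4,7)`, merging keeps the first keyword and recolors the second range, while replacing leaves only the `variable` token and drops the keyword coloring entirely.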

@dbaeumer
Member

All I can currently think of is to make this a client capability. I guess we will not be able to force consistent behavior across all clients.

@dbaeumer dbaeumer added this to the Backlog milestone Nov 13, 2020
@radeksimko
Contributor

Was it ever an ambition of semantic tokens to replace the existing (static, often regex-based) grammars? E.g., do you foresee any VS Code extension using semantic tokens exclusively and having no TM grammar?

Or is there a reason why using semantic tokens as the exclusive way of highlighting files would be a bad idea?


Personally, I think it would be great if the wider community could agree on a single protocol/format for syntax highlighting, and adding this to LSIF would probably help too. There are just way too many different solutions to the same problem (TextMate is the most popular one, but really it's one of many).

I can see that being a long journey, though. Editors would presumably still want to provide some highlighting without LSP, since language servers today usually aren't built into the editors, so having to take an extra step just to get syntax highlighting working seems like a potential source of friction.

@dbaeumer
Member

dbaeumer commented Nov 1, 2021

Added a capability augmentsSyntaxTokens to the spec to allow clients to express the behavior. I will keep the issue for the syntax support.
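As a rough illustration of how a server might read this capability, here is a small TypeScript sketch. The interface below is a trimmed, local stand-in for the semantic tokens client capabilities (only the `augmentsSyntaxTokens` field is modeled), and `highlightMode` and `serverMustCoverSyntax` are hypothetical helpers, not part of the spec. The three-way reading follows the 3.17 wording: `true` means syntax and semantic tokens are both used, `false` means only semantic tokens are used, and an absent value leaves the client behavior unspecified.

```typescript
// Local stand-in for the relevant slice of the client capabilities (assumption:
// only the one field discussed here is modeled).
interface SemanticTokensClientCapabilitiesSketch {
  augmentsSyntaxTokens?: boolean;
}

type HighlightMode = "augment" | "semantic-only" | "unspecified";

// Map the optional boolean to an explicit three-state mode.
function highlightMode(caps: SemanticTokensClientCapabilitiesSketch): HighlightMode {
  if (caps.augmentsSyntaxTokens === true) return "augment";
  if (caps.augmentsSyntaxTokens === false) return "semantic-only";
  return "unspecified";
}

// Hypothetical server policy: only when the client says it ignores its own
// syntax tokens does the server need to emit tokens for keywords, strings, etc.
function serverMustCoverSyntax(caps: SemanticTokensClientCapabilitiesSketch): boolean {
  return highlightMode(caps) === "semantic-only";
}
```

A server following this policy would keep sending only the "additional information" tokens to clients that augment, and fall back to full coverage for clients that rely on semantic tokens alone.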

@dbaeumer dbaeumer changed the title from "Clarify interaction of LSP semantic tokens with syntax highlighters" to "Define a protocol for syntax tokens" Nov 1, 2021
@dbaeumer dbaeumer added the "idea" and "help wanted" labels and removed the "clarification" and "semantic tokens" labels Nov 1, 2021
@sclu1034

sclu1034 commented Apr 25, 2022

I'd love to see a clarification on the intended/expected scope of semanticTokens as well.

Even within just the few language servers I use regularly, there is a huge variety in what highlighting tokens they provide.
Some simply stick to the predefined values from the spec, some double down on the "additional color information" and provide only tokens that the client can't/doesn't know, while others go all out to provide a token for almost every character in a file.

I agree with radeksimko above that it would be great if LSP could serve as a unified provider for syntax+semantic highlighting.
This would also lessen the amount of work required for the user to configure/theme highlighting, since you wouldn't have to make two independent systems look nice together.

@radeksimko
Contributor

radeksimko commented Apr 25, 2022

> Some simply stick to the predefined values from the spec, some double down on the "additional color information" and provide only tokens that the client can't/doesn't know, while others go all out to provide a token for almost every character in a file.

FWIW, we have recently implemented custom token types and modifiers while still keeping the predefined values from the spec as a fallback, so with the right use of capability negotiation this doesn't need to be a "mutually exclusive" choice.

This of course assumes that a client wanting to use these custom token types and modifiers has to implement them, which couples it a bit more tightly to a particular server. That seems okay to me, as highlighting should work just as before for clients that do not support the custom types and modifiers, as long as both sides do the capability negotiation right.
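A minimal sketch of what such a negotiation could look like on the server side, in TypeScript. The helper names and the custom type name `myLang-attrName` are made up for illustration and do not describe our actual implementation; the only assumption taken from the protocol is that the client advertises the token types it supports as a list of strings.

```typescript
// Pick the first entry of a fallback chain that the client says it supports.
// Returns undefined when nothing in the chain is supported, in which case the
// server would simply emit no token for that range.
function resolveTokenType(
  fallbackChain: readonly string[],
  clientTokenTypes: readonly string[]
): string | undefined {
  const supported = new Set(clientTokenTypes);
  return fallbackChain.find((t) => supported.has(t));
}

// Build the legend the server will announce: one resolved type per chain,
// without duplicates, in first-seen order.
function buildLegend(
  chains: readonly (readonly string[])[],
  clientTokenTypes: readonly string[]
): string[] {
  const legend: string[] = [];
  for (const chain of chains) {
    const resolved = resolveTokenType(chain, clientTokenTypes);
    if (resolved !== undefined && !legend.includes(resolved)) {
      legend.push(resolved);
    }
  }
  return legend;
}
```

For the chain `["myLang-attrName", "property"]`, a client that only knows the predefined types resolves to `property`, while a client that also advertises `myLang-attrName` gets the more specific type.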


However, the whole highlighting chain in practice looks more like this: language server <--> language client <--> theme. That is, highlighting capabilities (including augmentsSyntaxTokens) are IMO largely dependent on themes, which means that if the client is supposed to provide accurate capabilities, it would have to somehow consult the theme.

There is also ambiguity in the handling of conflicts between extensions/themes which claim the same files on the client.

The VS Code extension API allows you to define a mapping of your custom types to TM scopes via semanticTokenScopes. That, however, only solves the (admittedly more common) problem of conflicts between generic themes working with TM scopes and token-based themes. It does not address conflicts between themes that are both token-based (where one could support predefined tokens like property and the other just customToken).

Perhaps none of the above are strictly LSP problems but they are problems client maintainers will likely run into when implementing semantic token based highlighting.

Maybe LSP could help solve these problems if it somehow reflected the reality that token types have fallbacks and are not entirely independent of each other? I suppose ordering within the capability arrays can already do that, but the spec isn't clear on whether ordering is important (beyond serving as a legend).

@rhdunn

rhdunn commented Dec 28, 2022

There are some languages (e.g. XQuery) where a state-based tokenizer is necessary in order to determine the correct tokens. This is partly because XQuery functions as a templating language, like PHP or the Liquid templating engine, in which you can mix XML and XPath/XQuery code.

Additionally, XQuery can use most of the keywords as identifiers, with different versions of the language (and vendor extensions) having different sets of reserved keywords. This means that any regex-based keyword syntax highlighting will need to be able to remove keyword highlighting and specify a semantics-based highlighting (variable, function, etc.).

Having to maintain two different grammars or hand-written lexers/parsers, one for syntax highlighting and one for everything else, generally defeats the point of having the LSP separate from the editor. It also means that there can be differences in the highlighting, especially for complex languages or language features like string interpolation and language injection (e.g. CSS in HTML style elements).
