Define a protocol for syntax tokens #1063

floitsch · 2020-08-07T10:05:38Z

Is my reading correct, that semantic tokens only add additional information to tokens?
Can we trust that clients will render keywords correctly if they already have a syntax highlighter (frequently regexp-based)?

If that's the case, should there be an option to "unset" a type? As an example, with #ifdefs the LSP server could be given the option of removing syntax coloring from inactive code segments by assigning them the token type None.


rcjsuen · 2020-08-07T10:17:10Z

> Is my reading correct, that semantic tokens only add additional information to tokens?

That is what Visual Studio Code does. I am not sure about other LSP clients.

> Can we trust that clients will render keywords correctly if they already have a syntax highlighter (frequently regexp-based)?
>
> If that's the case, should there be an option to "unset" a type? As an example, with #ifdefs the LSP server could be given the option of removing syntax coloring from inactive code segments by assigning them the token type None.

Whether clients should merge the tokens (by applying them on top) or replace them (by discarding the client's information from the grammar) has been discussed in the past (#18 (comment)), but that discussion seems to have stalled a bit.
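To make the two strategies concrete, here is a minimal sketch of what a client might do in each case. Everything in it is an assumption for illustration: the `Span` type and the `augment` and `replaceAll` functions are hypothetical and not part of LSP or any particular client, and spans are assumed to lie on a single line, sorted and non-overlapping.

```typescript
// Hypothetical representation of a highlighted range on one line.
interface Span {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  type: string;  // e.g. a TextMate scope or a semantic token type
}

// "Merge": semantic spans win wherever they cover text,
// and syntax spans remain visible everywhere else.
function augment(syntax: Span[], semantic: Span[]): Span[] {
  const kept: Span[] = [];
  for (const s of syntax) {
    let cursor = s.start;
    for (const m of semantic) {
      // Skip semantic spans that do not touch the uncovered part of s.
      if (m.end <= cursor || m.start >= s.end) continue;
      // Keep the piece of the syntax span before the semantic span.
      if (m.start > cursor) {
        kept.push({ start: cursor, end: m.start, type: s.type });
      }
      cursor = Math.max(cursor, Math.min(m.end, s.end));
    }
    // Keep whatever is left after the last overlapping semantic span.
    if (cursor < s.end) {
      kept.push({ start: cursor, end: s.end, type: s.type });
    }
  }
  return [...kept, ...semantic].sort((a, b) => a.start - b.start);
}

// "Replace": the grammar-based information is discarded entirely.
function replaceAll(_syntax: Span[], semantic: Span[]): Span[] {
  return [...semantic];
}
```

With merging, a server only needs to send tokens where it knows better than the grammar; with replacing, any text the server leaves untokenized ends up uncolored.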

dbaeumer · 2020-11-11T14:44:23Z

All I can currently think of is to make this a client capability. I guess we will not be able to force a consistent behavior over all clients.

radeksimko · 2020-11-23T15:14:59Z

Was it ever an ambition of semantic tokens to replace the existing (static, often regex-based) grammars? E.g., do you foresee any VS Code extension using semantic tokens exclusively and having no TM grammar?

Or is there a reason why using semantic tokens as the exclusive way of highlighting files would be a bad idea?

Personally, I think it would be great if the wider community could agree on a single protocol/format for syntax highlighting, and adding this to LSIF would probably help too. There are just way too many different solutions to the same problem (TextMate is the most popular one, but really it's one of many).

I can see that being a long journey though. I reckon editors would still want to provide some highlighting without LSP: language servers today usually aren't built into the editors, so having to take an extra step just to get syntax highlighting to work seems like a potential source of friction in that context.

dbaeumer · 2021-11-01T16:34:07Z

Added a capability augmentsSyntaxTokens to the spec to allow clients to express their behavior. I will keep the issue open for the syntax token support.
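As a rough illustration of how a server could react to this capability, here is a minimal sketch. The `augmentsSyntaxTokens` field on the client's `textDocument.semanticTokens` capabilities is from the spec; the `Token` type, the `SYNTAX_LEVEL_TYPES` set, and the `tokensToSend` function are hypothetical. Treating an unset capability the same as `false` is also an assumption, chosen because the client behavior is unspecified in that case.

```typescript
// Subset of the client capabilities relevant to this sketch.
interface ClientCapabilities {
  textDocument?: {
    semanticTokens?: {
      tokenTypes: string[];
      augmentsSyntaxTokens?: boolean;
    };
  };
}

// Hypothetical decoded token (before the relative integer encoding).
interface Token {
  line: number;
  char: number;
  length: number;
  type: string;
}

// Assumption: token types that a client-side grammar usually handles already.
const SYNTAX_LEVEL_TYPES = new Set([
  "keyword", "comment", "string", "number", "operator",
]);

// Decide which tokens are worth sending to this client.
function tokensToSend(caps: ClientCapabilities, all: Token[]): Token[] {
  const augments = caps.textDocument?.semanticTokens?.augmentsSyntaxTokens;
  if (augments === true) {
    // The client overlays semantic tokens on its own syntax tokens,
    // so purely syntactic tokens can be left out.
    return all.filter((t) => !SYNTAX_LEVEL_TYPES.has(t.type));
  }
  // false or undefined: the client may rely on semantic tokens alone,
  // so send everything to avoid uncolored text.
  return all;
}
```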

sclu1034 · 2022-04-25T11:11:14Z

I'd love to see a clarification on the intended/expected scope of semanticTokens as well.

Even within just the few language servers I use regularly, there is huge variety in what highlighting tokens they provide.
Some simply stick to the predefined values from the spec, some double down on the "additional color information" and provide only tokens that the client can't/doesn't know, while others go all out to provide a token for almost every character in a file.

I agree with radeksimko above that it would be great if LSP could serve as a unified provider for syntax+semantic highlighting.
This would also lessen the amount of work required for the user to configure/theme highlighting, since you don't have to combine two independent systems to look nice together.

radeksimko · 2022-04-25T13:24:57Z

> Some simply stick to the predefined values from the spec, some double down on the "additional color information" and provide only tokens that the client can't/doesn't know, while others go all out to provide a token for almost every character in a file.

FWIW, we have recently implemented custom token types and modifiers while still keeping the predefined values from the spec as a fallback, so with the right use of capability negotiation this doesn't need to be a "mutually exclusive" choice.

This of course assumes that a client wanting to make use of these custom token types and modifiers has to implement them, which in practice couples it a bit more to a particular server. That seems okay to me, as the highlighting should work just as before for clients which do not support the custom types and modifiers, as long as both sides do the capability negotiation right.

However, the whole highlighting chain in practice looks more like this: language server <--> language client <--> theme. That is, highlighting capabilities (including augmentsSyntaxTokens) are IMO largely dependent on themes, which means that if the client is supposed to provide accurate capabilities, it would have to somehow consult the theme.
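A minimal sketch of this kind of negotiation, under stated assumptions: the custom token type names and the `FALLBACKS` table are invented for the example, and `resolveTokenType` and `buildLegend` are hypothetical helpers rather than anything defined by LSP. The only spec-level input is the list of token types the client announces in its capabilities.

```typescript
// Hypothetical fallback table: custom token type -> more generic type.
// The custom names on the left are invented for this example;
// the names on the right are predefined types from the spec.
const FALLBACKS: Record<string, string> = {
  customBlockLabel: "enumMember",
  customAttrName: "property",
};

// Walk the fallback chain until a type the client announced is found.
function resolveTokenType(
  preferred: string,
  clientTypes: readonly string[],
): string | undefined {
  const seen = new Set<string>();
  let current: string | undefined = preferred;
  while (current !== undefined && !seen.has(current)) {
    if (clientTypes.includes(current)) return current;
    seen.add(current); // guard against cycles in the table
    current = FALLBACKS[current];
  }
  return undefined; // the client supports nothing in the chain
}

// Build the legend the server would register, without duplicates.
function buildLegend(
  wanted: readonly string[],
  clientTypes: readonly string[],
): string[] {
  const legend: string[] = [];
  for (const t of wanted) {
    const resolved = resolveTokenType(t, clientTypes);
    if (resolved !== undefined && !legend.includes(resolved)) {
      legend.push(resolved);
    }
  }
  return legend;
}
```

A client that announces only predefined types gets the generic highlighting it had before, while a client that also announces the custom types gets the more specific ones.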

There is also ambiguity in handling of conflicts between extensions/themes which claim the same files on the client.

The VS Code extension API allows you to define a mapping of your custom types to TM scopes via semanticTokenScopes. That, however, only solves the (admittedly more common) problem of conflicts between generic themes working with TM scopes and token-based themes. It does not address conflicts between themes which are both token-based (where one could support predefined tokens like property and the other just customToken).

Perhaps none of the above are strictly LSP problems but they are problems client maintainers will likely run into when implementing semantic token based highlighting.

~~Maybe LSP could help solve these problems if it somehow reflected the reality where token types have fallbacks and they're not entirely independent of each other?~~ I suppose ordering within the capability arrays can already do that but the spec isn't clear on whether ordering is important (beyond serving as a legend).

rhdunn · 2022-12-28T13:01:08Z

There are some languages (e.g. XQuery) where a state-based tokenizer is necessary in order to determine the correct tokens. This is partially because it functions as a templating language such as PHP or the Liquid templating engine where you can mix XML and XPath/XQuery code.

Additionally, XQuery can use most of the keywords as identifiers, with different versions of the language (and vendor extensions) having different sets of reserved keywords. This means that any regex-based keyword syntax highlighting will need to be able to remove keyword highlighting and specify a semantics-based highlighting (variable, function, etc.).

Having to maintain two different grammars or hand-written lexers/parsers, one for the syntax highlighting and one for everything else, generally defeats the point of having the LSP separate from the editor. It also means that there can be differences in the highlighting, especially for complex languages or language features like string interpolation and language injection (e.g. CSS in HTML style elements).

dbaeumer added semantic tokens clarification labels Nov 11, 2020

dbaeumer added this to the Backlog milestone Nov 13, 2020

dbaeumer changed the title ~~Clarify interaction of LSP semantic tokens with syntax highlighters~~ Define a protocol for syntax tokens Nov 1, 2021

dbaeumer added idea, help wanted labels and removed clarification, semantic tokens labels Nov 1, 2021

gundermanc mentioned this issue Mar 11, 2022

SemanticTokens Edits not performant in large documents #1421

Open

sclu1034 mentioned this issue Apr 25, 2022

[semantic highlight] Add basic language tokens clangd/clangd#1115

Open

frankplow mentioned this issue Aug 24, 2022

Feature Request: Produce fewer semantic tokens if the client uses augmentsSyntaxTokens rust-lang/rust-analyzer#12783

Open
