Can I get scope / scopeRange at a position? #580

Open
seanmcbreen opened this issue Nov 24, 2015 · 44 comments
Labels
api feature-request Request for new features or functionality tokenization Text tokenization
@seanmcbreen

From @billti on November 1, 2015 6:10

The API call document.getWordRangeAtPosition(position) appears to use its own definition of a word. For example, my tmLanguage defines attrib-name as a token/scope, yet getWordRangeAtPosition appears to break this into 2 words on the - character.

How can I get token ranges at a position based on my custom syntax? (And it would be really useful if I could get the scope name that goes along with it too).
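To make the splitting behaviour concrete, here is a minimal sketch of how the word pattern decides where a "word" ends. It is not VS Code's implementation; `wordSpanAt`, `WordSpan`, and both regular expressions are hypothetical names chosen for illustration.

```typescript
// Illustrative only: the real word matching lives inside the editor.
interface WordSpan {
    start: number; // inclusive column
    end: number;   // exclusive column
    text: string;
}

// Return the match of `wordPattern` that contains `column`, if any.
function wordSpanAt(lineText: string, column: number, wordPattern: RegExp): WordSpan | undefined {
    // Scan with a fresh global regex so lastIndex state never leaks between calls.
    const scanner = new RegExp(wordPattern.source, wordPattern.ignoreCase ? 'gi' : 'g');
    let match: RegExpExecArray | null;
    while ((match = scanner.exec(lineText)) !== null) {
        if (match[0].length === 0) {
            scanner.lastIndex++; // avoid an infinite loop on empty matches
            continue;
        }
        const start = match.index;
        const end = start + match[0].length;
        if (column >= start && column < end) {
            return { start: start, end: end, text: match[0] };
        }
    }
    return undefined;
}

const sampleAttributeLine = '<div attrib-name="x">';
const plainWordPattern = /\w+/;         // stops at '-'
const hyphenatedWordPattern = /[\w-]+/; // treats '-' as part of a word
```

With the cursor on the `a` of `name` (column 13), the plain pattern yields `name`, while the hyphen-aware pattern yields `attrib-name`.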

Copied from original issue: Microsoft/vscode-extensionbuilders#76

@seanmcbreen
Author

From @vilic on November 1, 2015 15:19

👍

@seanmcbreen
Author

From @egamma on November 2, 2015 8:16

Exposing the scope names in the API is on the backlog, but will not make it into the November update.

@seanmcbreen
Author

From @jrieken on November 2, 2015 18:16

@billti despite the lack of access to scopes, you can define a custom word definition such that it will be picked up by document.getWordRangeAtPosition. You can register an ITokenTypeClassificationSupport, which can contribute a regex to classify words.

@seanmcbreen
Author

From @billti on November 2, 2015 19:3

Thanks @jrieken, I spotted that, and it may be a useful interim solution. But generally for now, if I want to know the classification accurately for a position in a CFG, it seems I'll need to call document.getText() and run my own parser over it - is that right?

@seanmcbreen
Author

From @jrieken on November 3, 2015 9:59

unfortunately yes

@alexdima alexdima added feature-request Request for new features or functionality api labels Nov 25, 2015
@egamma egamma modified the milestone: Backlog Dec 10, 2015
@alexdima alexdima removed their assignment Mar 17, 2016
@hoovercj
Member

@egamma on November 2, 2015 8:16
Exposing the scope names in the API is on the backlog, but will not make it into the November update.

Is there any update on if/when we can expect a way to get the scopes at a position or offset?

@egamma
Member

egamma commented May 16, 2016

@hoovercj all I can currently say is that this is still on the backlog, sorry.

@TimonVS

TimonVS commented Oct 13, 2016

@egamma Any progress on this? Is there any way I can contribute? :)

@siegebell

Would it be trivial to provide a command that returns a URL to the TextMate grammar file being used for a particular document/languageId (or returns the contents of the file, to keep it read-only)? Then we could use vscode-textmate ourselves to get the token info at a particular location.

@hoovercj
Member

@siegebell -- As a short-term solution, I have successfully included a TextMate grammar with my extension, referenced that, and referenced the built-in vscode-textmate package to get token scopes in an extension.

It's a pain and it really should be part of the API, but it's definitely possible to do today.

I was given the advice to use: var tm = require(path.join(require.main.filename, '../../node_modules/vscode-textmate/release/main.js')); to access vscode-textmate, but since I have a language server I had to evaluate require.main.filename in the language client and pass it over with the initializationOptions to get the right value in my server.

@egamma
Member

egamma commented Oct 14, 2016

@TimonVS exposing the scopes in the API requires that we revisit the internal representation of scopes. This requires major re-architecting, which makes it challenging to open up for contributions.

@siegebell

In the meantime, I've published an extension, scope-info, that provides an API to query the scope at a particular position. It works by querying the installed extensions for language definitions and grammars, and then maintains a parse-state for each open document using vscode-textmate. Only one instance will exist per vscode instance, regardless of how many other extensions depend on it.

Example usage:

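import * as vscode from 'vscode';
import * as api from 'scope-info';

async function example(doc: vscode.TextDocument, pos: vscode.Position) {
    // getExtension returns undefined when scope-info is not installed.
    const siExt = vscode.extensions.getExtension<api.ScopeInfoAPI>('siegebell.scope-info');
    if (!siExt) { return; }
    // activate() resolves to the API object exported by scope-info.
    const si = await siExt.activate();
    // Token describing the scope at the given position.
    const t1: api.Token = si.getScopeAt(doc, pos);
}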

Notes:

  • For typings, refer to scope-info.d.ts.
  • You can also query the vscode-textmate-IGrammar and scope name of a language.
  • Your extension should list 'siegebell.scope-info' as an extensionDependency.
  • If multiple extensions contribute to the same language, scope-info may pick the wrong one.
  • Scope-info might return a scope corresponding to a slightly newer or older document version than what your extension thinks is current.
  • Pull requests are welcome.
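The idea of keeping a parse state per open document can be sketched as follows. This is not scope-info's code: it assumes a toy tokenizer that only tracks whether a line ends inside a `/* ... */` block comment, and `endStateOf` and `ParseStateCache` are hypothetical names. A real grammar engine would hand a much richer state from one line to the next, but the caching logic is the same: after an edit, retokenize forward only until a line ends in the same state as before.

```typescript
type LineState = 'code' | 'blockComment';

// Toy stand-in for a grammar: returns the state at the end of the line.
function endStateOf(text: string, startState: LineState): LineState {
    let state = startState;
    let i = 0;
    while (i < text.length) {
        const marker = state === 'code' ? '/*' : '*/';
        const at = text.indexOf(marker, i);
        if (at === -1) { break; }
        state = state === 'code' ? 'blockComment' : 'code';
        i = at + 2;
    }
    return state;
}

class ParseStateCache {
    private lines: string[];
    private endStates: LineState[] = [];
    linesTokenized = 0; // exposed only to show how much work an edit costs

    constructor(lines: string[]) {
        this.lines = lines.slice();
        this.retokenizeFrom(0);
    }

    // State in which the given line starts (the previous line's end state).
    startStateOf(index: number): LineState {
        return index === 0 ? 'code' : this.endStates[index - 1];
    }

    // Replace one line, then retokenize only as far as end states keep changing.
    replaceLine(index: number, text: string): void {
        this.lines[index] = text;
        this.retokenizeFrom(index);
    }

    private retokenizeFrom(first: number): void {
        for (let i = first; i < this.lines.length; i++) {
            const next = endStateOf(this.lines[i], this.startStateOf(i));
            this.linesTokenized++;
            const unchanged = this.endStates[i] === next;
            this.endStates[i] = next;
            // A line that ends in the same state as before cannot affect later lines.
            if (unchanged) { break; }
        }
    }
}
```

An edit that does not change a line's end state costs one line of tokenization, whereas removing an opening `/*` propagates only until the states converge again.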

@ramya-rao-a
Contributor

exposing the scopes in the API requires that we revisit the internal representation of scopes. This requires major re-architecting, which makes it challenging to open up for contributions

@alexandrudima I believe the above was done as part of #18317

@aeschli Will #18068 be covering the feature ask in this current issue or are we suggesting extension authors to use https://marketplace.visualstudio.com/items?itemName=siegebell.scope-info?

@aeschli
Contributor

aeschli commented Jan 19, 2017

Alex added a developer tool that lets you see the tokens at a location. See #17933 (comment)

There's still no extension API that returns TextMate scopes. There are several reasons for that; one of them is that we don't want extensions to start depending on a particular tokenizer grammar.

@ImUrX

ImUrX commented May 6, 2021

This feature would be so useful, and this issue is one of the oldest that's still open.

@ghost

ghost commented Oct 21, 2021

I recently wrote vscode-textmate-languageservice precisely to exploit TextMate tokens in providers such as folding, outline/TOC, and so on. Unfortunately the performance leaves much to be desired because the code is tokenized again - Gimly/vscode-matlab#142

@universemaster

I appreciate this issue is about a vscode API.

However, are you familiar with https://github.com/draivin/hscopes ?

A meta-extension for vscode that provides TextMate scope information. Its intended usage is as a library for other extensions to query scope information.

and

This extension provides an API by which your extension can query scope & token information.

@ghost

ghost commented Nov 11, 2021

I need a dump of all the tokens in a document tbh. The information is there, it just needs to be exposed in a sane manner.

@ghost

ghost commented Dec 10, 2021

For what it's worth, I have hooked into the native module using Microsoft's getCoreNodeModule trick.
It works! But it is also slow and retokenizes the entire document - https://github.com/SNDST00M/vscode-textmate-languageservice/blob/v0.2.2/src/util/getCoreNodeModule.ts

@savetheclocktower

savetheclocktower commented Feb 1, 2022

There's still no extension API that returns TextMate scopes. There are several reasons for that; one of them is that we don't want extensions to start depending on a particular tokenizer grammar.

I feel silly responding to a four-year-old comment, but it is the last “official” word on this issue, so here I go.

VSCode is the third major code editor to borrow TextMate's grammar system, and I wonder if they all thought that its scope names were simply an implementation detail of its grammars. Quite the opposite — a lot of thought went into this system, since it was also used as the basis for much of TextMate's customizability.

Scope names aren't just hooks for syntax highlighting. TextMate commands are tightly woven to the semantics of scope names. You can have the same key combination perform different commands based on scope, so that your command doesn't monopolize a hotkey for something that (say) only works when the cursor is within a string. Conversely, you can define a command that behaves identically across different languages because it hooks into the presence of a generic scope name. This is how TextMate recognizes URLs across multiple contexts — inside HTML files, inside Markdown files, inside code comments regardless of language — and implements a single “open this URL in my browser” command that works identically across all of them.

Even after moving from TextMate to Atom several years ago, I was able to keep almost all of my ornery customizations because Atom allowed me to inspect scopes at the cursor. I define a command in my Atom init-file whose only purpose is to interpret Enter on my num pad and delegate it to one of three other commands based on what scope I'm in. If I migrate to VSCode in the future, it'll be a reluctant migration if this issue is still open.
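The scope-based dispatch described above could look roughly like the sketch below. It assumes a deliberately simplified form of scope matching (plain dot-segment prefixes, with none of the operators a full TextMate scope selector supports), and the command IDs and the names `scopeMatches`, `pickCommand`, and `numpadEnterRules` are all hypothetical.

```typescript
// True when `scope` equals `selector` or extends it with further dot-separated segments.
function scopeMatches(selector: string, scope: string): boolean {
    return scope === selector || scope.indexOf(selector + '.') === 0;
}

interface ScopedCommand {
    selector: string; // generic scope prefix, e.g. 'string'
    command: string;  // command to run when the cursor is inside that scope
}

// Pick the command for the innermost scope that matches any rule.
function pickCommand(scopeStack: string[], rules: ScopedCommand[], fallback: string): string {
    // Scope stacks run from outermost to innermost, so walk them backwards.
    for (let i = scopeStack.length - 1; i >= 0; i--) {
        for (const rule of rules) {
            if (scopeMatches(rule.selector, scopeStack[i])) {
                return rule.command;
            }
        }
    }
    return fallback;
}

// Hypothetical rules for one key that behaves differently per scope.
const numpadEnterRules: ScopedCommand[] = [
    { selector: 'string', command: 'hypothetical.closeStringAndContinue' },
    { selector: 'comment', command: 'hypothetical.continueComment' },
];
```

Because the rules name only generic prefixes such as `string` and `comment`, the same table works for any language whose grammar follows the naming conventions.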

TextMate's scope naming conventions are middleware. VSCode could move to tree-sitter grammars tomorrow without breaking anyone's syntax coloring themes; it'd just need to map tree-sitter token names to existing scope names. If a “get all scopes at position X” API existed in VSCode, and I relied upon it when writing an extension, that extension would keep working in a future version of VSCode that no longer supported TM-style grammars but kept TM's scope naming conventions.

You may think, “Yes, but we don't want to make these naming conventions permanent! That's the whole point!” To which I'd ask: what would you replace them with, and why? Is there something that the existing naming conventions can't do? Is there a compelling reason to invent something new that would justify the amount of community effort it would take to adopt a new scheme? Would it make a migration toward tree-sitter grammars easier or harder if syntax coloring themes had to support two different naming systems at once?

As the comments on this issue illustrate, not having this functionality doesn't remove the need for extensions to know this information; it just means those extensions have to use imperfect workarounds. And it results in tighter coupling to TextMate grammars than would otherwise be necessary, since those workarounds need to know the grammar's implementation details to reproduce the result.

I hope you'll consider this feature request sometime soon; it'd be a huge customizability win.

@tjx666
Contributor

tjx666 commented May 16, 2022

If I understand correctly, extension authors could use this to implement features such as hover tips without AST parsing. AST parsing is sometimes really expensive.

@sandipchitale

I have implemented an extension:

Show scopes at cursor in active editor

showing how to use API exposed by:

HyperScopes

Hope this helps someone.

@m-paternostro

m-paternostro commented Jul 3, 2022

Another informal vote for this feature, if I may...

In our case, we want to correlate the content from the editor to an extremely rich repository of runtime-produced information. Without the minimum understanding of the code (precisely what symbols/scope/tokens would provide), such a correlation is faulty way more often than acceptable.

@Zxynine

Zxynine commented Jul 15, 2022

I would love this feature too. I keep ending up here while trying to find a way to know for sure whether a given position is in a comment; without regexes and reading a language configuration file, there seems to be no clear way to know that. Even if I weren't just looking for a way to know whether something is within a comment, this feature is still something I want to see implemented, and it would make a world of difference to many extensions.

@pelmers
Contributor

pelmers commented Aug 13, 2022

Like @Zxynine, I am also looking for a way to tell whether a selection in the editor is a comment. Apparently we are not the only ones. I found this commit in Better Comments which defines the line comment format for many languages: aaron-bond/better-comments@47717e7

So that's one way to implement this (at least for line comments), though not my favorite.
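For reference, the prefix-table approach can be sketched in a few lines. This is not the Better Comments code; the table below is a small illustrative subset, and `isInLineComment` is a hypothetical helper. It also shows why the approach is fragile: a prefix that appears inside a string literal is misread as a comment.

```typescript
// Small illustrative subset of per-language line comment prefixes.
const lineCommentPrefixes: { [languageId: string]: string } = {
    javascript: '//',
    typescript: '//',
    python: '#',
    lua: '--',
};

// Naive check: everything from the first prefix occurrence onwards counts as a comment.
function isInLineComment(lineText: string, column: number, languageId: string): boolean {
    if (!Object.prototype.hasOwnProperty.call(lineCommentPrefixes, languageId)) {
        return false; // unknown language: no prefix to look for
    }
    const at = lineText.indexOf(lineCommentPrefixes[languageId]);
    return at !== -1 && column >= at;
}
```

For `const u = "http://a";` the function reports that the URL's tail is a comment, which is exactly the kind of mistake that scope information from the tokenizer would avoid.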

I have also tried the API exposed by Hyperscopes (https://github.com/draivin/hscopes), but I experienced multi-second freezes of the extension host even when editing very modestly sized files.

Perhaps tree-sitter would be fast enough to parse files without delay. I see the extensions https://github.com/georgewfraser/vscode-tree-sitter and https://github.com/EvgeniyPeshkov/syntax-highlighter import web-tree-sitter (wasm-compiled tree sitter modules) to provide syntax highlighting.

Of course these are all workarounds, and I think VS Code should provide access to this information. It knows it already, after all!

I agree with @savetheclocktower that the only given justification doesn't seem adequate. Can we quantify the risk? How big of a change has happened to Textmate grammars in the last 7 years?

@lukstafi

lukstafi commented Nov 13, 2022

If vscode.provideDocumentRangeSemanticTokens was outputting all tokens for most languages, it would satisfy many of the needs discussed here. But from my limited experiments, it only outputs "interesting" tokens, it doesn't output tokens for comments, string literals, operators.
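For anyone experimenting with that command, the sketch below decodes the integer array that semantic tokens are delivered in, assuming the standard layout of five integers per token (line delta, start-character delta, length, token type index, modifier bit set). `decodeSemanticTokens` and `DecodedSemanticToken` are hypothetical names, and the legend of token type names is assumed to be supplied by the caller.

```typescript
interface DecodedSemanticToken {
    line: number;
    startCharacter: number;
    length: number;
    tokenType: string;
    modifierBits: number;
}

// Turn the relative, five-integers-per-token encoding into absolute positions.
function decodeSemanticTokens(data: ArrayLike<number>, tokenTypes: string[]): DecodedSemanticToken[] {
    const decoded: DecodedSemanticToken[] = [];
    let line = 0;
    let character = 0;
    for (let i = 0; i + 4 < data.length; i += 5) {
        const deltaLine = data[i];
        const deltaStart = data[i + 1];
        line += deltaLine;
        // The start delta is relative to the previous token only when both share a line.
        character = deltaLine === 0 ? character + deltaStart : deltaStart;
        decoded.push({
            line: line,
            startCharacter: character,
            length: data[i + 2],
            tokenType: tokenTypes[data[i + 3]],
            modifierBits: data[i + 4],
        });
    }
    return decoded;
}
```

Decoding the output this way makes it easy to list which token types a provider actually emits and to confirm whether comments, strings, or operators are missing.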

@Fred-Vatin

This issue has been open for seven years.

SEVEN YEARS !!!

I don’t expect this will be fixed soon. We’ll have to learn to live with it or move to another IDE.

@PoetaKodu

Great. I cannot access information about my documents that is already there, hidden behind the VS Code wall. How is this a thing in 2022? Admins, please do not ignore this.

@zm-cttae-archive

zm-cttae-archive commented Jan 28, 2023

Just to be clear - if you want scope and scopeRange for a single line, there is HyperScopes.
If you need the full document, use this.
The code is significantly more complex because we need browser support, promise caching and cross-env resource hashing for that.

Also, I chose not to use the browser streaming compiler for onig.wasm, I used webpack instead.
It's a very different approach from the monaco-tm repo.

If you use fetch you still need ${vscode.env.appRoot}/blablabla

@zm-cttae-archive

zm-cttae-archive commented Jan 29, 2023

I need to update my code examples to work on web because only fetch works for wasm on web

@iCSawyer

EIGHT years! Eight years after eight years, do you know how I've spent the last eight years? Why is it so difficult to provide them?

@zm-cttae-archive

zm-cttae-archive commented Feb 24, 2023

I have solved this issue for myself and any language extension authors that can pass vscode.ExtensionContext.

EDIT: Some folk at the extension development slack want TypeScript support - I just released it this week.

There is a quite fast full-document tokenization API in vscode-textmate-languageservice - I haven't put JSDoc on it, but it's stable and I'll never want/need to change the API shape.

You need to set up your contributes to wire up a language and its grammar:

Then get our tokens:

import * as vscode from 'vscode';
import TextmateLanguageService from 'vscode-textmate-languageservice';

export async function activate(context: vscode.ExtensionContext) {
    const selector: vscode.DocumentSelector = 'matlab';
    // Create the service for the language ID wired up in contributes.
    const lsp = new TextmateLanguageService('matlab', context);
    const tokenService = await lsp.initTokenService();
    const activeTextDocument = vscode.window.activeTextEditor!.document;
    // Tokenize the whole active document.
    const tokens = tokenService.fetch(activeTextDocument);
}

It works in the browser and can process huge files quite quickly too. File hashing + caching is also built in.

There is a compulsory configuration file which serves to enhance the results and generate folding level data.
If you are lazy, make ./textmate-configuration.json contain just {} and it'll still work.

You can write your own scope and scopeRange functions by using the startIndex, endIndex, and line properties.
The line property is zero-indexed FWIW (the way real API line numbers should be 😉)
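A minimal sketch of such scope and scopeRange helpers is shown below. It assumes each token carries `line`, `startIndex`, `endIndex`, and a `scopes` array ordered from outermost to innermost, and that `endIndex` is exclusive; `FlatToken`, `FlatRange`, and the function names are hypothetical, and the sample scope names are illustrative rather than taken from a specific grammar.

```typescript
interface FlatToken {
    line: number;       // zero-indexed line
    startIndex: number; // inclusive column
    endIndex: number;   // assumed exclusive column
    scopes: string[];   // assumed outermost to innermost
}

interface FlatRange {
    line: number;
    startCharacter: number;
    endCharacter: number;
}

// Find the token that covers the given position, if any.
function tokenAt(tokens: FlatToken[], line: number, character: number): FlatToken | undefined {
    for (const token of tokens) {
        if (token.line === line && character >= token.startIndex && character < token.endIndex) {
            return token;
        }
    }
    return undefined;
}

// Innermost scope name at the position.
function scopeAt(tokens: FlatToken[], line: number, character: number): string | undefined {
    const token = tokenAt(tokens, line, character);
    return token ? token.scopes[token.scopes.length - 1] : undefined;
}

// Range of the token at the position.
function scopeRangeAt(tokens: FlatToken[], line: number, character: number): FlatRange | undefined {
    const token = tokenAt(tokens, line, character);
    return token
        ? { line: token.line, startCharacter: token.startIndex, endCharacter: token.endIndex }
        : undefined;
}

// Hand-written sample for the text: attrib-name="x"
const sampleFlatTokens: FlatToken[] = [
    { line: 0, startIndex: 0, endIndex: 11, scopes: ['text.sample', 'entity.other.attribute-name'] },
    { line: 0, startIndex: 11, endIndex: 12, scopes: ['text.sample', 'punctuation.separator.key-value'] },
    { line: 0, startIndex: 12, endIndex: 15, scopes: ['text.sample', 'string.quoted.double'] },
];
```

A linear scan is enough for a sketch; for large documents, grouping tokens by line first would avoid walking the whole array on every query.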

Enjoy!

@zm-cttae-archive

zm-cttae-archive commented Mar 29, 2023

Tokenization of TypeScript and any other grammar (without having to set up configuration) is now available!

I used the textmate-languageservice-contributes key in place of contributes so that we don't override an existing language contribution.

vsce-toolroom/vscode-textmate-languageservice@v1.2.1/README.md #tokenization

@zm-cttae

zm-cttae commented Sep 8, 2023

https://github.com/vsce-toolroom/vscode-textmate-languageservice/releases/tag/v2.0.0

  • Add getTokenInformationAtPosition method for fast positional token polyfill: vscode.TokenInformation.
  • Add getScopeInformationAtPosition method to get Textmate token data: TextmateToken.
  • Add getScopeRangeAtPosition method to get token range: vscode.Range.
  • Add getLanguageConfiguration method for language configuration: LanguageDefinition.
  • Add getGrammarConfiguration method to get language grammar wiring: GrammarLanguageDefinition.
  • Add getContributorExtension method to get extension source of language ID: vscode.Extension.

Please star the project on GitHub if you think there is further use you could make of it.

@zm-cttae

@alexdima seeing as this has been solved by an external library and the internal proposed API, will this be closed?
