Give access to the AST and custom TS queries for other extension developers #15

Open
w-cantin opened this issue Nov 23, 2021 · 9 comments
Labels
feature-request Request for new features or functionality

Comments

@w-cantin

Hello,

This is a feature request that could make other extension developers quite happy and solve quite a few issues related to interacting with source code. It would be great if there were API endpoints allowing other developers to make custom queries on the parsed Treesitter tree and thus allow them to navigate the AST more freely.

At the moment if someone decided to do relatively simple things like:

all require extension developers to write their own custom parsers, only to reimplement existing VS Code functionality. This is probably unnecessary duplication, and the end result is often worse than VS Code's own implementation. Unfortunately, the VS Code API does not currently give proper access to the AST and tokens produced during parsing (this GitHub issue elaborates on the subject). There is a useful command called vscode.executeDocumentSymbolProvider, but it is not enough and does not always categorize symbols properly: see this issue

Treesitter and Anycode could probably solve this problem by parsing the tree once on document changes and then giving it access to other extensions to query on their own. Would this be possible?

Thank you

@jrieken added the feature-request (Request for new features or functionality) label Nov 23, 2021
@jrieken
Member

jrieken commented Nov 23, 2021

Treesitter and Anycode could probably solve this problem by parsing the tree once on document changes and then giving it access to other extensions to query on their own. Would this be possible?

This idea is floating around and is something to be considered. Tho, it isn't top priority atm, and it's also noteworthy that anycode uses tree-sitter in a dedicated worker context (LSP server). This makes an API hard because there is no shared memory space between that context and the place in which extensions run. Anyways, we do see anycode as a first step into a potentially larger involvement with tree-sitter.

@w-cantin
Author

If an elegant solution were found and sent in a PR, would it be accepted? If so, I might start looking into this. If not, then I guess an alternative would be to create a new extension to do similar things. The biggest issues with that alternative would be the code duplication and the extra resources the client would spend parsing the same documents multiple times.

@jrieken
Member

jrieken commented Nov 23, 2021

Sure, we are always keen on elegant or pragmatic solutions. We can also use this issue to discuss and brainstorm. In a way, it sounds very compelling to share TS syntax trees, but it's also a non-trivial undertaking. There is the different-context challenge that happens with the LSP server but also with multiple extension hosts. Also, TS syntax trees are "wasm objects" which need to be freed manually. It would need a good, easy-to-handle wrapper around those objects. The alternative would be some higher-level API, like only allowing TS queries to be executed, or having API for specific features, like "is this position inside a string" etc... Maybe just allowing TS queries to be executed is a good middle ground.
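
The "wrapper around wasm objects" idea could look roughly like the sketch below. It is only an illustration, not anycode's code: the helper name `withFreed` and the `FakeTree` stand-in are hypothetical, and the only assumption about tree-sitter is that a tree is released by calling its `delete()` method, as web-tree-sitter trees are.

```typescript
// Anything that has to be released by hand. The assumption here is that
// the object exposes delete(), as web-tree-sitter trees do.
interface ManuallyFreed {
  delete(): void;
}

// Hypothetical helper: runs `use` with the resource and always frees it
// afterwards, even when `use` throws. Only synchronous callbacks are covered.
function withFreed<T extends ManuallyFreed, R>(resource: T, use: (r: T) => R): R {
  try {
    return use(resource);
  } finally {
    resource.delete();
  }
}

// Stand-in for a parsed tree so the sketch runs without tree-sitter.
class FakeTree implements ManuallyFreed {
  freed = false;
  rootType: string;

  constructor(rootType: string) {
    this.rootType = rootType;
  }

  delete(): void {
    this.freed = true;
  }
}

const tree = new FakeTree("program");
const rootType = withFreed(tree, (t) => t.rootType);
console.log(rootType, tree.freed);
```

The callback only ever hands back plain values, so nothing that points into wasm memory escapes the scope in which the tree is still alive.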

@w-cantin
Author

That is great to hear! I can't promise anything yet, but I'll start looking into it more closely and see what comes out. The LSP server is definitely gonna be a challenge, and having some sort of wrapper around queries seems to be the most promising option.

Having some higher-level API functions for specific features like "is this position inside a string" would probably not be versatile enough to handle every programming language's syntax and special quirks. For example, in JavaScript alone it is possible to create strings 3 ways: with double quotes ("), with single quotes (') and with template strings (`). If an extension wanted to treat those 3 cases separately but Anycode's API treated them all as the same thing, then the extension developer would be out of luck. On the other hand, if Anycode decided to handle each use case separately, then trying to cover every language's way of doing things could quickly get out of hand.

Allowing custom queries to be made then seems like a better alternative. At the bare minimum the public API would probably have those functions:

  • a function that lists all the supported programming languages

  • a way for an extension to register its own queries for the languages it wants to support. Those queries could probably contain TS capture names using the @ syntax so that they can be used later.

  • once the queries are created, a way to interact with the parsed TS tree of a specific document. It could be as simple as making a request for a specific document URI and a list of capture names. The response would then contain all the nodes in the document that match the capture names. The returned nodes do not necessarily have to be Parser.Tree instances if those become too annoying to deal with; something similar to a list of Document Symbols might well be sufficient.

The document parsing, object creations and destructions could be handled on the server side. That would give a lot of freedom and versatility for the extension developer with only a few available function calls.
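
The three functions listed above could be sketched as follows. Every name here (`Capture`, `QueryRunner`, `QueryRegistry`) is hypothetical and nothing below is an existing anycode API. The `QueryRunner` stands for the server-side part that would own the parser and the trees, and it is faked in the demo so the sketch runs on its own. The query text uses node names from the tree-sitter JavaScript grammar purely as an illustration.

```typescript
// JSON-friendly capture: plain numbers and strings only, no tree-sitter types.
interface Capture {
  name: string; // capture name from the query, without the leading "@"
  startLine: number;
  startColumn: number;
  endLine: number;
  endColumn: number;
}

// Hypothetical: the part living next to the parser that actually runs a query.
type QueryRunner = (languageId: string, query: string, uri: string) => Capture[];

class QueryRegistry {
  private readonly languages: string[];
  private readonly run: QueryRunner;
  private readonly queries = new Map<string, string[]>();

  constructor(languages: string[], run: QueryRunner) {
    this.languages = languages;
    this.run = run;
  }

  // 1. List all supported languages.
  supportedLanguages(): string[] {
    return this.languages.slice();
  }

  // 2. Register a query (with @capture names) for one language.
  registerQuery(languageId: string, query: string): void {
    if (this.languages.indexOf(languageId) === -1) {
      throw new Error("unsupported language: " + languageId);
    }
    const list = this.queries.get(languageId) ?? [];
    list.push(query);
    this.queries.set(languageId, list);
  }

  // 3. Ask for all captures with the given names in one document.
  captures(uri: string, languageId: string, captureNames: string[]): Capture[] {
    const wanted = new Set(captureNames);
    const result: Capture[] = [];
    for (const query of this.queries.get(languageId) ?? []) {
      for (const capture of this.run(languageId, query, uri)) {
        if (wanted.has(capture.name)) {
          result.push(capture);
        }
      }
    }
    return result;
  }
}

// Fake runner with canned results, standing in for the real server side.
const fakeRunner: QueryRunner = () => [
  { name: "string", startLine: 0, startColumn: 8, endLine: 0, endColumn: 13 },
  { name: "function", startLine: 2, startColumn: 0, endLine: 4, endColumn: 1 },
];

const registry = new QueryRegistry(["javascript"], fakeRunner);
registry.registerQuery("javascript", "(string) @string (function_declaration) @function");
const hits = registry.captures("file:///demo.js", "javascript", ["string"]);
console.log(JSON.stringify(hits));
```

Since the result is made of plain objects, it survives a round trip through JSON, which is what crossing the worker boundary would require.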

@jrieken
Member

jrieken commented Nov 25, 2021

Yeah, that all sounds reasonable, and you are right that query results must not be TS-specific types, as they don't serialize well. Something JSON'able should be used instead.

@CombeeMike

Are there any updates on this?

I'm currently writing a plugin which specifically needs the 3 things mentioned in the initial post:

  • Is pos inside a string, function or surrounded by (any kind of) brackets (a "code block" to be more precise, e.g. brackets within a string literal should not match etc.)
  • If so, get start & end pos of that string, function or surrounding brackets

For functions, I'm already using the mentioned vscode.executeDocumentSymbolProvider which unfortunately does not recognize arrow functions as such (by design).
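
For the function case, a lookup over the result of vscode.executeDocumentSymbolProvider might look like the sketch below. The small interfaces are local copies of the relevant fields of the VS Code `Position`, `Range` and `DocumentSymbol` shapes so that the code runs standalone, and `innermostSymbolAt` is a hypothetical helper name. In an extension the input would come from `vscode.commands.executeCommand('vscode.executeDocumentSymbolProvider', document.uri)`, assuming the provider returns hierarchical `DocumentSymbol` objects rather than flat `SymbolInformation` ones.

```typescript
// Minimal structural copies of the VS Code shapes used here.
interface Pos {
  line: number;
  character: number;
}

interface Rng {
  start: Pos;
  end: Pos;
}

interface SymbolNode {
  name: string;
  range: Rng;
  children: SymbolNode[];
}

// True when `pos` lies within `range`, both ends included.
function contains(range: Rng, pos: Pos): boolean {
  const afterStart =
    pos.line > range.start.line ||
    (pos.line === range.start.line && pos.character >= range.start.character);
  const beforeEnd =
    pos.line < range.end.line ||
    (pos.line === range.end.line && pos.character <= range.end.character);
  return afterStart && beforeEnd;
}

// Hypothetical helper: returns the deepest symbol whose range contains `pos`,
// or undefined when no symbol does. Its `range` gives the start and end.
function innermostSymbolAt(symbols: SymbolNode[], pos: Pos): SymbolNode | undefined {
  for (const symbol of symbols) {
    if (contains(symbol.range, pos)) {
      return innermostSymbolAt(symbol.children, pos) ?? symbol;
    }
  }
  return undefined;
}

// Hand-written sample data: a class with one method.
const symbols: SymbolNode[] = [
  {
    name: "Greeter",
    range: { start: { line: 0, character: 0 }, end: { line: 6, character: 1 } },
    children: [
      {
        name: "greet",
        range: { start: { line: 1, character: 2 }, end: { line: 3, character: 3 } },
        children: [],
      },
    ],
  },
];

const found = innermostSymbolAt(symbols, { line: 2, character: 4 });
console.log(found?.name, found?.range);
```

This only helps with what the symbol provider reports, so arrow functions, strings and bracket pairs remain out of reach with this approach.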

@w-cantin
Author

Hello,

Unfortunately on my part I was not very successful in creating a working prototype. I have not looked into this problem for a while though so maybe someone else came up with a good solution.

I should probably warn you about the DocumentSymbols API. Back when I was toying with DocumentSymbolProvider it seemed fine at first, but unfortunately in practice it is not very coherent across languages and caused more trouble than anything. Even in well-supported languages like JS or Python, the AST created by the Language Server Extensions would not always be accurate or be parsed the same way, resulting in different behaviors. For example, methods inside classes would sometimes be recognized as functions and sometimes be considered a Property instead, depending on which programming language was used. Unfortunately this part of the API is not very well standardized and seems a little bit like an afterthought for Language Server developers.

Good luck to you

@CombeeMike

@w-cantin Thanks for the info!

This plugin is mainly intended for personal use (at least for now), so I can live with the shortcomings of executeDocumentSymbolProvider. Still better than not having the functionality at all or having to implement the "function parsing" myself 😉

However, it is good to know that this is not very reliable. If I ever publish this publicly, I'll probably add an appropriate disclaimer warning about this 🤷‍♂️

@zm-cttae-archive

zm-cttae-archive commented Feb 25, 2023

Related - microsoft/vscode#580 (comment) (Textmate tokens):

  • Works in the browser
  • Gets you every document token quite quickly (well-cached, fast algos & okay dependency loading)
  • Can produce symbols and folds etc if you so need
  • Is configurable
  • Will support custom textmate-languageservice-contributes wiring for a language next week
