Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose paren Position on DefStmt #504

Merged
merged 1 commit into from
Sep 11, 2023

Conversation

raphaelvigee
Copy link
Contributor

No description provided.

@google-cla
Copy link

google-cla bot commented Sep 6, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@adonovan
Copy link
Collaborator

adonovan commented Sep 6, 2023

What's the motivation for this change? The syntax tree doesn't expose the positions of most tokens, as they are redundant given the formatting algorithm. Why are these token positions necessary, but not all the other ones?

@raphaelvigee
Copy link
Contributor Author

The L/R paren are generally exposed (see Dict, List, Tuple)
I'm writing a formatter, and needed access to those position to make certain decisions

@adonovan
Copy link
Collaborator

adonovan commented Sep 7, 2023

The L/R paren are generally exposed (see Dict, List, Tuple)

That's not really an accurate characterization of the current design. The syntax tree records positions only for tokens that are required to accurately compute the span (start and end positions) of the tree, as in your examples of Dict, List, and Tuple. It also records the positions of interior tokens that are semantically important, such as the positions of + (add) and ( (call) operators, which are needed in error messages. But it doesn't record positions of interior tokens that are trivial and can be reconstructed by a canonical formatter, such as commas. The paren of a DefStmt seems to fall into this category, as DefStmt already has two tokens (def, name) that suffice for error messages, and the paren is an interior token that a formatter canonically places immediately after the function name.

I'm writing a formatter, and needed access to those position to make certain decisions

For what decisions does your formatter need this information?

@raphaelvigee
Copy link
Contributor Author

raphaelvigee commented Sep 7, 2023

The syntax tree records positions only for tokens that are required to accurately compute the span

I would argue that the position of ) is required to correctly compute the span of the parameters, which is what i m interested in here

the paren is an interior token that a formatter canonically places immediately after the function name.

I agree with you for the (, but the ) can be on the same line as (, or a couple lines down, depending on the paremeters whether they are layed down in one line, or multiple.

For what decisions does your formatter need this information?

We gonna go a bit deep here:

Consider the following pieces of code:

def blah(
   arg1 = []
   # some comment
):
   pass

When parsing that code, the # some comment gets attached as a Before to pass, so i use the position of () and the position of the comment to hoist the comment back into the correct developer-intended place. The logic being that if a comment.Line is > (.Line and < ).Line, then its the last "arg" of the parameters

@adonovan
Copy link
Collaborator

adonovan commented Sep 7, 2023

Thanks for illuminating. I think you are pointing out a general deficiency of the current syntax tree, which is that it doesn't contain sufficient information to preserve the relative order of tokens and comments. (Go's go/ast has a similar problem, notoriously; see golang/go#20744). If we fixed it in this instance, I suspect you would still encounter similar problems elsewhere in the tree.

How many other places would we need to record token positions in order to preserve token/comment order in all "reasonable" cases? (An unreasonable comment would be one before a comma, say.)

@raphaelvigee
Copy link
Contributor Author

Understood.
I wrote a formatter that seems to work for all the cases of comments i can imagine (although i probably missed a bunch), the issue highlighted above happens for all L/R list-ish nodes (Dict, List, Tuple, Call), they all expose the paren, so i could implement the algorithm described above, only DefStmt didnt expose it.
If you are curious to see that contraption

@raphaelvigee
Copy link
Contributor Author

raphaelvigee commented Sep 7, 2023

But I would say that fixing those deficiencies would be pretty straightforward: add to syntax.Comment another field: Last // used in Dict, List, Tuple, Call, stores arguments before the closing paren

@adonovan
Copy link
Collaborator

Alright, let's add these two tokens. I noticed that the most popular Python formatters appear to preserve the position of comments after the last parameter.

@adonovan adonovan merged commit 95963e0 into google:master Sep 11, 2023
7 checks passed
@laurentlb
Copy link
Contributor

I'm writing a formatter, and needed access to those position to make certain decisions

Is there any reason for not using Buildifier?
https://github.com/bazelbuild/buildtools/tree/master/buildifier

@raphaelvigee
Copy link
Contributor Author

raphaelvigee commented Sep 25, 2023

Mostly wanted single tool, also it was pretty fun to write (except the part where the comments end up in random places)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants