Expose paren Position on DefStmt #504

raphaelvigee · 2023-09-06T10:59:04Z

No description provided.

google-cla · 2023-09-06T10:59:07Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

adonovan · 2023-09-06T13:27:53Z

What's the motivation for this change? The syntax tree doesn't expose the positions of most tokens, as they are redundant given the formatting algorithm. Why are these token positions necessary, but not all the other ones?

raphaelvigee · 2023-09-06T17:22:44Z

The L/R paren are generally exposed (see Dict, List, Tuple)
I'm writing a formatter, and needed access to those position to make certain decisions

adonovan · 2023-09-07T16:30:06Z

The L/R paren are generally exposed (see Dict, List, Tuple)

That's not really an accurate characterization of the current design. The syntax tree records positions only for tokens that are required to accurately compute the span (start and end positions) of the tree, as in your examples of Dict, List, and Tuple. It also records the positions of interior tokens that are semantically important, such as the positions of + (add) and ( (call) operators, which are needed in error messages. But it doesn't record positions of interior tokens that are trivial and can be reconstructed by a canonical formatter, such as commas. The paren of a DefStmt seems to fall into this category, as DefStmt already has two tokens (def, name) that suffice for error messages, and the paren is an interior token that a formatter canonically places immediately after the function name.

I'm writing a formatter, and needed access to those position to make certain decisions

For what decisions does your formatter need this information?

raphaelvigee · 2023-09-07T19:40:35Z

The syntax tree records positions only for tokens that are required to accurately compute the span

I would argue that the position of ) is required to correctly compute the span of the parameters, which is what i m interested in here

the paren is an interior token that a formatter canonically places immediately after the function name.

I agree with you for the (, but the ) can be on the same line as (, or a couple lines down, depending on the paremeters whether they are layed down in one line, or multiple.

For what decisions does your formatter need this information?

We gonna go a bit deep here:

Consider the following pieces of code:

def blah(
   arg1 = []
   # some comment
):
   pass

When parsing that code, the # some comment gets attached as a Before to pass, so i use the position of () and the position of the comment to hoist the comment back into the correct developer-intended place. The logic being that if a comment.Line is > (.Line and < ).Line, then its the last "arg" of the parameters

adonovan · 2023-09-07T20:03:49Z

Thanks for illuminating. I think you are pointing out a general deficiency of the current syntax tree, which is that it doesn't contain sufficient information to preserve the relative order of tokens and comments. (Go's go/ast has a similar problem, notoriously; see golang/go#20744). If we fixed it in this instance, I suspect you would still encounter similar problems elsewhere in the tree.

How many other places would we need to record token positions in order to preserve token/comment order in all "reasonable" cases? (An unreasonable comment would be one before a comma, say.)

raphaelvigee · 2023-09-07T21:22:32Z

Understood.
I wrote a formatter that seems to work for all the cases of comments i can imagine (although i probably missed a bunch), the issue highlighted above happens for all L/R list-ish nodes (Dict, List, Tuple, Call), they all expose the paren, so i could implement the algorithm described above, only DefStmt didnt expose it.
If you are curious to see that contraption

raphaelvigee · 2023-09-07T21:52:23Z

But I would say that fixing those deficiencies would be pretty straightforward: add to syntax.Comment another field: Last // used in Dict, List, Tuple, Call, stores arguments before the closing paren

adonovan · 2023-09-11T22:26:53Z

Alright, let's add these two tokens. I noticed that the most popular Python formatters appear to preserve the position of comments after the last parameter.

laurentlb · 2023-09-25T14:04:32Z

I'm writing a formatter, and needed access to those position to make certain decisions

Is there any reason for not using Buildifier?
https://github.com/bazelbuild/buildtools/tree/master/buildifier

raphaelvigee · 2023-09-25T19:39:40Z

Mostly wanted single tool, also it was pretty fun to write (except the part where the comments end up in random places)

Expose paren Position on DefStmt

142a13b

raphaelvigee force-pushed the syntax-expose-token branch from b71494f to 142a13b Compare September 6, 2023 11:02

adonovan approved these changes Sep 11, 2023

View reviewed changes

adonovan merged commit 95963e0 into google:master Sep 11, 2023
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose paren Position on DefStmt #504

Expose paren Position on DefStmt #504

raphaelvigee commented Sep 6, 2023

google-cla bot commented Sep 6, 2023

adonovan commented Sep 6, 2023

raphaelvigee commented Sep 6, 2023

adonovan commented Sep 7, 2023

raphaelvigee commented Sep 7, 2023 •

edited

Loading

adonovan commented Sep 7, 2023 •

edited

Loading

raphaelvigee commented Sep 7, 2023

raphaelvigee commented Sep 7, 2023 •

edited

Loading

adonovan commented Sep 11, 2023

laurentlb commented Sep 25, 2023

raphaelvigee commented Sep 25, 2023 •

edited

Loading

Expose paren Position on DefStmt #504

Expose paren Position on DefStmt #504

Conversation

raphaelvigee commented Sep 6, 2023

google-cla bot commented Sep 6, 2023

adonovan commented Sep 6, 2023

raphaelvigee commented Sep 6, 2023

adonovan commented Sep 7, 2023

raphaelvigee commented Sep 7, 2023 • edited Loading

adonovan commented Sep 7, 2023 • edited Loading

raphaelvigee commented Sep 7, 2023

raphaelvigee commented Sep 7, 2023 • edited Loading

adonovan commented Sep 11, 2023

laurentlb commented Sep 25, 2023

raphaelvigee commented Sep 25, 2023 • edited Loading

raphaelvigee commented Sep 7, 2023 •

edited

Loading

adonovan commented Sep 7, 2023 •

edited

Loading

raphaelvigee commented Sep 7, 2023 •

edited

Loading

raphaelvigee commented Sep 25, 2023 •

edited

Loading