Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Named macro capture groups #3649

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

tgross35
Copy link
Contributor

@tgross35 tgross35 commented May 29, 2024

This RFC proposes optional names for repetition groups in macros:

macro_rules! foo {
    ( $group1( $a:ident ),+ ) => {
        $group1( println!("{}", $a); )+
    }
}

Rendered

Small Pre-RFC: https://internals.rust-lang.org/t/pre-rfc-named-capture-groups-for-macros/20883

@tgross35 tgross35 changed the title Add an initial RFC for named macro capture groups [RFC] Named macro capture groups May 29, 2024
@tgross35

This comment was marked as resolved.

@rustbot rustbot added A-macros Macro related proposals and issues T-lang Relevant to the language team, which will review and decide on the RFC. labels May 29, 2024
@RalfJung
Copy link
Member

RalfJung commented May 29, 2024

Aren't there still some subtle questions around nesting one has to worry about? Specifically, the RFC doesn't seem to say anything about nested cases like ( $group1:( $a:ident [ $group2:( $b:ident ),+ ] ),+ ). Presumably I will only be allowed to use $group2 inside a $group1(...) repetition? But there can be arbitrarily many other repetitions "in between" -- group2 will always refer to the repetition instance in the "current" group1?

@davidhewitt
Copy link
Contributor

Although I cannot remember the exact use-case, I remember wanting something like this to name an optional trailing comma so that I could forward it on to another macro as part of the expansion. 👍

@tgross35
Copy link
Contributor Author

Aren't there still some subtle questions around nesting one has to worry about? Specifically, the RFC doesn't seem to say anything about nested cases like ( $group1:( $a:ident [ $group2:( $b:ident ),+ ] ),+ ). Presumably I will only be allowed to use $group2 inside a $group1(...) repetition? But there can be arbitrarily many other repetitions "in between" -- group2 will always refer to the repetition instance in the "current" group1?

Great point, the scope part is tricky - I think what you said is correct. It is probably best if I just add an appendix of examples that do and don't work with reasoning, especially given your comments on Zulip about the existing macro mental model not being very clear (I completely agree).

@RalfJung
Copy link
Member

RalfJung commented May 29, 2024 via email

Comment on lines 51 to 53
( $group1:( $a:ident ),+ ) => {
$group1( println!("{}", $a); )+
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you write this I thought you are defining a metavariable $group1 which matches a comma-separated list of $a:ident, and using $group1 will expand to foo,bar,baz,etc.

but no the $group1 is just a label 🤔

the rationale mentioned using : is to be "to be more consistent with existing fragment specifiers" but IMO similarity with fragment specifiers is exactly why this RFC should not use the current syntax.

i prefer getting rid of : (my main source of confusion) and use a label syntax like

macro_rules! make_functions {
    ( 
        names: [ $'names $($name:ident),+ ],
//               ^~~~~~~~~
        $'greetings $( greeting: $greeting:literal, )?
    ) => {
        $'names $(
            fn $name() {
                println!("function {} called", stringify!($name));
                $'greetings $(println!("{}", $greeting) )?
            }
        )+
    }
}


macro_rules! innermost1 {
    ( $'outer_rep $($a:ident: $'inner_rep $($b:literal),* );+ ) => {
        [$'outer_rep $( $'inner_rep $( ${index($'outer_rep)}, )* )+]
    };
}

Copy link
Contributor

@petrochenkov petrochenkov May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metavariables matching some parts of the input are an interesting idea in general.
If we also introduce "match exactly once repetitions", then we'll also be able to capture just arbitrary token streams.
It would be nice to also be able to bind a name to "repeated exactly one" groups aka just artibtrary substreams in the input.

// macro LHS, capture "exactly once" repetition
$my_tokens:(a b $var c d)// macro RHS, reemit `a b value_for_var c d`
$my_tokens

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ideas - I updated the default syntax to be $group1(...) (no colon) and added the label syntax as a possibility.

It would be nice to also be able to bind a name to "repeated exactly one" groups aka just artibtrary substreams in the input.

Is there a standard kleene for a single capture, or could we get by with no kleene at all? I could probably include a proposal for single captures in this RFC, to be accepted as part of it or rejected separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In regexes, a "capture exactly once" group is just (...).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After writing some more examples, it seems like "capture once" and "emit the entire capture" would be pretty useful, so I just included them as part of the proposal. Especially for code that just needs to be validated and passed elsewhere, or matching and reemitting optional keywords/tts like mut, & or pub(crate) (just allowing capture groups that don't contain metavars is enough for that, but $pub_crate in the expansion is nicer to read than $pub_crate(pub(crate))).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One advantage of the reemitting is that it will use spans from the input, instead of spans from the macro body.

One example I recently observed in rust-lang/rust#119412 is the mir! macro in standard library.
It will match on something like let $expr = $expr; and then reemit $expr = $expr; as an assignment expression.
If = is emitted manually from the macro it will get the macro span and may mess up the combined span of the assignment expression.
If = is taken from the input, then the expression will get a precise span from the input, which is good for any kind of DSLs.

@c410-f3r
Copy link

cc @markbt

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skimmed this: I feel positive on this direction.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jul 27, 2024
…r-e2024, r=petrochenkov

Make `missing_fragment_specifier` an error in edition 2024

`missing_fragment_specifier` has been a future compatibility warning since 2017. Uplifting it to an unconditional hard error was attempted in 2020, but eventually reverted due to fallout.

Make it an error only in edition >= 2024, leaving the lint for older editions. This change will make it easier to support more macro syntax that relies on usage of `$`.

Fixes <rust-lang#40107>

---

It is rather late for the edition but since this change is relatively small, it seems worth at least bringing up. This follows a brief [Zulip discussion](https://rust-lang.zulipchat.com/#narrow/stream/268952-edition/topic/.60.20DBD.20-.3E.20hard.20error) (cc `@tmandry).`

Making this an edition-dependent lint has come up before but there was not a strong motivation. I am proposing it at this time because this would simplify the [named macro capture groups](rust-lang/rfcs#3649) RFC, which has had mildly positive response, and makes use of new `$` syntax in the matcher. The proposed syntax currently parses as metavariables without a fragment specifier; this warning is raised, but there are no errors.

It is obviously not known that this specific RFC will eventually be accepted, but forbidding `missing_fragment_specifier` should make it easier to support any new syntax in the future that makes use of `$` in different ways. The syntax conflict is also not impossible to overcome, but making it clear that unnamed metavariables are rejected makes things more straightforward and should allow for better diagnostics.

`@Mark-Simulacrum` suggested making this forbid-by-default instead of an error at rust-lang#40107 (comment), but I don't think this would allow the same level of syntax flexibility.

It is also possible to reconsider making this an unconditional error since four years have elapsed since the previous attempt, but this seems likely to hit the same pitfalls. (Possibly worth a crater run?)

Tracking:

- rust-lang#128143
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 27, 2024
…e2024, r=petrochenkov

Make `missing_fragment_specifier` an error in edition 2024

`missing_fragment_specifier` has been a future compatibility warning since 2017. Uplifting it to an unconditional hard error was attempted in 2020, but eventually reverted due to fallout.

Make it an error only in edition >= 2024, leaving the lint for older editions. This change will make it easier to support more macro syntax that relies on usage of `$`.

Fixes <rust-lang#40107>

---

It is rather late for the edition but since this change is relatively small, it seems worth at least bringing up. This follows a brief [Zulip discussion](https://rust-lang.zulipchat.com/#narrow/stream/268952-edition/topic/.60.20DBD.20-.3E.20hard.20error) (cc `@tmandry).`

Making this an edition-dependent lint has come up before but there was not a strong motivation. I am proposing it at this time because this would simplify the [named macro capture groups](rust-lang/rfcs#3649) RFC, which has had mildly positive response, and makes use of new `$` syntax in the matcher. The proposed syntax currently parses as metavariables without a fragment specifier; this warning is raised, but there are no errors.

It is obviously not known that this specific RFC will eventually be accepted, but forbidding `missing_fragment_specifier` should make it easier to support any new syntax in the future that makes use of `$` in different ways. The syntax conflict is also not impossible to overcome, but making it clear that unnamed metavariables are rejected makes things more straightforward and should allow for better diagnostics.

`@Mark-Simulacrum` suggested making this forbid-by-default instead of an error at rust-lang#40107 (comment), but I don't think this would allow the same level of syntax flexibility.

It is also possible to reconsider making this an unconditional error since four years have elapsed since the previous attempt, but this seems likely to hit the same pitfalls. (Possibly worth a crater run?)

Tracking:

- rust-lang#128143
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros Macro related proposals and issues T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants