Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capture values from enum tokens #227

Closed
optozorax opened this issue May 11, 2020 · 6 comments · Fixed by #245
Closed

Capture values from enum tokens #227

optozorax opened this issue May 11, 2020 · 6 comments · Fixed by #245
Labels
feature Something not supported that perhaps should be

Comments

@optozorax
Copy link

optozorax commented May 11, 2020

#[derive(Clone)]
enum MyToken {
    OpenBracket,
    CloseBracket,
    Int(i64),
    Ident(String),
}

peg::parser!{ grammar list_parser() for [MyToken] {
    use MyToken::*;

    rule list() -> Vec<i64>
      = [OpenBracket] [Int(inner)] [CloseBracket] { vec![/* inner */] }
}}

How can I get inner?

@optozorax
Copy link
Author

Ugly hack:

peg::parser!{ grammar list_parser() for [MyToken] {
    use MyToken::*;

    rule main() -> Vec<i64>
        = [OpenBracket] a:int() [Comma] b:int() [CloseBracket] { vec![a, b] }

    rule int() -> i64
        = n:$[Int(_)] { 
            match n[0] {
                Int(inner) => inner,
                _ => unreachable!(),
            }
        }
} }

@kevinmehall
Copy link
Owner

Thanks, I'm surprised there wasn't already an issue open for this. The syntax in your first comment is what I'd like too but isn't supported yet.

Currently a PEG expression expands to a Rust expression, which means any variables captured are not in scope at the position of the action block. That is, [Int(n)] expands to:

match input.parse_elem(pos) {
  RuleResult::Matched(npos, Int(n)) => RuleResult::Matched(npos, ())
  _ => RuleResult::Failed
}

To keep the variable in scope, instead of returning RuleResult::Matched, it would have to nest the remaining part of the sequence and the action code block within that match arm.

An alternative syntax I've considered that would be easier to support but not as nice would be n:[Int(n) => n] which would simply replace the () for the return value in the existing expansion. That does preserve the fact that n: is the only way to define a variable, but I'm not sure that's an important property.

@kevinmehall kevinmehall added the feature Something not supported that perhaps should be label May 13, 2020
@kevinmehall kevinmehall changed the title How to deal with enum tokens? Capture values from enum tokens May 13, 2020
@BartMassey
Copy link

The alternative syntax seems great. Easier to support, clearer, more flexible. For example, looks like it would support

_:[Pair(m, n) => (m, n)]

Could require _ as the name when using => to avoid some confusion, with only a little extra work.

A third alternative would be to use an @ binding for everything, which would get this case and others just fine, I think: expand n:[pat] to

match input.parse_elem(pos) {
  RuleResult::Matched(npos, n@pat) => RuleResult::Matched(npos, n)
  _ => RuleResult::Failed
}

Would you take a PR for this?

In any case, it would be great to add a note to the rustdoc until this is fixed. Wasted a couple of hours tonight before I found this issue.

Thanks much for an interesting crate!

@kevinmehall
Copy link
Owner

You'd have to do x:[Pair(m, n) => (m, n)], and then access x.1 and x.2 with my second option -- the names would still only be bound within the match that the [pat] expands to, and the expression after => would be returned and bound to x for use in the subsequent action block. That additional verbosity when dealing with multiple variables is pretty significant drawback of that option.

For the third option, I think you're basically suggesting making [pat] return the element of the input that was matched by the pattern, which could be captured by the n: part. We could do that. I think the reason that [] was made to not return anything originally was to avoid allocating a Vec<char> with very common patterns like ['a'..='z']*. I've since added code to avoid building even the Vec<()> when the result is never used. If it was implemented with @, it would preclude also implementing option 1 until Rust stabilizes bindings_after_at but I think it could be done without using that. One question would be whether it should copy the element (which would make sense for most simple Copy element types) or return a reference (since we don't currently require that ParseElem::Element is Copy).

@BartMassey
Copy link

Been thinking about this for a while. I think Option 0 (make the match available to the action) is the only way that can make sense. One could also qualify the match in the description with variables it must match, and check for them, if desired.

I looked a providing a PR and decided the difficulty was above what I could handle. Thanks for looking at this!

kevinmehall added a commit that referenced this issue Jan 10, 2021
This changes the expansion of pattern expressions when used in a labeled
sequence such that the remainder of the sequence is lexically within the
match arm so that the variable is in scope.

Literal expressions are also included for the sake of generating nicer
code by not generating RuleResult in a `match` that is itself
immediately matched upon.

Fixes #227
kevinmehall added a commit that referenced this issue Feb 15, 2021
This changes the expansion of pattern expressions when used in a labeled
sequence such that the remainder of the sequence is lexically within the
match arm so that the variable is in scope.

Literal expressions are also included for the sake of generating nicer
code by not generating RuleResult in a `match` that is itself
immediately matched upon.

Fixes #227
@BartMassey
Copy link

Missed this earlier — apologies for not noticing. Thanks much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Something not supported that perhaps should be
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants