Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lexer rule to handle operator of format like operator + comment #628

Open
wants to merge 2 commits into
base: sail2
Choose a base branch
from

Conversation

trdthg
Copy link
Contributor

@trdthg trdthg commented Jul 10, 2024

try fix #627

Maybe fixed

the basic rule here is

 A (B A)* (for  odd-number chars)
(B A)*    (for even-number chars)

then use it on both maybe_slash_start operator and no_slash_start operator trying to cover all

but I don't have a good way of checking the rules

some format output like if eq2_comment == /**/ /**/ 1 then { is wrong, will be fixed in my next/other PR


Update

//////              (should be line_comment)

let a = 1 ///       (rust-fmt treat this as error)
	1;

let a = 1 / //      (!current)(slash and comment)
	1;

can't be handled correctly for now


Update

  • fixed with the following rules
     | ((oper_char* oper_char_no_slash) as op) "/*" {
     | ((oper_char* oper_char_no_slash) as op) "//" {
    
  • Add more very very tricky test cases :D

you can push to my branch directly if needed

Copy link

github-actions bot commented Jul 10, 2024

Test Results

    9 files  ± 0     20 suites  ±0   0s ⏱️ ±0s
  663 tests + 6    663 ✅ + 6  0 💤 ±0  0 ❌ ±0 
2 112 runs  +16  2 111 ✅ +16  1 💤 ±0  0 ❌ ±0 

Results for commit be519cf. ± Comparison against base commit 13d9458.

♻️ This comment has been updated with latest results.

@trdthg trdthg marked this pull request as draft July 10, 2024 08:07
@trdthg trdthg marked this pull request as ready for review July 10, 2024 09:07
@Alasdair
Copy link
Collaborator

Test case looks good, I'll take a closer look at this next week and hopefully get it merged.

@trdthg
Copy link
Contributor Author

trdthg commented Jul 22, 2024

  • Add more similar rule for infix, infixl, infixr
  • also add more test case

but they looks very repetitive and boring

and even operator + /*! is not added yet

@Alasdair
Copy link
Collaborator

Alasdair commented Jul 22, 2024

There should be a way to do this with just a slightly more sophisticated regex, and without adding extra lexing rules I think. Using s to represent a safe operator character it would be something like:

([s]*(([/][s])?|([*]+[s])?))*([s]*([/]|[*]+))|[s]

i.e. the logic is something like:

  • Any number of 'safe' characters
  • Until a comment starting character or ending character
  • Which has to then be followed by character that doesn't finish the comment sequence, or the operator ends.

@Alasdair
Copy link
Collaborator

Probably the only way to gain confidence in such a regex would be to test it on all possible operator character sequences up to some bound.

@trdthg
Copy link
Contributor Author

trdthg commented Jul 23, 2024

I've tested your regex with

let s = oper_char_no_slash_star
let non_comment_start = ['/'] oper_char_no_slash_star
let non_block_comment_end = ['*']+ oper_char_no_slash
let safe_start = non_comment_start | non_block_comment_end

let non_comment_end = (s* (['/'] | ['*']+)) | s

let operator = (s* safe_start?)* non_comment_end ('_' ident)?

should be same, the names are only for my to understand

also test on this, with no change:

let s = oper_char_no_slash_star
let operator = 
  (s* 
    (
      (['/'] s)?
      |
      (['*']+ s)?
    )
  )* 
  (s*
    (
      ['/']
      |
      ['*']+
    )
  )
  |
  s

but some test case failed:

  • /=/=/=: can't match non_comment_end part, this should be an lack
  • /=: same as above
  • =//

The rules op "//" and op "/*" I added is only used for interpreting =// as an op = and a line_comment, not an error.

So if you prefer keep it as an error (rust do so), I can just delete the complex rules and only keep the brife let operator = xxx part

and delete some test case

@Alasdair
Copy link
Collaborator

Maybe we can just simplify the rules for operators slightly. We could:

  1. Just say that '/' is forbidden in multi-character operator sequences entirely (as '/' by itself is fine for division)
  2. Only allow a single '/' in operators that do not contain '*' (would permit some operators like </>, which I've seen used in Haskell code)
  3. Forbid operators of length > 3, as anything longer just seems like poor style. Then the space of 'bad' operators involving comment sequences is much easier to define.

@Alasdair
Copy link
Collaborator

I'm leaning towards option 2, as it's the most conservative and is still fairly easy to explain in the manual.

@trdthg
Copy link
Contributor Author

trdthg commented Jul 24, 2024

I can think of the following cases right now:

So I think option 2 is ok, I guess it can cover (maybe) 99% cases

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Operator's rule makes its parsing disjointed with comment
2 participants