Simpler OCaml language grammar #92

Merged: 3 commits, Apr 2, 2020

mnxn (Collaborator) commented on Apr 2, 2020

As mentioned in #91, I have been working on a simplified version of the OCaml language grammar used for syntax highlighting.

As of now, the existing language grammar has various inconsistencies and issues that make it hard to work with. My implementation relies less on context-sensitive highlighting and is much simpler and shorter as a result (500 lines versus 2000). I have also added comments describing each rule, using terminology from the OCaml manual, which will hopefully make it easier to maintain in the future.

I have followed the official OCaml manual as closely as I could, so the following special syntax is highlighted:

  • escaped character sequences in characters/strings
    • standard sequences (\\, \n, \t, etc.)
    • decimal/hexadecimal/octal ASCII codes
    • Unicode escape sequences (in strings only)
  • printf format strings (https://caml.inria.fr/pub/docs/manual-ocaml/libref/Printf.html)
    • including flags, width, precision, and type
  • floating point literals
    • decimal with exponent
    • hexadecimal with exponent part
  • integer literals (int/int32/int64/nativeint)
    • decimal
    • hexadecimal
    • octal
    • binary
  • operators
    • infix symbols
    • prefix symbols
    • monadic let operators (let+, and+, etc.)
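A small OCaml file like the one below exercises most of the constructs in the list, which makes it handy for checking the highlighting by eye. It is an illustrative sketch, not part of this PR; all names are hypothetical. It assumes OCaml 4.06 or later (for the `\u{...}` string escape; hexadecimal float literals and `\o` octal escapes need 4.03 or later). Binding operators are left out here since they need 4.08.

```ocaml
(* Escape sequences in character and string literals *)
let backslash = '\\'
let newline = '\n'
let decimal_code = '\065'    (* decimal ASCII code: 'A' *)
let hex_code = '\x41'        (* hexadecimal ASCII code: 'A' *)
let octal_code = '\o101'     (* octal ASCII code: 'A' *)
let unicode = "\u{3bb}"      (* Unicode escape, strings only; UTF-8 encoded *)

(* A printf format string with flags, width, precision, and type *)
let formatted = Printf.sprintf "%-5d|%8.3f|%x|%s" 42 3.14159 255 "ok"

(* Floating-point literals: decimal with exponent, hexadecimal with exponent *)
let float_dec = 1.5e3        (* 1500. *)
let float_hex = 0x1.8p1      (* 1.5 *. 2. = 3. *)

(* Integer literals: decimal, hexadecimal, octal, binary *)
let dec = 1_000
let hex = 0xFF
let oct = 0o17
let bin = 0b1010

(* Suffixes select int32, int64, and nativeint *)
let i32 = 42l
let i64 = 42L
let native = 42n

(* A user-defined infix symbol and prefix symbol *)
let ( |+| ) a b = a + b
let ( !! ) x = -x
let combined = !!(1 |+| 2)   (* -3 *)
```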

A limitation of my approach is that all capitalized identifiers (constructors, exceptions, modules) are highlighted the same way, and type parameters ('a) and builtin types are the only types that get highlighted.
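To illustrate the limitation, in the hypothetical snippet below (not from the PR) `Shape` is a module, `Circle` and `Tagged` are constructors, and `Invalid_shape` is an exception, yet under this grammar all of them would receive the same capitalized-identifier highlighting; only `'a` and builtin types such as `float` and `string` would be highlighted as types. It assumes OCaml 4.07 or later for `Float.pi`.

```ocaml
(* A module, constructors, and an exception: all capitalized identifiers *)
module Shape = struct
  (* 'a is a type parameter; float and string are builtin types *)
  type 'a t =
    | Circle of float
    | Tagged of 'a * string
end

exception Invalid_shape of string

(* Area of a circle; tagged shapes are rejected with an exception *)
let area (s : 'a Shape.t) : float =
  match s with
  | Shape.Circle r -> Float.pi *. r *. r
  | Shape.Tagged (_, label) -> raise (Invalid_shape label)
```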

I also added a few context-sensitive highlighting rules (see the bindings patterns) that seem to work consistently, unlike those in the existing grammar.

My intention is for this grammar to serve as a base for incremental improvements. Due to the inherent complexity of the official OCaml grammar, it may be worthwhile to look at tree-sitter or LSP semantic highlighting (proposal: microsoft/vscode-languageserver-node#367) in the future.

Comparison, old vs. new (screenshots not preserved: ml_old / ml_new, obj_old / obj_new).

Here are two existing problems that this fixes.
For the let operators in #45:

Old vs. new (screenshots not preserved: letops_old / letops_new).
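For readers unfamiliar with the syntax, binding operators (OCaml 4.08 and later) look like the snippet below. The `option`-based `let+`/`and+` definitions are a hypothetical illustration, not the code from #45.

```ocaml
(* "let+" maps a function over an option *)
let ( let+ ) o f = Option.map f o

(* "and+" pairs two options, yielding None if either is None *)
let ( and+ ) a b =
  match a, b with
  | Some x, Some y -> Some (x, y)
  | _ -> None

(* Both bindings succeed, so the body runs *)
let sum =
  let+ x = Some 1
  and+ y = Some 2 in
  x + y

(* One binding is None, so the whole expression is None *)
let missing =
  let+ x = Some 1
  and+ y = None in
  x + y
```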

For the Menhir highlighting problem I mentioned in #91:

Old vs. new (screenshots not preserved: menhir_old / menhir_new).

I know this is a lengthy PR, but hopefully I communicated my justifications clearly. If you have any questions, I'll be happy to answer.

smorimoto (Collaborator)

The code looks good. I have the same thought as you, and the existing code was kind of a placeholder. So I really appreciate that you did this.

rgrinberg (Contributor)

Thank you for your contribution. This looks much better indeed.

rgrinberg merged commit a6d68c5 into ocamllabs:master on Apr 2, 2020.