Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better Intermediate Representation #29

Open
gnumonik opened this issue Apr 20, 2024 · 0 comments
Open

Better Intermediate Representation #29

gnumonik opened this issue Apr 20, 2024 · 0 comments
Assignees

Comments

@gnumonik
Copy link
Collaborator

Context

At the moment, the Purus compilation pipeline makes use of three different representations of the AST:

  1. The Language.PureScript.AST syntax tree (mainly defined in Language.PureScript.AST.Declarations), which is constructed after parsing source files into a concrete syntax tree. This closely mirrors the structure of written source files (e.g. do-notation isn't yet desugared, typeclass constraints have not been replaced with explicit dictionaries, etc). We don't modify this at all, and the only place where we interact with it is in the machinery that converts it to the CoreFn representation. IMPORTANT: The PureScript type checking & inference machinery operate on this representation, and we begin processing it after typechecking has completed.
  2. The CoreFn AST (mainly defined in Language.PureScript.CoreFn.Expr) is the second representation. Our CoreFn is not vanilla PS CoreFn, because our version contains explicit type annotations. At this stage, do notation expressions are desugared to explicit uses of bind, if-then-else expressions have been converted to case expressions, etc. At the moment, this is the format for serialized Purus modules. IMPORTANT: The renamer operates on CoreFn. Renaming is (afaict) the only significant transformation that the vanilla PureScript compiler performs on CoreFn.
  3. The "Final" IR (mainly defined in Language.PureScript.CoreFn.Convert.IR) is the last representation we use before conversion to PIR. This representation only contains (or should only contain) expressions, declarations, and types that are isomorphic to PIR expressions, and is intended as a "parse-don't-validate" sanity check on the previous compiler steps. Here (but not in any previous step), we use a restricted Type type such that (e.g.) row-types are disallowed. Most notably, this IR contains no mention of records or objects, forcing us to desugar record/object expressions into product expressions. Additionally, this AST makes use of the Bound library for safe and efficient substitutions.

One final note: PS identifiers are (for the most part) just strings (well, `Text). The PIR compiler expects identifiers to have a textual name and a "unique" integer identifier. (See the discussion on slack in the IOG channel where MPJ & Yura explain how that works for more context). At the moment we assume that input to the PIR compiler has well-scoped and unique names, and so we just maintain a dictionary of names w/ a stateful counter. It might be beneficial to explicitly tag names with unique integer identifiers earlier on in the pipeline. (AFAICT these identifiers don't have to be De Bruijn indices, though they could be).

Motivation

There is, at present, too large of a gap between CoreFn and the final representation:

  • CoreFn contains record types and expressions (object updates/literals/accessors), which may mention type variables that have are of kind Row Type. In order to transform CoreFn into the final IR, we must perform monomorphization (at least on expressions with row-polymorphic types).
  • Right now, we do monomorphization with CoreFn directly. This is extremely fragile - During monomorphization, we generate many new identifiers and perform a lot of inlining. Because CoreFn does not have a mechanism for safe substitution, it is very likely that there are subtle bugs in the monomorphizer which will lead to variable name clashes. It's virtually impossible to write a comprehensive set of tests that would catch all possible mistakes here.
  • Due to some of the changes I made (particularly to the guarded case desugarer), the CoreFn AST is too "loose". That is: All guards (everywhere) should have been eliminated. Additionally, all ConstrainedTypes should have been desugared into types that mention explicit dictionaries.
  • Finally, neither CoreFn nor our final IR have, at the moment, a representation of case expressions which neatly mirrors the representation used in PIR. Briefly:
    • PureScript: (case (constr 1 x y) (alt x t1) (alt x y t2))
    • PIR: (case (constr 1 a b) (\x -> t1) (\x y -> t2))
    • See this discussion in CIP-85 and compare with the CoreFn CaseAlternative data type
    • This is important because, at the moment, we do the transformation from PS-style alternatives to PIR-style alternatives during the conversion to PIR. PIR is already a fairly "noisy" and final conversion is already tricky, it becomes nigh-unbearable to try to implement the translation for nontrivial case expressions.

The Ask

We need another intermediate representation. Specifically:

  • This representation must provide a mechanism for safe and efficient substitution. This could be Bound or something close enough to it, just needs to be safe and efficient.
  • We must convert to the new representation immediately after renaming. We can assume that the PS renamer is robust and more-bug-free-than-anything-we'd-write-in-a-week, so we have to do it there to avoid the possibility of introducing our own naming errors in some later transformations.
  • Ideally, the new representation would also represent case alternatives in a PIR-compatible format (I'm not sure what the best way of doing that would be off the top of my head)
  • The new representation must, like our CoreFn, contain explicit type annotations for all nodes that require them. (See the CoreFn AST in the sean/monomorph-cleanup branch - I'm pretty sure I removed all the superfluous annotations).
  • It should furthermore make use of Vanilla PS types i.e. SourceType. Many of the PS compiler's utilities for operating on types are very useful during monomorphization and object desugaring, so while we could winnow the types here, we probably shouldn't.

Acceptance Criteria

Satisfies the requirements in the previous section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants