Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compilation Pipeline Improvements (edited) #25

Open
gnumonik opened this issue Mar 22, 2024 · 0 comments
Open

Compilation Pipeline Improvements (edited) #25

gnumonik opened this issue Mar 22, 2024 · 0 comments

Comments

@gnumonik
Copy link
Collaborator

gnumonik commented Mar 22, 2024

Context / Motivation

NOTE: I'm editing an old issue instead of making a new one.

In my rush to get the full pipeline working ASAP, I ended up doing a few things that, in hindsight, should have been done differently:

  • Bound is great but we convert to a Bound AST at the wrong place in the pipeline. As things are, we convert to a Bound representation after inlining and monomorphization, thereby losing the benefits Bound provides (i.e. perfectly safe capture-avoiding substitution). Since these benefits would be particularly helpful during monomorphization and inlining, we should convert to Bound as soon as humanly possible.
  • Apps in our CoreFn do not need to be explicitly typed, and keeping the types correct in them is a huge headache with no real benefits. We should just remove those annotations and calculate the type from the "function part" and the "argument part".
  • Case expression desugaring is a nightmare ATM because we go from CoreFn style case analysis to PIR style case analysis at the same time we do the final conversion to PIR. It probably isn't obvious what I mean, so let me explain. This is (more or less) what the CoreFn representation of a case alternative is:
    • (case (constr 1 x y) (alt x t1) (alt x y t2))
    • This is the PIR representation: (case (constr 1 a b) (\x -> t1) (\x y -> t2))
    • The main issue here is that we have to know the types of the variables bound by each case alternative branch before we can construct the branch in PIR, else we can't make the types line up. Determining that type ahead of time would greatly simplify the final conversion process, and we should do it.

The Ask

Need to explain how things currently work real quick. The rough pipeline of transformations is:

1. PureScript Parser 
2. CST 
3. AST 
4. PS TypeChecker 
5. PS Desugarer 
6. Convert to Typed CoreFn (w/ type deduction) 
7. Monomorphizer/Inliner
8. Convert to Bound AST / Desugar Objects
9. Convert to PIR
10. Compile to UPLC

I'm proposing that after step 6, we convert to another Bound AST representation which is (effectively) isomorphic to Typed CoreFn. That way, we can do safe and performant substitutions in the monomorphizer/inliner.

We'll still have the Bound AST in step 8 (which gives us a "Parse-don't-validate" approach to ensuring that expressions and types that cannot be converted to PLC are desugared and removed).

What I'm proposing isn't exactly trivial. There is a lot of boilerplate required for Bound to work. For a given Expr ADT, each component type which is not ~ Expr requires several somewhat complicated type class instances that cannot be derived (well, maybe I can fix the TH in deriving-compat to work for them, but cannot be derived easily at least).

But it will give us immense benefits both in safety (since it becomes impossible to screw up capture avoiding subsitution) and performance.

Acceptance Criteria

How can we decide if the task is complete?

The new AST type is implemented and existing utilities that operate on CoreFn are modified to operate on BoundCoreFn.

@gnumonik gnumonik changed the title Convert to 'Bound' representation earlier in the compilation pipeline Compilation Pipeline Improvements (edited) Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant