WIP: a re-implementation of the compiler backend #424

zerbina · 2022-08-31T23:16:02Z

Summary

This PR implements:

* back-end-oriented intermediate representations (IRs) for procedure bodies, types, symbols, etc.: `vmir`, `irtypes`, `irliterals`
* logic for translating post-`transf` AST to the IRs: `irgen`, `irtypes`, `cbackend2`
* transformation and processing logic for the various IRs: `irpasses`, `cpasses`, `typeinfogen`, `markergen`, `typeprocessing`
* a new C back-end based on the components listed above: `cbackend2`, `cgen2`

The overall goal is to design and implement a simple, modular, and data-oriented framework for developing the various back-ends/targets.

The existing README.md is outdated, as a lot has changed since it was written. It still serves as a good introduction to the code, however.

Notes for Reviewers

The PR is quite large. I'd suggest starting with `cbackend2.generateCode` and, after getting familiar with how it works at a high level, continuing with the components/sub-systems it uses.

There are a lot of maximum line-length violations (especially in code using the IrCursor API). For the most part, these are in parts of the code where I'm not yet happy with the general architecture, and that are thus likely to be rewritten anyway.

I also left a lot of annotations regarding ideas, to-dos, questions, and general problems in the code - feel free to comment on them.

The documentation very likely needs some extra attention. Not all routines are documented yet, and for those that are, I believe the documentation sometimes either glosses over too many details or focuses on the wrong things.

Instead of requiring magic calls to be represented via normal procedure calls, they can now be represented directly with `ntkCall` nodes, without the need for a `ProcId`. The old approach had a few problems. Since magic procedures can be generic, it's not possible to know all valid instantiations once the back-end is reached. To still be able to insert magic calls, a pseudo-proc with no type information was created for each magic and cached in `PassEnv`. This meant that `ProcHeader`, together with all logic interacting with it, had to support procedures with no type information. In addition, the cache for generated magics (`PassEnv`) had to be available wherever magics needed to be inserted.
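A rough sketch of the representation described in the paragraph above, assuming a call node that either names a concrete procedure or directly carries a magic together with its return type. `CallNode`, `CallKind`, `magicCall`, `procCall`, and the reduced `Magic` enum are hypothetical names, not the actual `vmir` definitions. The point it illustrates: since a magic call needs no `ProcId`, no pseudo-procedure without type information (and no `PassEnv`-style cache of them) has to exist.

```nim
type
  ProcId = distinct int   # hypothetical index into a procedure table
  TypeId = distinct int   # hypothetical index into a type table
  Magic = enum            # hypothetical, heavily reduced set of magics
    mAddI, mSubI, mInc, mDec

  CallKind = enum
    ckProc                # call of a concrete procedure instance
    ckMagic               # direct call of a magic, no procedure involved

  CallNode = object
    argCount: int
    case kind: CallKind
    of ckProc:
      callee: ProcId      # type information comes from the procedure
    of ckMagic:
      magic: Magic
      retType: TypeId     # the return type is stored on the node itself

func magicCall(m: Magic, ret: TypeId, args: int): CallNode =
  ## No pseudo-procedure and no cache lookup is needed to emit a magic call.
  CallNode(kind: ckMagic, magic: m, retType: ret, argCount: args)

func procCall(p: ProcId, args: int): CallNode =
  ## A regular call still references a concrete procedure.
  CallNode(kind: ckProc, callee: p, argCount: args)
```

In this sketch, a pass that wants to wrap an expression in, say, an `mInc` call only needs the return type at hand, not access to a shared cache.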

haxscramper

Aside from a minor stylistic comment I made, everything else seems to be in order. Sadly, I can hardly promise to provide an in-depth review ATM (IRL stuff), but going from the provided readme I don't think I would have any high-level comments anyway.

haxscramper · 2022-09-30T08:42:43Z

compiler/vm/cbackend2.nim

+  for sym in g.compilerprocs.items:
+    case sym.kind
+    of routineKinds:
+      p.compilerprocs[sym.name.s] = procs.requestProc(sym)


Maybe we should introduce `func symStr(s: PSym): string = s.name.s` and use it, instead of continuing to write hundreds of naked field accesses in the code.

saem · 2022-10-01T10:59:24Z

compiler/front/main.nim

    registerPass(graph, cgenPass)

    if {optRun, optForceFullMake} * conf.globalOptions == {optRun} or isDefined(conf, "nimBetterRun"):
      if not changeDetectedViaJsonBuildInstructions(conf, conf.jsonBuildInstructionsFile):
        # nothing changed
        graph.config.notes = graph.config.mainPackageNotes
        return
+  else:
+    graph.config.exc = excGoto # only goto exceptions are support for the new cbackend


Thinking out loud, a smaller PR we could pull out of this one is removing the other exception types.

saem · 2022-10-01T11:03:59Z

compiler/vm/README.md

+#### IR overview:
+* the IR is a linear node-based representation
+* nodes reference each other via indices. A node can only reference nodes coming before it; reference cycles are forbidden
+* it's still undecided if a node may be referenced multiple times


Within the AST proper, no.

saem

I've done a first pass. I can only absorb so much in one go and around irpasses I was really fading.

In terms of how to proceed as a strategy:

1. we need to shrink dialect problems in devel
2. unit tests directly against `cbackend2.generateCode`

My reasoning for the first point:

* even with the code working for, say, refc, it's that much more to review and test
* CI will be faster; I suspect test cycles are going to be a limiting factor here
* it'll simplify sem's output, which will clarify/simplify things further here, and the two will likely cycle, reinforcing each other
* we need to do it anyway, and it'll bring the goal of getting things merged a lot closer

The reasoning for the second point:

* Cyo will be a new language at this point (grammar and semantics); its frontend will be separate, and I'd like to reuse the backend and VM
* tests will enforce and maintain strong separation
* they'll allow for refactoring without having to go through a frontend

Practically, I think it should work by us looking at and discussing CI failures and then:

* if you could make fixes for the things we think should continue to work
* I can drop the dialects/whatever... for the things we're going to remove

Simultaneously, for issues you're encountering during implementing fixes:

* if you can add some direct unit tests to hold necessary properties in place
* I can try to make sem less dumb to make your life easier 🤞🏽

As those lines converge, we end up with all tests passing and an easy merge. Thoughts?

saem · 2022-10-01T11:06:33Z

compiler/vm/README.md

+#### IR overview:
+* the IR is a linear node-based representation
+* nodes reference each other via indices. A node can only reference nodes coming before it; reference cycles are forbidden
+* it's still undecided if a node may be referenced multiple times


I do see this as useful for something like the empty node, but I'm not convinced if this is the right thing.

saem · 2022-10-01T17:27:22Z

compiler/ast/nimsets.nim

-  assert result.len == int(getSize(conf, s.typ))
+  # XXX: requiring the length to fit might help in catching some issues, but
+  #      it's too restrictive
+  assert result.len >= int(getSize(conf, s.typ))


Should it instead introduce a new proc that is relaxed while keeping it strict elsewhere?

saem · 2022-10-01T17:30:16Z

compiler/sem/sighashes.nim

@@ -51,9 +51,9 @@ type
    CoDistinct
    CoHashTypeInsideNode

-proc hashType(c: var MD5Context, t: PType; flags: set[ConsiderFlag])
+func hashType(c: var MD5Context, t: PType; flags: set[ConsiderFlag])


Read this file, then thought:

🎶 won't you take me to... Func-y town🎶 😁

saem · 2022-10-01T17:37:37Z

compiler/vm/README.md

+* nodes reference each other via indices. A node can only reference nodes coming before it; reference cycles are forbidden
+* it's still undecided if a node may be referenced multiple times
+* control-flow is represented via gotos and joins. Instead of storing the index of the corresponding `join` target, a `goto` stores an index (`JoinPoint`) into a list storing the actual IR indices. This is aimed at making IR modification simpler, by removing the need to patch goto targets in the IR directly.
+* there exist a few special experimental gotos (goto-link-back, goto-with-continuation, goto-active-continuation) meant for more efficient `finally` handling


Philosophical remark:

This feels like a theme in my experience.

It starts with a very small set of core primitives that are beautiful and all, great for learning and conveying key concepts. Then, to make them actually useful, a large set of variants is introduced to encode constraint information.
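The goto/join indirection described in the README excerpt above can be illustrated with a small sketch. All names here (`Code`, `Node`, `newJoinPoint`, `insertNode`, `resolve`) are hypothetical and do not mirror the actual `vmir` API; the sketch only shows that, because a goto stores a `JoinPoint` rather than an IR index, inserting a node shifts entries in the join table while the goto nodes themselves stay untouched. For brevity it has no nodes that reference earlier nodes by index, which a real IR would also have to account for.

```nim
type
  IRIndex = int              # position of a node in the linear node list
  JoinPoint = distinct int   # index into `Code.joins`, not into `Code.nodes`

  NodeKind = enum
    nkLit, nkGoto, nkJoin

  Node = object
    case kind: NodeKind
    of nkLit:
      val: int
    of nkGoto, nkJoin:
      target: JoinPoint      # both refer to the join-point table entry

  Code = object
    nodes: seq[Node]         # the linear IR
    joins: seq[IRIndex]      # JoinPoint -> IR index of the matching join

proc newJoinPoint(c: var Code): JoinPoint =
  c.joins.add(-1)            # the join's position is not known yet
  JoinPoint(c.joins.high)

proc addGoto(c: var Code, jp: JoinPoint) =
  c.nodes.add Node(kind: nkGoto, target: jp)

proc addJoin(c: var Code, jp: JoinPoint) =
  c.joins[int(jp)] = c.nodes.len   # record where the join lands
  c.nodes.add Node(kind: nkJoin, target: jp)

proc insertNode(c: var Code, n: Node, at: IRIndex) =
  ## Inserts `n` before position `at`. Only the join table is patched;
  ## goto nodes are left as they are because they store a `JoinPoint`.
  c.nodes.insert(n, at)
  for pos in c.joins.mitems:
    if pos >= at:
      inc pos

proc resolve(c: Code, jp: JoinPoint): IRIndex =
  c.joins[int(jp)]

when isMainModule:
  var code: Code
  let done = code.newJoinPoint()
  code.addGoto(done)                      # forward jump, target not yet known
  code.nodes.add Node(kind: nkLit, val: 1)
  code.addJoin(done)                      # the join lands at index 2
  code.insertNode(Node(kind: nkLit, val: 0), 1)
  doAssert code.resolve(done) == 3        # only the table entry moved
```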

saem · 2022-10-01T17:39:38Z

compiler/vm/README.md

+
+Currently also ignores whether or not a hook is trivial and thus replaces the assignment for types that don't actually need/use a `=copy` hook.
+
+#### `refcPass`


Possibly in another PR, we drop refc.

As refc support is already fully implemented, I'd be a bit reluctant to remove it either before or as part of this PR, and would rather remove it afterwards. That way, the implementation is at least preserved in the git history.

One thing to consider is that the refc, markAndSweep, boehm, and go GC support share almost all of their implementation in the compiler, so only removing refc would not reduce the required code/complexity by a significant amount.

Overall, ok we don't have to do it just yet.

As for the other GCs I would drop all of those as well.

saem · 2022-10-01T23:43:31Z

compiler/vm/cbackend2.nim

+  # tables
+  let fakeClosure = genFakeClosureType(env.types, passEnv)
+
+  # XXX: mutable because they need to be swapped in and out of the ``RefcPassCtx``.


More push for the no more refc case.

The issue is not directly with refc, but instead with first-class view-types being unfinished.

Both sequences need to be available to various passes (those using `RefcPassCtx`), but passing them as parameters is not possible due to the `TypedPass` interface, and storing them directly as part of `RefcPassCtx` would be wrong, as they don't directly belong to the context object.

The context object should ideally borrow from the sequences instead (via `lent seq` in this case), but since that's currently not possible, I've used the swap-in-swap-out idiom as a way to mimic borrowing. Using `shallowCopy`/`.cursor` or a pointer doesn't require the source sequences to be mutable, but those approaches have other downsides.
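The swap-in-swap-out idiom mentioned above can be sketched as follows, assuming a simplified context object. `PassCtx`, `TypeTable`, `countTypes`, and `withBorrowed` are hypothetical names, not the actual `RefcPassCtx`/`TypedPass` API. The sketch also shows why the source sequences have to be mutable: `swap` needs two `var` locations.

```nim
type
  TypeTable = seq[string]  # hypothetical stand-in for the shared sequences

  PassCtx = object
    ## Hypothetical pass context: `types` only holds the data while a
    ## pass is running and is empty otherwise.
    types: TypeTable
    seen: int

proc countTypes(c: var PassCtx) =
  ## A trivial example pass that reads the "borrowed" data.
  c.seen = c.types.len

proc withBorrowed(types: var TypeTable, ctx: var PassCtx,
                  pass: proc (c: var PassCtx) {.nimcall.}) =
  ## Mimics borrowing: the data is swapped into the context, the pass runs,
  ## and the data is swapped back. `types` has to be `var` for this to work.
  swap(types, ctx.types)   # O(1): only the sequence handles are exchanged
  try:
    pass(ctx)
  finally:
    swap(types, ctx.types) # restore ownership even if the pass raises

when isMainModule:
  var types = @["int", "float"]
  var ctx = PassCtx()
  withBorrowed(types, ctx, countTypes)
  doAssert ctx.seen == 2 and types.len == 2 and ctx.types.len == 0
```

Unlike a real `lent` borrow, nothing in this sketch prevents a pass from accessing `ctx.types` after it has been swapped back out; the field is then simply empty again.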

saem · 2022-10-01T23:44:19Z

compiler/vm/cbackend2.nim

+  var ttc = TypeTransformCtx(graph: passEnv, ic: g.cache)
+  var upc = initUntypedCtx(passEnv, addr env) # XXX: not mutated - should be ``let``
+
+  # XXX: instead of manually figuring out out passes are to be batched


Suggested change

# XXX: instead of manually figuring out out passes are to be batched

# XXX: instead of manually figuring out how passes are to be batched

saem · 2022-10-02T00:05:50Z

compiler/vm/cgen2.nim

@@ -0,0 +1,2450 @@
+## `vmir`-based C code-generator. Separated into two phases:


I really like how conceptually simple this is. 🎉

saem · 2022-10-02T02:43:37Z

compiler/vm/irgen.nim

+  let scope = p.scopes.pop()
+  p.activeLocals.setLen(scope.firstLocal)
+
+# XXX: the emission of ``ntkLocEnd`` instructions is disabled for now. When


I'm still reading/understanding how this should work, but at a quick glance, the following occurred to me. Since we know there will be a start and a stop for each local, could we:

* preallocate two sequences of length equal to the number of locals (could also be a tuple)
* location order and sequence order must match
* each start/end has a positional offset; treat the offset as the instruction being appended with a `ntkLocStart`/`ntkLocStop`

Tangentially, it occurs to me that perhaps a lot of this simplifies dramatically thanks to CPS + structuring, as a local's lifetime cannot exceed the lifetime of the continuation unless it is passed on (move/copy). The CPS transform should happen during the semantic analysis phase, as it'll change the meaning of a number of things. But that should ease this nonetheless, as the shape of the "procs" you receive will be very small and regular.

Oh well, that's all one fine day right now. 😅

Since scopes (in the context of lifetimes) aren't directly encoded in the IR, the original idea was to use ntkLocEnd as a way to signal the end-of-life of a location on a control-flow path. This information was meant to be used by the move-analyser and related data-flow analysis, as well as the code-generator for the VM (to simplify register allocation).

Because each local can have more than one associated `ntkLocEnd` instruction, the sequence idea wouldn't work. Attaching extra out-of-band information to instructions is also a bit problematic right now, as the attachment positions have to be adjusted separately on each code modification.

Aside: Instructions might need a stable ID for other things, but because of the additional memory they'd take up, I'm still trying to get around having to introduce them.

With the change of plan (i.e. the injectdestructors rewrite that I'll be working on first), most of the ideas around ntkLocEnd have become obsolete/stale.

Thank you for the thorough explanation.

After coming back to this after some time, I think I understand it better than before. `ntkLocEnd` is effectively acting as a "consume" instruction automatically inserted by the compiler. This needs to be branch-aware at present... but it also leads me back to thinking it simplifies under CPS. 😅

But don't worry about all that jazz, it's just me thinking out loud.

saem · 2022-10-02T03:47:57Z

compiler/vm/irpasses.nim

+    mapped: seq[TypeId] ## ``IRIndex`` -> mapped type of the expression. A
+      ## mapped type is the type after lowering/transformation.
+
+  TypedPass*[T] = object


`TypePassVisitor`

Clyybber · 2022-10-09T21:53:44Z

compiler/vm/cgen2.nim

+  var b: CAstBuilder
+  b
+
+template buildAst(code: untyped): CAst =


I think `start().....fin()` or `fin: start()....` is better than `buildAst: builder....`, as it doesn't require knowledge of another template and its builder variable.

Clyybber · 2022-10-09T21:58:07Z

compiler/front/main.nim

@@ -146,22 +148,27 @@ proc commandCompileToC(graph: ModuleGraph) =
  let conf = graph.config
  extccomp.initVars(conf)
  semanticPasses(graph)
-  if conf.symbolFiles == disabledSf:
+  if conf.symbolFiles == disabledSf and not isDefined(graph.config, "cbackend2"):


Can be flipped to `if isDefined(graph.config, "cbackend2"): ... elif conf.symbolFiles == disabledSf: ...`, as you did in line 169.

Yep, that's cleaner, thanks.

zerbina · 2023-02-22T19:09:01Z

Update: I'm currently in the process of splitting off (iterated upon versions of) the various bits developed here into separate PRs, as detailed by the plan described here.

It's likely that this PR itself is never going to be merged, but I'm leaving it open for now, since some of the discussions here are still relevant.

zerbina added 30 commits August 18, 2022 16:16

irpasses: implement array-literal -> const lifting pass

a0868ac

irpasses: progress on set-op lowering

5dfd733

irgen: implement irNull via mDefault

1bba94a

irpasses: implement mDefault transform

feea419

irdbg: print the type for type-literals

bbae099

cgen2: implement object down-conversion

6dee9e7

irgen: transform mInc/mDec

977277c

cgen2: emit bcError in the generated code

aff3717

irgen: store the discriminator symbol position instead

518a92d

cbackend2: fix importc fix being unreliable

c3e03ef

An empty body does not mean that the procedure is imported. Check for the presence of the flag instead.

cbackend2: store the IR for all procedures in a single seq

3f19907

introduce the ModuleData type

c5b95a5

Stores references to the things that make up a module, e.g. procedures, globals, etc.

cgen2: replace usages of IrStore3.owner

ae116d1

The `owner` field was only meant as a temporary measure and is going to be removed soon.

cgen2: translate more magics

03a51f7

Some basic integer operations were missing.

cgen2: fix wrong min/max implementation

c857822

The operators were swapped.

irgen: don't exclude skForVars from being globals

499f06b

cgen2: emit definitions for globals

914c37f

In addition, don't define referenced globals but declare them as `extern`.

cbackend2: register alive globals to modules

f18dfb6

irgen: fix stack-overflow in compiled code

cc9f806

The condition for whether overflow checks should be inserted was inverted, causing `nimDivInt` to recurse infinitely.

irgen: fix inverted branch conditions

60da39c

'branch' currently means "jump if condition" not "jump if not condition"

cbackend2: register lifted RTTI globals

c3a9dfa

irpasses: move more utility procs to pass_helpers

314f59d

move RTTI related processing logic into a dedicated module

9c6f440

vmir: better support for compiler-generated magic calls (part 2)

0b0e26c

Don't use `PassEnv.magics` anymore. All compiler-inserted magic calls now have return-type information attached. Also fixes `bcOverflowCheck` having the wrong type when wrapped around `mInc|mDec`.

vmir: better support for compiler-generated magic calls (part 3)

e490f16

Reap the benefits.

cgen2: emit definitions for constants

a18f950

Their initialization logic is still missing however.

irtypes: correctly handle generic object types

db05987

It's possible for multiple `PType` instances of the same type instantiation to exist and the previous logic didn't account for it, causing type mismatch issues in the generated C code.

irgen: pass the original type for nkNilLit

3ff79aa

It's important that the unskipped types are passed to `requestType` so that imported types can be handled properly.

irgen: correctly omit compile-time-only arguments

6a73f44

The previous logic looked at the type of the *argument*, but the correct way is to look at the type of the target *parameter*. `vmgen` has the same issue.

zerbina added 7 commits September 29, 2022 21:35

cgen2: cleanup

ad33ad9

* remove unused types, procedures, and constants
* improve some doc comments
* update/remove/add some annotations

vmir: update comments

d4d316e

Improve some doc comments, remove stale ones, and add some notes about future directions/plans.

cbackend2: cleanup

a1b3592

* correct some typos
* improve some doc comments
* remove dead code
* leave some annotations about future directions

cpasses: improve comments

bb77827

irpasses: cleanup

f93be5b

* correct some typos
* improve some doc comments
* remove dead code
* remove stale annotations
* leave some annotations about future directions

irtypes: cleanup

5e20453

* correct some typos
* improve some doc comments
* remove dead code
* remove stale annotations
* leave some annotations about future directions

irgen: cleanup

7dba161

* correct some typos
* improve some doc comments
* remove dead code
* remove stale annotations
* leave some annotations about future directions

haxscramper approved these changes Sep 30, 2022

View reviewed changes

saem reviewed Oct 1, 2022

View reviewed changes

zerbina marked this pull request as ready for review October 1, 2022 23:33

zerbina added 3 commits October 2, 2022 01:18

testament: enable the new back-end for the C target

b5b9cd4

Merge branch 'devel' into backend-ir

6639bee

irgen: adjust to upstream changes

8cffc58

Use the new `rvmXXX` enum value names instead of the `rsemVmXXX` ones.

saem reviewed Oct 2, 2022

View reviewed changes

Clyybber reviewed Oct 9, 2022

View reviewed changes

zerbina removed the request for review from alaviss October 14, 2022 19:28

This was referenced Oct 16, 2022

compiler: make injectdestructors a MIR pass #450

Merged

utils: introduce the idioms module #453

Merged

haxscramper added the refactor, compiler, and compiler/backend labels Nov 20, 2022

haxscramper added this to the C backend refactoring milestone Nov 21, 2022

This was referenced Feb 20, 2023

compiler: unify the backend processing #550

Closed

compiler: introduce an IR for the code generators #551

Merged

zerbina mentioned this pull request Apr 30, 2023

consider destructors of concrete types bound to concepts #678

Merged

zerbina mentioned this pull request Oct 5, 2023

rework the MIR (part 1) #942

Merged

7 tasks

zerbina mentioned this pull request Jun 5, 2024

WIP: rewrite the C code generator #1333

Draft

