
vm: store data as raw contiguous bytes #242

Merged (1 commit) May 17, 2022
Conversation


zerbina (Collaborator) commented Feb 28, 2022

Summary

This replaces the PNode-based data representation in the VM with
storing the data as raw contiguous bytes, similar to how it works on
the c/cpp targets. VmType encodes the information necessary to
interpret the raw bytes.

Details

  • since PTypes store a lot of information irrelevant for the VM, they
    are first translated into a format (VmType) that's faster and easier to
    traverse/query. The mapped-to-types are de-duplicated, meaning that
    if two VM types have the same ID (currently the same instance), they're
    also the same type, making type equality checks very fast. This
    respects structural-vs-nominal types, but currently not distinct and
    named-vs-unnamed tuples.
  • location type information is stored separately from the actual data
  • const values are still stored using PNode representation, which
    requires a translation step whenever data crosses the compiler/VM
    border
  • it is enforced that locations are only accessed via handles of
    compatible type, reporting an error on violation
  • all memory access is now also ensured to not go outside of
    VM-owned memory, with an error being reported on violation
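To make the summary concrete, here is a minimal sketch of the described scheme. All type, field and proc names below (`LocHandle`, `VmMemory`, `sameType`, `validAccess`) are illustrative assumptions, not the PR's actual definitions; only the ideas (raw bytes, de-duplicated `VmType`s, handle compatibility, bounds enforcement) come from the description above:

```nim
# Hypothetical sketch -- names are made up; only the idea of raw bytes,
# type-describing handles and bounds enforcement is from the PR.
type
  VmType = ref object
    id: int           # de-duplicated: same id implies the same type
    sizeInBytes: int

  LocHandle = object
    typ: VmType       # how to interpret the referenced bytes
    offset: int       # position inside VM-owned memory

  VmMemory = object
    data: seq[byte]   # the raw contiguous storage

proc sameType(a, b: VmType): bool =
  # de-duplication makes type equality a cheap id comparison
  a.id == b.id

proc validAccess(mem: VmMemory, h: LocHandle): bool =
  # every access is checked to stay inside VM-owned memory
  h.offset >= 0 and h.offset + h.typ.sizeInBytes <= mem.data.len
```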

Known changes in observable behaviour

  • repr output in compile-time/NimScript contexts changed a bit
    when compiling with --mm:refc. The output is now closer to
    that with --mm:arc.
  • when passing non-NimNode values to non-static parameters with
    genAst/getAst, the invoked macro always receives it as typed AST
    now (even if the parameter is of untyped type)

The initial goal was for this PR to be the VM refactoring PR, but it shifted to being about unblocking both the DOD AST and a vmgen rewrite.

I'm not happy with the overall quality, but I think it has to suffice for the first step. The biggest issues:

  • the code in vm.nim is too complex
  • vmgen.nim was in a bad state before, but it's even worse now
  • there's lots of redundancy in both the instruction set and the implementation
  • I had to introduce lots of temporary hacks/workarounds

Lots of parts/components are in a transitional state and not finished yet. I do have a clear plan on how to further proceed from here. In the bigger VM refactoring picture, this PR marks the first big step, with further steps being:

  • use an IR for vmgen; rewrite vmgen in general
  • revise the instruction set; do more work at code-gen time as opposed to during execution
  • architectural improvements
  • restructure/refactor the execution loop
  • bring all unfinished efforts to completion (type layout, type creation, allocator, etc.)

I'm going to move to a more incremental development approach (smaller, more frequent PRs) for the rest of the VM refactoring from this PR on.

Most of the test suite runs without changes; some tests had to be adjusted due to slight behavioural changes in a few areas (e.g. repr, getAst). Some CPS tests also succeed, but they're running a bit slower now.

Please don't merge this yet. I'd like to first compile a test suite covering all VM/const related issues reported in the mainline repository as a separate PR, in order to see if and what issues are fixed with this PR. (Edit: It makes more sense to do this once the rest of the VM related work is done)

Notes for reviewers:

  • I performed lots of last-minute refactorings, so there might be some leftovers, like unused/unneeded function parameters, that I forgot to remove
  • some module/variable/function/type names are still a bit off
  • documentation is probably lacking in some areas. If you have any suggestions for improvements, please let me know
  • code style might also need some attention

To-do

  • Merge devel into PR once the dependencies are merged into devel (e.g. improved error handling)
  • Get rid of register pointers (rkRegAddr)
  • Remove vmgen related hacks where possible (postponed to after this PR)
  • Finish variant object support
  • Implement missing PNode to vm data conversions
  • Fix/finish function pointers and closures
  • Proper handling/allocation of globals/heap objects
  • Finish ref support
  • Fix remaining non-working instructions
  • Implement missing vm callbacks
  • Implement GC for ref types
  • Finish memory/location safety implementation
  • Add tests to make sure that the memory/location safety enforcement works as expected
  • Get FFI support back in shape
  • Finish/clean up remaining loose ends (postponed)
  • Don't store every integer and float type as an 8-byte value
  • Proper set implementation
  • Better VM allocator implementation (postponed)
  • Improve how small values (int, float, ptr, etc.) are allocated for registers (postponed)
  • Reorganize modules, functions, types, etc., clean up and restructure function interfaces, rename types/functions
  • Add doc comments to all types and functions
  • Code style fixes
  • Rebase onto devel


saem (Collaborator) commented Mar 1, 2022

Thank you and welcome.

Your pr description alone is great, I'm reading the code now, but I wanted to acknowledge how pleased I am with the approach you're advancing. Thanks.

(Goes to read code)

saem (Collaborator) left a comment:

Answered a few items


Review threads on compiler/vm/vm.nim (outdated, resolved)

saem (Collaborator) commented Mar 1, 2022

After having read through the initial sketch and revisiting the PR:

Having the VM be fully separate might not be immediately necessary, but promoting vmgen to a more first-class backend would help a lot.

Here is a possible trajectory; it can of course be modified, and it's mostly a thought experiment to illustrate a bunch of things at once:

  • we can generate legacy vmgen code, and use that to see/test vmgen for different code and learn about today's pre-/postconditions
  • we can also have a new vmgen if we want a revised instruction set, it's still good for comparative code gen (generate and think about the output)
  • if the VM is slightly more standalone and the backend is more first-class, we can run a lot of our existing tests on the VM; this means cross-leveraging the spec work
  • with gen and execution side-by-side we can compare the results of various code, differential debugging

For the code, what's the reason for using ptr instead of a ref for the stack?

Documenting the opcodes along with adding checks for pre and post conditions would be awesome. In debug mode checking for the conditions is also super useful.

canelhasmateus (Contributor):

Very welcome! And thanks a ton for your time and effort


zerbina (Collaborator, Author) commented Mar 1, 2022

Thank you for the comments and reviews everyone, it's much appreciated :).

For the code, what's the reason for using PTR instead of a ref for the stack?

The idea is to allow the usage of TStackFrame objects allocated on the stack or in containers (e.g. seq). With ref, each frame is required to be created via new, which has lots of downsides (cost of invoking the heap allocator, possible heap fragmentation, cache inefficiency due to objects scattered across memory, etc.). Since a stack frame only lives as long as the corresponding rawExecute call, it makes sense to allocate it on the stack there with automatic storage duration.

The TStackFrame.next field (probably needs a more descriptive name) stores the caller's frame. Since the caller's frame always outlives the callee's, it's safe to use a "raw" pointer here. In an ideal world I would use view types (lent and var) instead of ptr but - AFAIK - they are still in development limbo.

In general, I try to use ref as little as possible when designing types/interfaces, due to its aforementioned downsides and its somewhat viral nature.
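The pattern described above can be sketched minimally like this. Everything except the names TStackFrame and rawExecute is made up for illustration: each frame lives on the native stack of its rawExecute invocation, and the caller's frame is referenced via a raw pointer because the caller is guaranteed to outlive the callee:

```nim
# Illustrative sketch, not the PR's actual code: a frame is a plain stack
# object, and `next` is a raw pointer to the caller's frame.
type
  TStackFrame = object
    next: ptr TStackFrame  # caller's frame; always outlives this one
    prc: string            # stand-in for the frame's actual payload

proc rawExecuteSketch(caller: ptr TStackFrame, depth: int): int =
  # the frame has automatic storage duration: no `new`, no heap traffic
  var frame = TStackFrame(next: caller, prc: "level " & $depth)
  if depth > 0:
    # taking the address is safe: `frame` outlives the nested call
    result = rawExecuteSketch(addr frame, depth - 1)
  else:
    # walk the chain of caller frames via the raw pointers
    var p: ptr TStackFrame = addr frame
    while p != nil:
      inc result
      p = p.next
```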

I just pushed the next batch of changes. Some general notes for reviewers regarding my code comments:

  • Comments starting with "TODO" that end with a question mark are most of the time meant as questions for reviewers, but also sometimes meant as reminders for myself. Without question marks, they are just reminders.
  • Comments starting with "XXX" or "xxx" are commentary about code or ideas I had that are mostly meant as annotations for myself.

Details/Notes

There are multiple areas where I still haven't made a final decision on how to tackle them. Namely:

  • How callable objects (closures, function references) are to be handled
  • How exception objects are to be handled (Will probably marshal these into VmObjectAtom)
  • PNode handling in general
  • array/seq data representation
  • Error handling: What are "expected" errors (reports) and what should be assertions only?

Object variants are also going to be tricky and, with the current approach, will probably require vmgen patches.

Regarding PNode and related opcodes, I had the idea to remove the NimNode related opcodes (those starting with N) and replace them with callbacks, which would then be invoked via the existing opcIndCall or via a new specialized call opcode. vmgen would need to be patched for that, but it would be a lot easier to add new NimNode related operations to the VM in the future then. The overall VM code would get less complex too. A neat side-effect would be that the NimNode manipulation/querying procs from macros.nim could be called indirectly without a wrapper proc which, AFAICT, isn't possible right now. Example:

# Not possible right now
var f = newNimNode
discard f(nnkEmpty, nil)

Regardless of how I go about representing PNodes in registers, it's not entirely clear to me whether or not they should have value or reference semantics. As they were used for representing AST nodes, complex data and nim nodes in the original VM, I'm not even sure if the semantics were consistent there. Personally I'd prefer value semantics (easier to reason about the VM code then IMO) and let vmgen take care of restoring reference semantics where needed via pointer/refs.


zerbina (Collaborator, Author) commented Mar 3, 2022

My previous assessment about the usage of PStackFrames was wrong. rawExecute executes procedure calls inline (probably for efficiency) and thus uses an indeterminate number of frames (as opposed to just one). It's currently not possible to easily calculate the maximum number of frames used (the call depth, basically) upfront, due to the lazy evaluation of calls (procedures are only ever compiled when they are first invoked). Simply using a seq to push/pop frames won't work with my attempted ptr mechanism, as changes to the sequence's length would invalidate all pointers to its elements.

I'd probably go for storing the frames in a seq owned by a TCtx and then storing stack frame references as simple ints which represent an index into the frame seq. While the frames would still be on the heap then, compared to using PStackFrame, spatial memory locality would be better and allocator pressure would be lower (the allocator is only invoked when the frame sequence's capacity is exceeded).
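The index-based scheme can be sketched like this; all names (`FrameIdx`, `pushFrame`, `popFrame`, the `caller` field) are assumptions for illustration. The key property is that an index stays valid even when the seq reallocates on growth:

```nim
# Hypothetical sketch: frames live in a seq owned by the context, and a
# frame "reference" is an index into that seq rather than a pointer.
type
  FrameIdx = int  # -1 means "no frame"

  TStackFrame = object
    caller: FrameIdx  # index of the caller's frame, not a pointer

  TCtx = object
    frames: seq[TStackFrame]

proc pushFrame(c: var TCtx, caller: FrameIdx): FrameIdx =
  # growing the seq may reallocate, but indices remain valid
  c.frames.add TStackFrame(caller: caller)
  result = c.frames.high

proc popFrame(c: var TCtx): FrameIdx =
  # returns the index of the frame that becomes active again
  result = c.frames[c.frames.high].caller
  c.frames.setLen(c.frames.len - 1)
```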

A different approach would be to just recurse into rawExecute when evaluating a call, but that has the downsides of:

  • being a bit less efficient
  • making VM exits due to errors more complex, as the exceptional exit has to be propagated up the stack somehow (stack refers to the chain of rawExecute calls here)
  • limiting the call depth of code running in the VM to the maximum stack size of the process the VM is running in

I don't think the last downside would be an issue in practice though.


saem (Collaborator) commented Mar 5, 2022

(haven't caught up on changes, just responding to comments first)

The approach outlined with respect to a stack and an index as a reference into it, by way of a context, is how I thought this would go ultimately.

I have a sinking feeling the approach to clarifying things is through vmgen. Effectively, there needs to be a core nimskull, which is enough to either directly cover a language feature or do so via desugaring. Then each bit of core should have a clear translation into byte code; let's assume no optimizations for now. Without coverage of that we don't know which opcodes we're missing or have incorrectly specified, the semantics to support, etc.

PNode is not one thing and that's the issue:

  1. It can be the AST to a definition we're manipulating as part of metaprogramming
  2. It can be a compile time expression we're evaluating
  3. It can be a memory location for the VM when it wants to allocate something on its "heap"
  4. 1 and 3 work together to move things between the sem and VM, or vice versa; so shared memory

I might have missed something but clarifying the particular usage classes and associated semantics would be another way to allow incremental progress.

Perhaps an easy bit of progress is having an explicit location, handle (ability to say which location we mean), and vmmemory (seq of locations, plus a flags and other minor state). With that:

  • a VM invocation's memory can have a special location (implying handle) for the "return" of that VM execution
  • a nested VM invocation can be reasoned about as a child can leave information for a parent invocation (now it's just like a function)
  • exceptions mean we set a flag to treat the return as exception; put the exception in the return location, move any data ref'd by the exception; nuke vmmemory
  • PNode is now no longer a location plus ref, they're all genuinely AST being manipulated or evaluated (form of manipulation)

location will look pretty close to TNode, especially at first, but I see that changing, as a location shouldn't really be a thing on its own; there must really be a vmmemory. The decoupling will mean that if it does make sense to unify them again, we can do so in a more DOD fashion.


zerbina (Collaborator, Author) commented Mar 5, 2022

Update on the current status

I'll try to summarize the rough design I've currently converged on, together with the reason(s) behind some decisions:

Data representation

  • VM data is represented by node-like objects named VmObjectAtom (name is a relic from my very first attempt, needs to be changed).
  • A node is either an atom (has no children) or a "branch" (has child nodes)
    • int, float, string, ordinal arrays (arrays/seqs of int/float type), arrays of objects, set and pointers/refs are atoms
    • objects and tuples are modeled via nodes and sub-nodes, effectively forming a tree

Theoretically, arrays of objects and objects could be merged into one thing (they are using the same data representation right now), but I decided to keep the distinction for now, as objects require an immutable tree layout, while an array of objects does not.

VM resources

The VM uses the following kinds of resources:

  • Registers (also referred to as slots). Most VM ops operate directly on them. Used for procedure argument passing and local variables. They either represent a value directly or hold an opaque reference to another value. The referenced (pointed to) value is either in another register, a global or a constant.
  • Globals (the "Heap"). Storage for mutable data objects used for holding dynamic user-allocated objects (via new only right now, but can be extended to alloc). Also holds global-scope variables ({.compileTime.} and top level vars in static blocks).
  • Constants. Used for passing along immutable data to the VM (type information via PNode, callables and literals)

This is mostly the same as before, but with a few notable differences:

  • Globals hold runtime user-allocated data now
  • Globals and constants are VmObjectAtoms instead of PNodes
  • Registers always own their value (if they contain one). No more shared ownership of values

More on registers

The difference between value and reference registers is handled opaquely when they're used as source operands. What this means is that instructions sourcing from registers don't check the register kind, but just do a single pointer dereference. This is made possible by the following:

  • A register contains both the backing memory for a value (val) and a pointer to a value (p)
  • When register reg is a value register, p is set to the address of reg.val
  • When register reg is a reference register, p is set to the address of either 1) another register's val, 2) a global or 3) a constant

Registers, globals and constants are required to have a fixed location in memory for this to work.

A downside of this approach is that you always need to do a pointer dereference when reading from a register, which probably makes it harder for the compiler to reason about the code. An alternative would have been to do a case over the register kind whenever reading from one.

I chose this design as a way to have value semantics for VM data while a) keeping most of the VM code emitted by vmgen working, and b) not copying values around everywhere. Example:

LdGlobal a, ... # Store reference to global 1 in 'a'. 'a' is a reference register
LdObj b, a, ... # Load reference to a field of the object in 'a'. 'b' is a reference register
Echo b
NewStr b # Assign a fresh new string to register b. 'b' is a value register now
Echo b

# Maybe a bit counter intuitive, but this still works. But maybe it shouldn't
LdNull a, .. # 'a' is a value register with object value
LdObj b, a, ... # Load field reference
WrObj b, ..., c # Copy the value from register c into a field of the object in 'a' - this modifies the value in register 'a'

Without reference registers, each load instruction (LdXxx) would do a deep copy. The bottom example also wouldn't work anymore.
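The register mechanism described above can be sketched roughly like this. Field and proc names (`val`, `p`, `setValue`, `setRef`, `read`) are assumptions, and the real registers hold far more than an int; the point is the single read path through `p`:

```nim
# Illustrative sketch of the value/reference register scheme: reads always
# go through `p`, regardless of which kind the register currently is.
type
  VmValue = object
    intVal: int        # stand-in for the actual VM value representation

  Register = object
    val: VmValue       # backing storage for value registers
    p: ptr VmValue     # read path; points at `val` or at external storage

proc setValue(r: var Register, v: int) =
  # value register: `p` points at the register's own backing storage;
  # this requires the register to have a fixed location in memory
  r.val = VmValue(intVal: v)
  r.p = addr r.val

proc setRef(r: var Register, target: ptr VmValue) =
  # reference register: `p` points at a global, constant or other register
  r.p = target

proc read(r: Register): int =
  # a single pointer dereference, no case over the register kind
  r.p.intVal
```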

Closing thoughts

I'm still not happy with a few things:

  • While modifying registers directly is no longer possible, modifying their value can still be done with either WrObj or AddrReg+WrDeref. I'd like for instructions to only modify the value of their destination register.
    • Possible approach: introduce a new storage type called scratch space (name open for discussion) and two instructions for loading from and storing to it. All mutable local variables as well as var function arguments get stored there. Non-atom objects are also stored there instead of in registers.
  • Right now, instructions that have in/out semantics for their dest register need to implicitly transition dest to the correct state (value or reference kind, value type for value registers). vmgen (the original one too) depends on this behaviour for things like LdImmInt a, ...; ConcatStr a, b, c to work. This requires ensureKind/ensureAtomKind in lots of places, which probably has considerable overhead and also makes the VM more complex
    • Future direction: make these transitions explicit. Either encode transitions as part of the instruction opcode or introduce a new instruction opcSetState. vmgen can easily keep track of register state and emit transitions where appropriate.
  • Object storage is still inefficient. The building block for data representation (VmObjectAtom):
    • is still large: "only" 32 bytes compared to TNode's 40, but still
    • contains possibly unneeded meta-data (node kind, PType, etc.) in the case of complex objects/arrays. As an optimization, I introduced a special kind of array (currently called ordinal array) that stores int/float arrays as raw byte sequences (opposed to seq[VmObjectAtom] for normal arrays), so that array[8, uint8] is only 8 bytes in size instead of 256(!)
    • requires non-flat storage of complex objects which
      • is basically THE blocker for efficient native-like memory semantics, full openArray support, etc.
      • needs a heap allocation for every layer other than the first one (object)

Flat storage of arbitrary data types is what I'd go for further down the line.
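To make the flat-storage direction concrete, here is a rough sketch under stated assumptions: all names (`FieldDesc`, `FlatType`, the int-only accessors) are hypothetical, and only fixed 8-byte fields are shown. An object becomes a plain span of bytes, and a field access becomes an offset computation:

```nim
# Hypothetical flat layout: an object is a span of bytes; a field is just
# an (offset, size) pair into that span. Only 8-byte int fields shown.
type
  FieldDesc = object
    offset, size: int

  FlatType = object
    sizeInBytes: int
    fields: seq[FieldDesc]

proc readIntField(data: seq[byte], f: FieldDesc): int64 =
  # bounds check mirrors the VM-owned-memory enforcement
  assert f.size == 8 and f.offset + 8 <= data.len
  copyMem(addr result, unsafeAddr data[f.offset], 8)

proc writeIntField(data: var seq[byte], f: FieldDesc, v: int64) =
  assert f.size == 8 and f.offset + 8 <= data.len
  copyMem(addr data[f.offset], unsafeAddr v, 8)
```

With such a layout, nested objects need no per-layer heap allocation: a sub-object is just a sub-span, which is what makes native-like memory semantics and full openArray support reachable.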

Error reporting

Errors in the VM can be categorized into:

  • User errors. Examples: integer overflows, index errors, etc. Basically everything dependent on user-supplied runtime data. These should (must?) gracefully either exit execution with an error message or report an Exception to code running in the VM
    • todo: What about Defects? Should they:
      • always be treated as catchable? (would make vm behaviour different from non-vm behaviour if --panics:on)
      • be handled according to --panics?
    • todo: Should behaviour be consistent with std lib's (e.g. out-of-bounds access raises IndexDefect instead of catchable error)?
    • todo: What about nil access? Abort with error?
  • Static code issues. State errors (e.g. wrong register kind during read/write) that can be statically detected by analysing VM code without executing it. These would be the result of faulty code generation (problems with vmgen)
  • Dynamic state errors. Errors that can only be detected during VM execution. Overall design should be focused on keeping these low.

I use "internal error" to refer to both static VM code issues and dynamic state errors from here on.

Right now I'm using assert excessively to detect internal errors, but I'm unsure about the right approach here.
Some points to think about:

  • Should internal error reporting be user-switchable via a config?
  • Should internal error reporting include the current VM stack-trace? This would help with debugging, but using the new message reporting facility currently prohibits the usage of func in places where reporting is required, due to handleReport not being noSideEffect

Further questions on direction

  • When going down the flat-object approach: what should the endianness of int and float types, as well as the sizes of int and float, be? Same as the host system? Same as the target system? Configurable prior to VM execution?
  • How safe should pointer and raw memory operations (by user code) be in general? Are they allowed to crash the surrounding process or, worse, corrupt non-VM memory? Safeguarding against this might incur overhead
  • What about support for lifetime-tracking hooks (=copy, =sink and =destroy, maybe even =trace)? Nim's current VM/vmgen does not support these, but it would be doable.
  • What about freeing of new-ed objects? In my current implementation they are just accumulated and never freed. Should in-VM refs be tracked in an arc/orc-like manner, or should some different memory reclamation technique be employed?
  • Is the code running in the VM always generated by a vmgen from the same compiler build? Right now and in the past this is/was true

Regarding testing

Ideally I'd like to do exhaustive testing of vmgen.nim and vm.nim in terms of both correctness and also code coverage. A performance testing framework for the VM would also be awesome both for testing regressions (now and in the future) and to also have some guidance when attempting improvements.

A simple yet extremely powerful performance and correctness test would be:

> nim e nim.nim -- c nim.nim # ;)

That is, to evaluate/run the compiler in the VM and build the compiler with it :). If that produces a working compiler in reasonable time, I'd say that the VM is definitely ready to run code in the wild :). A requirement for that would probably be full pointer, FFI and openArray support though


zerbina (Collaborator, Author) commented Mar 5, 2022

@saem Ah, sorry, I only saw your message just after I hit "Comment" :). I'm a bit exhausted right now (the write-up took a long time), so I'm going to respond to your message some time later

Review thread on compiler/vm/vm_enums.nim (outdated, resolved)
vmobjects
]

# TODO: correctly and fully supporting every type with `fromLit`/`toLit` seems
Collaborator:

The VM needs to support core types, which need to be a minimal and performant set to represent directly or through composition all concrete types.

Collaborator Author:

toLit/fromLit are not needed nor used by the VM itself; they are a marshalling/unmarshalling convenience utility used by VmCallbacks:

let atom = toLit(10)
# Instead of:
# let value = VmObjectAtom(kind: akInt, intVal: 10)

let i = fromLit(atom)
# Instead of:
# let i = atom.i

While both examples are more or less trivial for ints, you'd have to write more manual code for complex objects.

Before this PR they also only supported a few select cases with an "add as needed" comment

Review threads on compiler/vm/vmdef.nim, compiler/vm/vmhooks.nim and compiler/vm/vm.nim (outdated, resolved)
of akString:
str.add r.atomVal.strVal
of akInt:
# TODO: original code didn't check if int is in range 0..255. Should we do that now?
Collaborator:

I tend towards no.

There is a lot going on. First, the byte code gen might have made a mistake. Second, if this was just memory/CPU, we'd have loaded the correct size into the register; as in, we shouldn't have loaded 64 bits, but the 8 that are the char, then stored that. Locations are dynamic and all that.

regs[rc].intVal.int, regs[rd].intVal.int)
of opcParseFloat:
decodeBC(rkInt)
# TODO: this op has really unusual semantics. Turn it into a callback?
Collaborator:

Not sure yet

Review thread on compiler/vm/vm.nim (outdated, resolved)
let gp = sym.ast[genericParamsPos]
for i in 0..<gp.len:
let idx = sym.typ.len + i
if idx < n.len:
tos.slots[idx] = setupMacroParam(n[idx], gp[i].sym.typ)
else:
# TODO: are the surrounding check and this block meant to be in the for-loop?
Collaborator:

Pretty sure they are. If everything works we never go into the else; if we do, we clean up (eval failed, hence the dec) and then report an error

zerbina (Collaborator, Author) commented Mar 6, 2022:

But if gp.len > 1 and sym.typ.len == n.len, the else branch is taken multiple times, always reporting the same error and also decrementing evalMacroCounter each time.

From what I can tell, evalMacroCounter is incremented only once at the start of evaluating a macro and then decremented at the end. If the else branch is taken even once, evalMacroCall would return with an evalMacroCounter less than the one it was called with which doesn't make sense, so at least the dec(g.config.evalMacroCounter) is wrong


saem (Collaborator) commented Mar 5, 2022

@zerbina you're reading my mind on a lot of things. 😅

I'll see if I can get a moment to respond to your latest comment. There is a lot there and I want to reinforce a number of things you point out to make sure they don't get lost. I want to avoid only focusing on improvement areas and not celebrating success.

For now, especially your concluding note about being able to run the compiler in the VM, yes that's absolutely the long term direction. Besides the yo dawg and inception jokes, which don't get me wrong are a killer feature, it's a big step towards meta-circular interpretation/execution. This has big impact on language optimizations, language development, interactive environments, and more.


zerbina (Collaborator, Author) commented Mar 6, 2022

@saem Thank you for your reviews/comments/ideas. I hope my approach to developing this isn't making it too hard to review things in the current unfinished state.

When I started this PR, I did a cursory read of the code in vm.nim to get a very rough overview of the architecture, but then quickly determined that it wouldn't be possible for me to completely understand all possible interactions between vm.nim, vmgen.nim and sema in a reasonable amount of time. So I just made some assumptions and started hacking away at the first idea that came to my mind, with the plan to refine the new architecture as I progress.

Due to the somewhat inherent complexity of the task, and me quickly trying and dropping new ideas in lots of different places, I introduced lots of code that quickly became stale. I also write "TODO"s and "XXX"s everywhere as reminders, so that I don't need to divert my attention but also don't forget about ideas and ostensibly strange/buggy code.

My plan is to resolve all "XXX"/"TODO" comments and also remove my stale leftover code before I mark this PR as non-draft.

Also, I have two days' worth of changes (which implement the reference register architecture along with some other things) that I should have pushed prior to my large post; they probably render some of your review comments stale. Sorry for that :P

alaviss (Contributor) left a comment:

Here are some of my initial comments skimming through the structures. I will try to look at more in the coming days.

Comment on lines 149 to 150
# XXX: is it okay to use this for unsigned integer too?
intVal*: BiggestInt
Contributor:

It's probably better to just use unsigned here and do the int checks when the instruction calls for it.

Comment on lines 185 to 186
# TODO: what about type hooks (=copy, =sink, =destroy)? Original vm implementation seems to
# have completely ignored these
Contributor:

I think we should have vmgen emit the locations where the hooks would be run.

Collaborator Author:

What do you mean by location?

Contributor:

=copy and =sink can be automatically emitted by the compiler (it already does that for native code backends), so I don't think we have to worry too much.

What I'm not sure of is whether the phase that gets this stuff ticking (injectdestructors) is used to transform the code for vmgen.

Collaborator Author:

As far as I can tell, the code is simply transformed with transf.transformBody by vmgen prior to code gen right now. injectdestructors is not used.

Problems would arise with ref and seq types.

ref handling is fully done by the execution engine right now, so vmgen has no clue where and when a ref's lifetime ends. liftdestructors seems to be responsible for the injection of ARC/ORC reference handling, and maybe we can leverage it for vmgen too. There is a whole lot of codegen procs that would need to be handled then, though (nimIncRef, nimDecRefLast, nimIncRefCyclic, etc.). I'm not sure if this is the right approach.

All seq related functionality is handled via dedicated instructions, meaning that the workings of seq handling are completely opaque to vmgen. A few ideas:

  • Make seqs a non-magic type (very likely a complex endeavor and one I'd not attempt as part of this PR, but could be interesting in the future)
  • Leave the seq opcodes intact but remove dedicated atom support for them from the execution engine. Use seqs_v2 for both the storage implementation and logic. Opaquely transform seq instructions to a VM procedure call - that is, run vmgen over setLen, shrink, etc. and execute them directly as guest code
  • Same as the above, but remove seq opcodes and let vmgen emit calls to seqs_v2 procs directly.
  • Nim type information needs to be carried around anyway, so VM seqs know their elements' type. Whenever elements are to be copied/moved/destroyed, check for the existence of a corresponding lifetime hook and run it as guest code for the required elements. Something similar to this could also be employed for ref type destructors

Removing dedicated seq support from the VM would make it less complex, but usage of seqs would also be slower due to each operation running as guest code.

The =trace hook can probably only be realistically supported if we make use of injectdestructors/liftdestructors somehow.
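
The last idea above could look roughly like the following sketch. All names (`VmTypeInfo`, `copyHook`, `runGuestProc`) are invented for illustration and do not correspond to anything in the PR:

```nim
type
  VmTypeInfo = object
    sizeInBytes: int
    copyHook: int        # code position of a compiled `=copy` hook, or -1 if none

# stand-in for dispatching back into the interpreter loop
proc runGuestProc(codePos: int; dst, src: pointer) =
  discard

proc copyElem(dst, src: pointer; info: VmTypeInfo) =
  if info.copyHook >= 0:
    runGuestProc(info.copyHook, dst, src)  # run the user-defined `=copy` as guest code
  else:
    copyMem(dst, src, info.sizeInBytes)    # plain bitwise copy for trivial types
```

The point being that the dispatch decision is driven purely by the type information carried alongside the data, so no new opcodes are needed per hook.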

Comment on lines 155 to 161
of akOrdinalArray:
rawArrayVal*: seq[byte]
elemSize*: int8 # 1, 2, 4 or 8
oaKind*: OrdinalArrayKind # TODO: encode `oaKind` and `elemTyp` into a single int8
of akArray:
arrayElements*: seq[VmObjectAtom]
elemTyp*: PType
Contributor

I think it might be better to merge arrays, objects and strings into one unified representation via a seq[byte]. Nim objects and friends can't have their type mutated at runtime, so there shouldn't be a need to store them in the array itself.

An idea that I have is to have a constant array designed to store type info in TCtx, then change your kind: AtomKind into something like kind: TypeId, which will be the index to that array describing the type of the atom, with the positions corresponding to AtomKind reserved.
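
A minimal sketch of that suggestion, with invented names (`VmTypeDesc`, `TCtxTypes` are placeholders, not actual PR identifiers):

```nim
type
  TypeId = distinct uint32

  VmTypeDesc = object
    sizeInBytes: int
    # field offsets, element types, etc. would live here

  TCtxTypes = object
    # positions corresponding to AtomKind would be reserved at the front
    types: seq[VmTypeDesc]

proc lookup(ctx: TCtxTypes; id: TypeId): VmTypeDesc =
  ctx.types[int(uint32(id))]
```

With this scheme an atom's `kind` field shrinks to a small index, and two atoms have the same type exactly when their `TypeId`s are equal.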

Collaborator Author

That sounds really similar to what I'm attempting right now with the flat object idea I mentioned.

Using @saem 's proposed terminology (location, handle):

  • A seq[byte] stores a sequence of locations
  • A location is either an atom (int, float, seq, etc.) or a "container"
  • A "container" is either homogeneous (array) or heterogeneous (object) and contains either atoms or other "containers"
  • A handle is a reference to a location

To interpret the location referenced by a handle you then need the corresponding type info (TypeInfo) which is stored alongside the handle as a handle itself (typeHandle). The TypeInfo stores whether a location is an atom or a "container" and also the size in bytes of the location.
Additionally:

  • For homogeneous "containers", TypeInfo stores:
    • element count (array size is part of the type info)
    • element stride (number of bytes between two elements)
    • element type (handle to a TypeInfo)
  • For heterogeneous "containers", TypeInfo stores a sequence of field information where each stores:
    • field offset (number of bytes from the start of the "container")
    • field type (handle to a TypeInfo)
  • For seq and set atoms, TypeInfo stores the respective element types.
  • For ptr and ref atoms, TypeInfo stores the respective target types

I'm almost done with a first prototype in my local repository
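
Expressed as a Nim type, the described layout might look like this (purely illustrative; all names are invented and handles are simplified to plain indices):

```nim
type
  TypeInfoKind = enum
    tikAtom, tikHomogeneous, tikHeterogeneous

  FieldInfo = object
    offset: int        # bytes from the start of the "container"
    typ: int           # handle (index) of the field's TypeInfo

  TypeInfo = object
    sizeInBytes: int
    case kind: TypeInfoKind
    of tikAtom:
      discard          # seq/set/ptr/ref atoms would store element/target types here
    of tikHomogeneous:
      elemCount: int   # array length is part of the type info
      elemStride: int  # bytes between two consecutive elements
      elemTyp: int     # handle of the element's TypeInfo
    of tikHeterogeneous:
      fields: seq[FieldInfo]
```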

Collaborator

I rewrote this comment deleting tons of content over and over again, so I think I'm missing some things here. This is the basis for my thinking, a quote from the manual:

code specifies a computation that acts on a memory consisting of components called locations. A variable is basically a name for a location. Each variable and location is of a certain type. The variable's type is called static type, the location's type is called dynamic type. If the static type is not the same as the dynamic type, it is a super-type or subtype of the dynamic type.

So the above sounds pretty close. For the purposes of the VM, we need to decide the level of operation we support:

  • once we're in pointer land, then the memory model needs to behave in a flat fashion, a la C, this is very baked in; can't change it, yet
  • so if we're in pointer land then arrays, seq, and objects need to have their parts stored contiguously
    • so seq[byte] works at least for that
    • we lose "free" gc; but the complexity isn't worth it

From the quote you'll notice that there is the dynamic memory model (locations) and the static memory model (handles). For the purposes of the VM, if we decide that the static/handle memory model is the floor of abstraction, then we're saying that location is a higher-level concept in VM land. This is easier to implement, but I also think we won't be able to run a bunch of key code if we go this route.

I unfortunately don't have a particularly conclusive remark right now, so I'm going to go look at the VM again and remind myself of how it uses PNodes. I suspect that giving the VM its own explicit "heap" (a seq[PNode]) and having it work exclusively on that (outside of copying data in and copying results out for compile-time evaluation) is a good first step. The problem remains of AST manipulations for meta-programming, but that's a completely different use of PNode IMO.

compiler/vm/vmdef.nim — outdated review thread (resolved)
Comment on lines 174 to 177
of akPtr:
ptrTyp*: PType
# XXX should we keep a distinction between pointer to global and pointer to inside register?
ptrVal* : ptr VmObjectAtom
Contributor

Do you think it is possible to eliminate pointers in the VM altogether? Allowing user input to manipulate global memory that is shared between the VM and the compiler is not a good idea IMO, but not having these would also complicate certain parts of the VM.

Collaborator Author

As in inside the VM implementation or inside guest code? Writing the VM without pointers is possible, but - as you mentioned - would make some parts more complicated and maybe also a bit slower I'd say

@zerbina
Collaborator Author

zerbina commented Mar 9, 2022

Next design/architecture iteration

Note: The current commit only reflects small parts of this.

I adopted @saem 's proposed terminology (location, handle) for now. In general, names are not final and open for discussion.

A value is either a single Atom (the smallest building block) or a "container" (haven't found a better name for them yet). Containers are composed of Atoms and other containers. They are either heterogeneous (composed of elements of differing type/layout, i.e. objects) or homogeneous (composed of elements of the same type/layout, i.e. arrays).

Atoms and containers are stored in locations. Type and layout information of a location is stored separately. Locations are referenced/accessed via handles. To interpret the contents of a location, type/layout information is needed. A TypedHandle is a handle together with type/layout information.

Atom kinds are divided into 'managed' and 'unmanaged'. Atom kinds which need special handling regarding copying/moving/destruction (seq, string, ref, discriminators, etc.) are managed. The others are 'unmanaged'. Containers are 'unmanaged' if all of their elements are also 'unmanaged'. Otherwise, they're 'managed'.

Memory is divided into typed and untyped. Values whose type is known at their creation time (globals, constants, stack variables and objects created via new) are stored in typed locations. Memory created via alloc is untyped. Several restrictions apply to typed locations:

  • Modifying them is only allowed via references with a type matching the location's. An exception is made for locations part of a byte array (seq[byte] is also allowed). These can be written to via references of unmanaged type.
  • Reading from them is only allowed via references with a type matching the location's. Again, an exception is made for byte arrays

Untyped locations can be written to with any value, even managed ones. The latter is also valid in standard Nim code and thus should be supported. This can introduce memory leaks though

TypedHandles are required to be valid. That is, they must reference a valid location and reading/writing through them must be allowed according to the rules above.

An Address is similar to a TypedHandle except that it's allowed to be invalid. It can't be used directly to read a location, but has to be converted into a TypedHandle first. Trying to convert an invalid Address into a TypedHandle is an enforced run-time error. Writing through an invalid Address is also a run-time error.

The existence of handles is invisible to the guest.

Registers either contain:

  • nothing
  • an int value
  • a float value
  • an Address
  • a TypedHandle (owned or weak)

If a register owns a handle, it frees the memory the handle references once it transitions to a different register state.

For addresses and handles, registers also keep track of whether they reference typed or untyped memory (locations).
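
One possible shape for these register states, written as a Nim variant object (names are invented; the PR's actual register type may differ in detail):

```nim
type
  RegKind = enum
    rkNone, rkInt, rkFloat, rkAddress, rkHandle

  Reg = object
    case kind: RegKind
    of rkNone: discard
    of rkInt: intVal: BiggestInt
    of rkFloat: floatVal: BiggestFloat
    of rkAddress:
      address: pointer
      validated: bool   # whether the address was already checked for validity
      typedMem: bool    # whether it points into typed or untyped memory
    of rkHandle:
      handle: pointer
      owned: bool       # owned handles free their location on state transition
```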

Implementation details

Int and float registers are dynamically typed, meaning that they're interpreted based on the instruction that works with them (e.g. Addu treats its operands as unsigned while AddInt treats them as signed).

Accessing (both reading/writing) a managed atom in untyped memory via a TypedHandle requires validity checks on the accessed atoms first. This is necessary, as the guest is allowed to freely modify untyped memory.

Addresses

An Address is set to valid if it's

  • obtained by conversion from a TypedHandle
  • read from a ptr atom that is stored in a typed location
  • taken of a global
  • taken of a seq's element
  • taken of an array's element or object's field where the value in question is in a typed location

Otherwise it's marked as potentially invalid. The following Nim pseudo-code describes the handling of potentially invalid addresses:

# opcLdDeref
var a = ... # The register containing the address
var b = ... # The register to hold the handle

if not a.address.isValidated:
  if not isValid(a.address): error()
  a.isValidated = true

b = asHandle(a)


# opcWrDeref
var a = ... # The register containing the address
var b = ... # The register containing the value

if not a.address.isValidated:
  if not isValid(a.address): error()
  a.isValidated = true

copyTo(a.address, b)

When a typed memory region is freed, all registers containing weak TypedHandles that reference the freed region are implicitly cleared (set to contain nothing).

When an untyped memory region is freed, all addresses referencing the freed region are invalidated. Registers containing weak TypedHandles referencing the freed region are reset to nothing.

This implicit resetting should probably be made explicit by requiring vmgen to conservatively emit register reset instructions on registers that contain weak handles

Closing

This overall design/architecture tries to be as permissive as possible while still being efficient and safe. Some parts might be a bit too restrictive, however.

If I didn't forget/omit anything, this architecture should allow for pointer arithmetic, copyMem/zeroMem and other raw memory manipulation to be completely safe. Safe as in: not possible to corrupt memory outside the guest's environment, not that the guest can't write to its own memory in unintended places.

If we go with this design, I'd write tests to ensure that the safety measures all work.

@zerbina
Collaborator Author

zerbina commented Mar 10, 2022

Some further thoughts and some rationale

Typed/untyped memory

The idea behind the typed/untyped distinction is/was to allow reading/writing managed atoms to alloc'ed memory without having to do validation whenever accessing managed atoms in the VM. Since aliasing to or from managed atoms is prevented in typed memory, it's safe to treat them as always valid there. However, since array[..., byte] in typed memory is also treated as typed memory, you can't (as in, run-time error) do something like:

var a: array[sizeof(seq[int]), byte]
cast[ptr seq[int]](addr a)[].newSeq(10)

The question remains, of course, whether things like this should even be allowed in the first place. At least with ARC/ORC and the C/C++ backends, this would work.

In general, I'm trying to get the VM to - more or less - the same level as the C/C++ backends, feature wise.

If the byte array example above should be supported, we could - instead of treating alloced memory as untyped and everything else as typed - treat byte arrays as typed and everything else as untyped. To the VM, the region created by alloc(x) would be the same thing as an array[x, byte] then.

Supporting seq and string in untyped memory is a nuisance in general. Getting rid of it would make things less complicated and the VM a bit faster, since all seq and string atoms can be treated as valid then. refs in untyped memory should still be allowed.

On aliasing

Allowing the aliasing of byte regions with unmanaged atoms would make the VM backend inconsistent with at least the C++ backend. While allowed by Nim, var x: int32; var y = cast[ptr float32](addr x)[] would produce C++ code that, IIRC, invokes undefined behaviour due to C++'s type aliasing rules. As far as I know, you have to use memcpy for safe type punning (treating the bit-pattern of an object as an object of a different type) there. We could go down a similar route of disallowing (but also enforcing) aliasing of locations via pointer dereferencing. The code snippet above would produce a run-time error then. Reinterpreting the bit-representation of an object would only be allowed via copyMem.
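
For comparison, the copyMem-based punning route stays well-defined on every backend, unlike cast-and-dereference. A small self-contained example:

```nim
# reinterpret the bit pattern of an int32 as a float32 via copyMem,
# instead of `cast[ptr float32](addr x)[]`
var x: int32 = 0x3F800000'i32
var y: float32
copyMem(addr y, addr x, sizeof(float32))
assert y == 1.0'f32   # 0x3F800000 is the IEEE-754 encoding of 1.0
```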

If aliasing of byte regions is allowed in some capacity, there is one more pain point: A byte's meaning is conflated with that of untyped memory. I dislike this a lot. Ideally, Nim (Nimskull) would support something like array[..., untyped]. byte arrays can simply stay as a collection of bytes then.

@alaviss
Contributor

alaviss commented Mar 12, 2022

Sorry for the slow follow-up, I got some busy weeks recently.

The PR is getting rather big which makes it hard for us to follow through with all the changes that you are planning, not to mention the integration risks of such a large change. You are basically working too fast for us to keep up 😄.

From what I can tell reading this PR, these are what being done:

  • Refactorings:
    • Reduce the amount of refs used.
      • PCtx -> TCtx: This one seems simple-ish, as most of it seems to be a simple PCtx -> var TCtx, though it could just be that I haven't fully grasped the scope of the changes.
      • PStackFrame -> seq/stack pointers/indices: This one looks like it'd touch a lot of runtime code as it fundamentally changes the representation.
    • proc -> func where appropriate
    • PNode -> VMObject: This one is the most complicated, and is the main star of the show.
    • PType decoupling: Potentially complex as it requires translation from AST to VM-specific representation. It'd be a big win for performance, though.
    • Opcode documentation.
  • Planned features:
    • Feature parity with C/C++ backend via flat object repr.

I'm sure there are more that I haven't covered in this list.

My recommendation is to plan out changes in subsets that we can review and integrate now, so that we can start getting your work merged. An ordering that I think could work well would be:

  1. De-ref PCtx: It's little risk for much gain afaict.
  2. Rework stack frame implementation: I haven't checked how well this PR works, but changing the current linked-list into a seq is great for both simplicity and performance, and if it works well we can prioritize it first.
  3. Factor PNode out: This is big and I'm currently a bit lost on how you'd do it atm.

Other stuff can be weaved in these subsets as you see fit, but I'd recommend not adding any features or plan too much ahead until we are done with refactoring. The main goal right now is to get PNode away from the VM, we can build bigger changes on top once we got the foundation in.

This ordering is only a suggestion though, please do it however you think is better, as I only have a limited grasp on how you are tackling this at the moment.

Also, for discussions on the design I'd recommend we do it in the discussion forum as it's hard to follow up on discussions in PR due to Github hiding previous conversations and that discussing more than one topic become hard to track here.

Again, thank you for the contribution, the work you are doing here is amazing :)

@zerbina
Collaborator Author

zerbina commented Mar 26, 2022

Pushed my local changes. Rebase didn't happen yet, so the two VM related commits from devel are not part of this yet. I also updated the PR description to better reflect the current state of things and to include some pointers for reviewers.

The changes include the move to flat object representation of data so a lot of code changed. Bootstrapping works (with both refc and orc) and some tests also pass already.

To make future progress on this easier and to remove some hacks that make things more complicated/error-prone, I'm going to do a simple overhaul (no IR yet) of vmgen in a separate PR first.

Another thing that I'm going to split off and finish prior to this PR is improved error handling (this issue)


As an aside, the second and third iteration of bootstrapping take a few seconds longer now (from around 17 to around 25 seconds on my machine) and maximum memory consumption went from around 800 MB to around 1470 MB. I looked into it a bit, and it seems like injectdestructors for the rawExecute function takes a whopping 6.5 seconds(!). The old code didn't use any =destroy hooks so the pass didn't have to run.

Time and memory consumption with ORC stayed the same (injectdestructors already took place for rawExecute there)

Collaborator

@saem saem left a comment


Just added some minor comments, trading a much deeper look at vmtypes. I think I had a bit of an epiphany, just need to think through and write it down.

compiler/vm/vm.nim — outdated review thread (resolved)
of opcWrObj:
# TODO: we probably need something like opcWrObjMov for directly moving a value into an object.
# This wasn't necessary for the old vm due to its abuse of PNode, but values now have
# copy semantics by default
Collaborator

Yay!


if not (isLessEqual(l, a) and
isLessEqual(a, h)):
# TODO: are range errors `Exception`s or `Defect`s in normal Nim? If they
Collaborator

Defects are almost never the answer.

Collaborator Author

So instead of aborting execution here, a guest catchable exception should be raised? The current behaviour (aborting) is inherited from the old code and is also what happens for the other back-ends

Collaborator

For now we can preserve the behaviour, I think it depends on context.

It's so very rare that one has a true defect in that it's irrecoverable, so unless the intention is to never use this op code as a predicate then defect is fine for now.

if bNode.kind == nkAccQuoted:
bNode = bNode[0]

func asCString(a: TFullReg): cstring =
Collaborator

Doesn't need to be done now, but this feels like callback ("syscall") territory.

compiler/vm/vm_enums.nim — outdated review thread (resolved)
@@ -32,6 +41,8 @@ type
opcLdObj, # a = b.c
opcLdObjAddr, # a = addr(b.c)
opcWrObj, # a.b = c
# TODO: needs a better name
Collaborator

How about opcWrObjFld?

Collaborator Author

I should have put a newline there; the to-do is targeted at opcLdRef.

opcLdRef loads a handle to the location which is referenced/targeted by the ref value in register 'b' into register 'a'

Collaborator

So it's creating a new reference/handle to the same value/location.

I'm thinking out loud below:

From a very low level generic register machine perspective we're:

  • copying a value from B to A
  • updating/storing a value to track the new ref

From the computational model perspective we're creating a new handle to a location and tracking the new handle for various reasons, one of which is knowing if a location is orphaned or not.

At present many of the operations, at least in terms of vocabulary, use generic VM terms. That terminology hasn't been updated and it might not need to be. At least at this juncture I'm not sure which to use, because opcCloneHdl feels like the most natural thing but it's at odds with the other stuff.

Helpful line of inquiry?

Collaborator Author

Hmm, I'm not entirely sure if opcCloneHdl would be correct/fit.

What opcLdRef currently does (in Nim code) is:

# opcLdRef x, r
let r: ref int = ...
var x: lent int = r[]

So no copying/cloning takes place and no location is actually modified

Collaborator

@saem saem Apr 15, 2022

Thinking out loud:

A handle is synonymous with naming a chunk of memory (location) and assuming its schema (type). Making a handle for a location and shoving it into a register is equivalent to putting an address in a register.

I reread the operation and it seems like that's what's happening, but I'm on my phone and couldn't trace through all the relevant code.


I noticed a potential difference in our conceptual models: I consider a location a register... I'm less certain this is a good idea. Perhaps that's why I'm perceiving move/copy but you're not, as I don't think you consider the register a location. Moreover, I think the perspective I was taking is not entirely correct. Although, it does have partial utility in that we can describe the fact that pointer mutations don't have to be visible to registers, because in my model registers hold values.

I'm still not sure what to call this.

compiler/vm/vmmarshal.nim — outdated review thread (resolved)
compiler/vm/vmtypes.nim — outdated review thread (resolved)
vmobjects
]

# XXX: correctly and fully supporting every type with `fromLit`/`toLit` seems
Collaborator

I think this is what would potentially become easier with the field generalization we chatted about.

Collaborator Author

I don't see how? The complexity/work is due to having to write a whole lot of when T is xxx/elif blocks in order to support all types representable by the VM (e.g. array, seq, int, int32, int8, etc.).

Generalized fields wouldn't help there, no?

Collaborator

In that I imagine the number of types that needs to be supported by the VM would decrease. Potentially we could collapse array and seq. Set might disappear too.

Although I didn't get a chance to check the latest list of types and it might already be shorter. In which case you're absolutely correct.

Contributor

@alaviss alaviss left a comment

Sorry for taking so long to review, I was busy in the last few weeks.

This is the first pass of the review, where I focus on VM static memory.

Comment on lines 175 to 176
elementStride*: int ## The amount of bytes between the start of two consecutive
## elements. Can be larger than the size of an element alone
Contributor

With the total size stored in sizeInBytes and elementCount, you should be able to compute this value without actually storing it, right?

Collaborator Author

Yeah, it's just there to avoid having to compute the stride from the element's sizeInBytes and alignment whenever accessing an array

Contributor

I'd prefer computing over storage if it's trivial
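
Computing the stride on demand is indeed trivial: round the element size up to the element's alignment. A sketch, assuming the alignment is stored as a power-of-two exponent as in the quoted `alignment` field:

```nim
# stride = element size rounded up to the next multiple of the alignment
func stride(sizeInBytes: uint; alignExponent: uint8): uint =
  let a = 1'u shl int(alignExponent)   # real alignment value
  (sizeInBytes + a - 1) and not (a - 1)

assert stride(12, 3) == 16   # 12-byte element with 8-byte alignment
assert stride(8, 3) == 8     # already a multiple of the alignment
```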

compiler/vm/vmdef.nim — outdated review thread (resolved)
Comment on lines 134 to 149
BranchListEntryKind* = enum
blekStart
blekBranch
blekEnd

BranchListEntry* = object
case kind*: BranchListEntryKind
of blekStart:
field*: FieldPosition
numItems*: uint32
numBranches*: uint16
of blekBranch:
fieldRange*: Slice[FieldPosition]
of blekEnd:
parent*: uint32
nextField*: FieldPosition
Contributor

I believe this is your object variant implementation? Can you briefly explain how it works?

Collaborator Author

Yes. Admittedly, it's a bit complex.

Essentially, the branches field in VmType stores a linear list of BranchListEntrys. The entries describe the tree a variant object might form (via nested cases) in a way that allows for traversal in a mostly forward fashion, requiring no recursion or look-aside buffers (implemented in the variantFields inline iterator). Nodes (discriminators, denoted by blekStart entries) are stored in a depth-first manner.
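
As an illustrative example of the kind of input such a list describes (the object is made up; the exact encoding of the entries is as described above, with discriminators visited depth-first):

```nim
type
  SubKind = enum skA, skB

  Obj = object
    x: int
    case kind: bool       # first discriminator -> first blekStart entry
    of false:
      y: float            # covered by a blekBranch field range
    of true:
      case sub: SubKind   # nested discriminator -> nested blekStart entry
      of skA: a: int
      of skB: b: string
```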

elementType*: PVmType

sizeInBytes*: uint
alignment*: uint8 ## 1 shl `alignment` == real alignment value
Contributor

I'm not sure why you are storing the alignment. If all offsets and sizes are pre-computed, then this value would be rather useless during execution, no?

Collaborator Author

Removing the alignment field is possible, but you'd have to recompute the alignment for fields of object type every time (which would likely be acceptable) and elementStride/seqElemStride couldn't be removed then.

Of course, sizeInBytes could simply include the padding required to make the size a multiple of the alignment, but I'm a bit reluctant to do that and the allocator probably also needs to be aware of types' alignment

Contributor

I'd say to keep unneeded data out until you really need it. Right now the only time the size without padding is ever needed is when the user puts {.packed.} on the structure; thus, you should factor the alignment in.

Comment on lines 201 to 202
of ckTemplate:
templateSym*: PSym
Contributor

I don't think you have to handle templates, sem expands all of them at codegen and you can't store them in a variable either, but I might be wrong.

Collaborator Author

I'm also a bit unsure of why this needs to exist. I haven't looked into it further yet, but the bottom else case of the opcIndCall instruction uses evalTemplate (the branch is hit, so it's not dead-code) and the PSym of procs used there are of kind skTemplate (which I derived the naming from)

Comment on lines 343 to 345
lut*: Table[ItemId, PVmType]

types*: seq[PVmType]
Contributor

Please document these two fields

Collaborator Author

Will do. Documentation is for the most part still missing. I'm planning to do that once everything is in place, since a lot of smaller things might still change and I don't want to document things that are getting thrown away in the end (inside the bounds of this PR)

Comment on lines 369 to 370
# XXX: constants need to have fixed location in memory. `ref` is used to accomplish that until I
# decide on a more efficient memory management strategy
Contributor

How about holding them in TGlobals?

# decide on a more efficient memory management strategy
constants*: seq[ref TypedVmValue] # constant data
typeInfoCache*: TypeInfoCache ## Manages the types used by the VM
types*: seq[tuple[nType: PType, layoutDesc: PVmType]] # some instructions reference types (e.g. 'except')
Contributor

What does this field do, exactly? Isn't typeInfoCache supposed to be your one-stop-shop for all types?

Collaborator Author

@zerbina zerbina Apr 3, 2022

The typeInfoCache manages mappings between PType -> PVmType and also stores all created PVmTypes.

The types field is a lookup list for instructions that require type information (e.g. opcNew)

Contributor

Uh... how is it any different from typeInfoCache if it's also a lookup list?

Collaborator Author

types provides range[0..regBxMax] -> (PType, PVmType) mappings while typesInfoCache provides PType -> PVmType mappings.

Maybe it's easier if you describe how you think it should be done and I then tell you if it would work or, if not, why it wouldn't?
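The split being described could be sketched roughly like this (all type names are simplified stand-ins for illustration, not the compiler's actual definitions):

```nim
import std/tables

type
  PType = ref object
    ## stand-in for the compiler's `PType`
    id: int
  PVmType = ref object
    ## stand-in for the VM-side type description
    sizeInBytes: int

  TypeInfoCache = object
    ## caches the compiler-type -> VM-type mapping
    lookup: Table[int, PVmType]

  TCtx = object
    typeInfoCache: TypeInfoCache
    # instructions can only encode a small integer operand
    # (range[0..regBxMax]), so types referenced by instructions are
    # stored in a flat list and addressed by index
    types: seq[tuple[nType: PType, layoutDesc: PVmType]]

proc vmTypeOf(c: TCtx, t: PType): PVmType =
  ## the `typeInfoCache` path: compiler type -> VM type
  c.typeInfoCache.lookup[t.id]

proc typeOperand(c: TCtx, idx: int): PVmType =
  ## the `types` path: instruction operand (an index) -> VM type
  c.types[idx].layoutDesc
```

An instruction like opcNew carries only the index into `types`, while code-gen uses the cache to avoid translating the same PType twice.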

Contributor:

Oh, so that's why. I was wondering why you have two storages for basically the same stuff.

I guess we can use only one storage once you figure out a way to get rid of PVmType in favor of a central index-based storage.

Collaborator Author @zerbina:

Should I do the ref-based to index-based type management change as part of this PR? It's not too hard, but would require another larger scale refactoring.

Comment on lines 415 to 416
static:
doAssert sizeof(VmObjectAtom.atom) == sizeof(Atom)
Contributor:

Put atom: Atom in there if you really need it to be like this.

Collaborator Author @zerbina:

That's what I did in the beginning, but it caused issues due to Atom being a {.union.} that contains GC'ed types.

VmObjectAtom is queued for removal anyway and Atom will likely follow.
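For context, the restriction mentioned here is that Nim rejects {.union.} objects whose fields contain garbage-collected memory. A union of plain "blittable" fields compiles fine; adding a `string`, `seq`, or `ref` field to the sketch below would be a compile-time error:

```nim
type
  PlainAtom {.union.} = object
    # union fields must not contain GC'ed memory; a `string` or `seq`
    # field here would make this declaration fail to compile
    intVal: int64
    floatVal: float64

# both fields share the same storage, so the object is only 8 bytes wide
doAssert sizeof(PlainAtom) == 8
```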

compiler/vm/vm.nim: outdated review thread (resolved)
Contributor @alaviss left a comment:

Here's the second pass.

More reviews coming soon™

compiler/vm/vm.nim: outdated review threads (resolved)
func ensureAtomKind(n: var TFullReg, typ: PVmType, mm: var VmMemoryManager) {.inline.} =
## If necessary, transition the register `n` to a value register of kind `typ.kind`

assert typ.kind notin {akInt, akFloat}
Contributor:

Preconditions should be enforced as an internal VM error. Assertions are turned off by default in release compiler builds, so they might not be protecting your code.

Comment on lines 104 to 106
# XXX: I don't think exception unmarshaling/marhshaling is really needed. And even if it is,
# use `unmarshalGeneric` instead
func unmarshalException(e: TypedHandle, nimType: PType): PNode =
Contributor:

A small tip, if you are planning to replace something, attach {.deprecated.} to it so you get all the call sites reported, which should simplify your work.
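A minimal illustration of the tip, reusing the names from the snippet under review with toy signatures (the real procs take handles and PTypes):

```nim
proc unmarshalGeneric(x: int): int =
  ## the intended replacement
  x

proc unmarshalException(x: int): int {.deprecated: "use unmarshalGeneric".} =
  ## the proc slated for removal; every remaining call site now produces
  ## a compile-time deprecation warning pointing at it
  unmarshalGeneric(x)

# emits: Warning: use unmarshalGeneric; unmarshalException is deprecated
doAssert unmarshalException(1) == 1
```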

Comment on lines 440 to 442
of rkInt: x = TFullReg(kind: rkInt, intVal: y.intVal)
of rkFloat: x = TFullReg(kind: rkFloat, floatVal: y.floatVal)
of rkAddress: x = TFullReg(kind: rkAddress, addrVal: y.addrVal, addrTyp: y.addrTyp, isValid: y.isValid)
Contributor:

These can be condensed down to:

x = y

As objects are copied by default.

Specialize only when you need to adjust the resulting value, like what you do with rkHandle/rkValue for example.

Collaborator Author @zerbina:

In the code you're looking at, no. Copying is unavailable for TFullReg due to the copy hook being disabled with {.error.}. This is mostly meant as a way to force code authors to perform the copy manually, ensuring that a copy is really what is wanted and preventing accidental copies that could result in bugs.

Unfortunately, this triggered a code-gen issue in the csources compiler which resulted in a very-hard-to-track-down memory corruption issue (x is not reset prior to moving into it), so I'm going to remove the restriction (otherwise, I'd have to disable the =sink hook, which would then require manual field assignment), which will make x = y possible.
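The mechanism being discussed looks like this in miniature (TFullReg reduced to a single field for the sketch):

```nim
type
  TFullReg = object
    # reduced to one field; the real TFullReg is an object variant
    intVal: int64

# overriding `=copy` with {.error.} turns every implicit copy of the
# type into a compile-time error
proc `=copy`(x: var TFullReg, y: TFullReg) {.error.}

var a = TFullReg(intVal: 1)   # construction is a move, so this is fine
# var b = a                   # rejected: '=copy' is not available
var b = TFullReg(intVal: a.intVal)  # the copy has to be spelled out
doAssert b.intVal == 1
```

This is the standard Nim idiom for making a type move-only while still allowing deliberate, field-by-field duplication.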

compiler/vm/vm.nim: outdated review thread (resolved)
assert h.typ.kind == akInt
deref(h).intVal = x.intVal
of rkFloat:
assert h.typ.kind == akFloat
Contributor:

I'd recommend avoiding assertions where possible.

Instead, put absolute trust in the codegen, and debug troublesome places where things act weird.

Assertions in VM are unfortunately useless, as you have no trace of the instruction stream that leads to the erroneous state.

Also, accessing the wrong discriminator field is checked (in debug builds), so there is much less value in checking manually too.

Collaborator Author @zerbina:

I rapidly changed large amounts of code multiple times, and assertions helped me make sure that preconditions were not violated after changes (which could also hint at obsolete code) and catch bugs early. I'm also using them as a tool for documentation.

In the error handling overhaul that I'm planning to do in a separate PR prior to merging this one, I'd replace assertions in the VM with something like vmAssert, which, in debug builds, would raise an error caught by a try/except around rawExecute, which would then print a guest-code stack-trace and either do a globalReport or turn the error into an nkError.

In the code you commented on, the assertion is meant for the expression on the left, where no variant field access happens.

If okay, I'd keep the assertions until I finish this PR, but would remove/replace them prior to the merge, if wanted.
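The vmAssert idea sketched here might look something like the following; the names (vmAssert, VmAssertionError, rawExecute) follow the comment above, but the signatures are hypothetical since the design was deferred to a separate PR:

```nim
type
  VmAssertionError = object of CatchableError
    ## hypothetical error type for internal VM invariant violations

template vmAssert(cond: bool, msg: string = "") =
  # in the real design this check would only be active in debug builds
  if not cond:
    raise newException(VmAssertionError, msg)

proc rawExecute(x: int): int =
  ## stand-in for the interpreter loop
  vmAssert(x >= 0, "negative operand")
  x * 2

proc execute(x: int): int =
  ## wrapper that turns an internal assertion failure into a report
  ## instead of crashing the compiler
  try:
    result = rawExecute(x)
  except VmAssertionError:
    # the real VM would print a guest-code stack trace here and either
    # do a `globalReport` or produce an `nkError`
    result = -1

doAssert execute(2) == 4
doAssert execute(-1) == -1
```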

Contributor:

If you find them useful for debugging, please keep them. I'd prefer a design where we have as few assertions as humanly possible, but that can wait until things have cleared up.

compiler/vm/vm.nim: outdated review thread (resolved)
compiler/vm/vmdef.nim: outdated review thread (resolved)
@zerbina zerbina changed the title Refactor VM execution engine code vm: use flat-memory representation for all data May 11, 2022
@zerbina zerbina marked this pull request as ready for review May 11, 2022 20:45
Collaborator @saem left a comment:

Partial read, will look again in the weekend.

If the tests in the todo are added and all tests pass sooner than that, I'm good to merge. Can fix in post.

compiler/vm/vmchecks.nim: outdated review thread (resolved)
@zerbina zerbina changed the title vm: use flat-memory representation for all data vm: store data as raw contiguous bytes May 12, 2022
Collaborator Author zerbina commented May 13, 2022:

Okay, ORC successfully bootstraps again. There were two compiler bugs that I had to work around: one code-gen issue (already working on fixing it) and one that's seemingly related to injectdestructors.

@zerbina zerbina force-pushed the vm-data-repr branch 2 times, most recently from 48b3c83 to 259fa34 Compare May 16, 2022 20:42
compiler/vm/vmtypegen.nim: outdated review thread (resolved)
Collaborator @saem left a comment:

Bors d+

bors bot commented May 16, 2022:

✌️ zerbina can now approve this pull request. To approve and merge a pull request, simply reply with bors r+. More detailed instructions are available here.

Collaborator Author zerbina commented May 16, 2022:

@alaviss @haxscramper Do you still want to review this PR?

Collaborator Author zerbina commented May 17, 2022:

The tests for the access validation checks are added. They're manually written and currently don't cover all possible interactions, but this can always be changed later on.

Collaborator saem commented May 17, 2022:

I say it's merge time 🥳

Collaborator saem commented May 17, 2022:

I know not everyone might have had a chance to do a "final" review. The fact of the matter is lots of improvements came out of this and were already merged. 🎉♥️

After the fact we'll run into bugs and that's ok, but there were bugs before this change, and any change, even after more review, will have bugs.

Thankfully it's software, we can change it and from my reading it seems easier to maintain now than before.

Collaborator saem commented May 17, 2022:

Bors r+

bors bot added a commit that referenced this pull request May 17, 2022
242: vm: store data as raw contiguous bytes r=saem a=zerbina

## Summary

This replaces the `PNode`-based data representation in the VM with
storing the data as raw contiguous bytes, similar to how it works on
the c/cpp targets. `VmType` encodes the information necessary to
interpret the raw bytes.

## Details

- since `PType`s store a lot of information irrelevant for the VM, they
 are first translated into a format (`VmType`) that's faster and easier to
 traverse/query. The mapped-to-types are de-duplicated, meaning that
 if two VM types have the same ID (currently the same instance), they're
 also the same type, making type equality checks very fast. This
 respects structural-vs-nominal types, but currently not `distinct` and
 named-vs-unnamed tuples.
- location type information is stored separately from the actual data
- `const` values are still stored using `PNode` representation, which
 requires a translation step whenever data crosses the compiler/VM
 border
- it is enforced that locations are only accessed via handles of
 compatible type, reporting an error on violation
- all memory access is now also ensured to not go outside of
 VM-owned memory, with an error being reported on violation

### Known changes in observable behaviour

- `repr` output in compile-time/NimScript contexts changed a bit
 when compiling with `--mm:refc`. The output is now closer to
 that with `--mm:arc`.
- when passing non-`NimNode` values to non-`static` parameters with
 `genAst`/`getAst`, the invoked macro always receives it as typed AST
 now (even if the parameter is of `untyped` type)



Co-authored-by: zerbina <[email protected]>
bors bot commented May 17, 2022:

Build failed:

Collaborator Author zerbina commented May 17, 2022:

Okay, the failing test should work now. The issue was with assigning a temporary to a `let` variable, which sometimes didn't surface due to how the allocator currently works. As a temporary solution (until vmgen gets an overhaul), assignments to a `let` now always perform a copy.

bors r+

bors bot added a commit that referenced this pull request May 17, 2022
242: vm: store data as raw contiguous bytes r=zerbina a=zerbina

bors bot commented May 17, 2022:

Canceled.

Collaborator Author zerbina commented May 17, 2022:

bors r+

bors bot commented May 17, 2022:

Build succeeded:

@bors bors bot merged commit 3282714 into nim-works:devel May 17, 2022
@zerbina zerbina deleted the vm-data-repr branch May 20, 2022 21:17
@haxscramper haxscramper added this to the VM backend refactoring milestone Nov 21, 2022
zerbina added a commit to zerbina/nimskull that referenced this pull request Oct 16, 2023
Summary
=======

Remove the obsolete `fixupTypeAfterEval` procedure and all its usages.

Details
=======

`fixupTypeAfterEval` was originally required for ensuring the value AST
returned by the VM is properly typed, but with the introduction of
`vmcompilerserdes` (nim-works#242),
this became obsolete.

Remaining usages
----------------

Normal procedures cannot return `untyped`, `typed`, or
`typeDesc` values, meaning that the `fixupTypeAfterEval` in
`evalAtCompileTime` was always a no-op.

For `static` expressions, `typeDesc` values are possible, but since the
evaluated result coming out of `evalStaticExpr` is always a `nkType`
AST, for which `semExpr` is a no-op, `fixupTypeAfterEval` was also a
no-op there.
github-merge-queue bot pushed a commit that referenced this pull request Oct 17, 2023