Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backtrace support, part 1: Use StackChain to represent chains of stacks #98

Merged
merged 7 commits into from
Feb 14, 2024

Conversation

frank-emrich
Copy link

This is the first PR in a series to add support for generating backtraces in the presence of stack switching.

Currently, the VMContext contains a (possibly null) pointer to the currently running continuation. Every ContinuationObject in turn contains a (possibly null) pointer to its parent. This effectively yields a linked list of continuations.

This PR changes this situation in two ways. See the comment on wasmtime_continuations::StackChain for more details.

  1. Instead of a (possibly null) pointer to a ContinuationObject), both VMContext and ContinuationObject now contain an object of type StackChain to represent their "parent". This is still effectively a linked list of stacks.
  2. However, by using the new StackChain type, it is now possible to have an explicit representation of the main stack at the end of the list of active continuations. In other words, a StackChain linked list now ends with a MainStack variant.

In addition, we now associate a StackLimits object with each element in the StackChain (including the main stack). This will be used in subsequent PRs to store a subset of the values in VMRuntimeLimits for each stack.

@frank-emrich frank-emrich changed the title Backtrace support, part 1: Use StackChain to represent chains of stacks. Backtrace support, part 1: Use StackChain to represent chains of stacks Feb 12, 2024
Copy link
Member

@dhil dhil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, mostly. See my comments.

crates/continuations/src/lib.rs Outdated Show resolved Hide resolved
crates/continuations/src/lib.rs Outdated Show resolved Hide resolved
crates/continuations/src/lib.rs Outdated Show resolved Hide resolved
crates/environ/src/vmoffsets.rs Show resolved Hide resolved
@@ -92,7 +184,9 @@ impl From<State> for i32 {
/// TODO
#[repr(C)]
pub struct ContinuationObject {
pub parent: *mut ContinuationObject,
pub limits: StackLimits,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should optimise this such that it is only present in debug builds.

Suggested change
pub limits: StackLimits,
pub limits: StackLimits, // TODO(dhil): Omit from release builds.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's a good idea. It would mean that in release mode, you can never have any backtraces at all anymore. That's not in line with what wasmtime currently does. Instead, we could try to respect wasmtime's configuration options regarding back traces, but that happens at runtime.
We may however have a special compile time flag to disable the backtrace mechanism completely, since the current design will have some overhead at every stack switch, which will improve in the future once we move away from the libcall-centric apprpoach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with a compile-time flag initially for experiments.

Copy link
Author

@frank-emrich frank-emrich Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that we will be using the same StackLimits machinery for preserving and restoring stack limits to enable stack overflow checks. Thefroe, this new flag may or may not disable stack limit checks as well. I suggest deferring the addition of this new flag until we have restored some degree of stack limit checks, some of which subsequently happens in this series of PRs.

(sorry, I should have mentioned this wrinkle in my previous comment about adding a compile time flag)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can have a "disable_backtraces" compile time flag which omits the reads/writes, and print no backtrace, for benchmarking.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do that as a follow-up PR

crates/runtime/src/continuation.rs Show resolved Hide resolved
crates/runtime/src/continuation.rs Show resolved Hide resolved
crates/runtime/src/continuation.rs Outdated Show resolved Hide resolved
crates/runtime/src/continuation.rs Show resolved Hide resolved
crates/runtime/src/continuation.rs Show resolved Hide resolved
@frank-emrich frank-emrich merged commit ceb9967 into main Feb 14, 2024
21 checks passed
frank-emrich added a commit that referenced this pull request Feb 15, 2024
As of PR #98, each `ContinuationObject` contains an object of type
`StackLimits`. In addition, there exists such an object for the main
stack, it is stored in the `VMContext` and the `StackChain::MainStack`
variant at the list of currently active continuations points to it.
These `StackLimits` objects contain counterparts of the stack-related
fields in `VMRuntimeLimits`, namely `stack_limit` and the various
`last_wasm_*` fields.

As of PR #98, the actual contents of these `StackLimits` are unused (and
not updated). This changes in this PR:

While the `VMRuntimeLimits` continue to contain the stack-related
information for the currently executing stack (either the main stack or
a continuation), we ensure that for stacks that are *not* currently
running, their corresponding `StackLimits` object now contains accurate
values about their stack limits. The doc comment on
`wasmtime_continuations::StackChain` describes the exact invariants that
we maintain.

To ensure that these invariants hold, we need to copy some fields
between the `VMRuntimeLimits` and `StackLimits` objects when stack
switching occurs. In particular, the `tc_resume` libcall now takes an
additional argument: A pointer to the `StackLimits` object of the
*parent* of the continuation that is about to be resume. Note that this
needs to happen in the libcall itself, in order to obtain accurate
values for the `last_wasm_exit_*` values in the `VMRuntimeLimits`.
@frank-emrich frank-emrich deleted the backtrace-infra1 branch March 12, 2024 18:21
frank-emrich added a commit that referenced this pull request Mar 21, 2024
…ns (#136)

Currently, we can overflow the stack while running inside a
continuation, without the runtime having any way of detecting this.
This PR partially rectifies this, by making the existing stack limit
checks that get emitted by cranelift in every wasm function prelude work
correctly while running inside a continuation.

All that was required to enable the stack limit checks was the
following:
1. Stop zero-ing out the `stack_limit` value in `VMRuntimeLimits`
whenever we `resume` a continuation.
2. When creating a continuation, set a reasonable value for the
`stack_limits` value in its `StackLimits` object.

Note that all the required infrastructure to make sure that whenever we
switch stacks, we save and restore the `stack_limits` value inside
`VMRuntimeLimits` and the `StackLimits` object of the involved stacks
was already implemented in #98 and #99. In this sense, enabling these
checks is "free": The limits were already checked, but previously using
a limit of 0.

The only remaining question is what the "reasonable value" for the stack
limits value mentioned above is. As discussed in #122, the stack limit
checks that cranelift emits in function preludes are rather limited, and
these limitations are reflected in the checks that this PR provides:
When entering a wasm function, they check that the current stack pointer
is larger than the `stack_limit` value in `VMRuntimeLimits`. They do not
take into account how much stack space the function itself will occupy.
No stack limit checks are performed when calling a host function.

Thus, this PR defines a config option `wasmfx_red_zone_size`. The idea
is that we define the stack limit as `bottom_of_fiber_stack` +
`wasmfx_red_zone_size`. Thus, the stack checks boil down to the
following:
Whenever we enter a wasm function while inside a continuation, we ensure
that there are at least `wasmfx_red_zone_size` bytes of stack space
left.

I've set the default value for `wasmfx_red_zone_size` to 32k. To get a
rough idea for a sensible value, I determined that a call to the
`fd_write` WASI function occupies ~21k of stack space, and generously
rounded this up to 32k.

**Important**: This means that these stack limit checks are incomplete:
Calling a wasm or host function that occupies more than
`wasmfx_red_zone_size` of stack space may still result in an undetected
stack overflow!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants