Consider offering an API similar to copyless #10

TethysSvensson · 2019-03-20T22:07:47Z

The copyless crate allows one to allocate a large struct on the heap without first creating it on the stack and then copying it. One should be able to do this with bumpalo too.

If you try to run this code, it will cause a stack overflow exception (even though we want the data on the heap):

fn main() {
    let bump = bumpalo::Bump::new();
    bump.alloc([0; 1_000_000_000]);
}

The copyless crate solves this by using on the fact that llvm seems to be able to optimize x = FOO; memcpy(ptr, &x, sizeof(x)); into *ptr = FOO.

On the other hand, llvm does not seem to be able to optimize x = FOO; ptr = alloc(sizeof(x)); memcpy(ptr, &x, sizeof(x)); in the same way, and instead relies on the stack to store the value of FOO.

The text was updated successfully, but these errors were encountered:

fitzgen · 2019-03-21T22:31:04Z

Don't have the time at this very moment, but it certainly seems worth considering!

TethysSvensson · 2019-03-22T07:13:07Z

If I were to write a PR with such an API, would you be interested in reviewing and maintaining it as part of the bumpalo crate?

fitzgen · 2019-03-22T17:04:18Z

Probably?

My concerns are basically:

Can we verify that the API is avoiding the memcpys, not just now, but continues to do so in the future as we merge PRs and bug fixes, etc.
How large will this API be? Both in terms of stuff that crate users will have to learn/consider and in terms of maintenance. If it is essentially one new type and method that is a wrapper over our existing stuff, with a little unsafe and uninitialized memory, then that seems reasonable. If it is pretty invasive, maybe not...

Those are the things that I would look for when investigating copyless and its API. Maybe you can help do some of that investigation before jumping straight to a PR?

TethysSvensson · 2019-03-22T22:56:04Z

The API offered by copyless is not guaranteed to always avoid memcpys. It is a best-effort solution that seems to work well on release mode, but does not work on debug mode. I do not think it is possible to offer such a guarantee in current rust (at least until we get something similar to placement new).
It is indeed not very invasive.

I've played around with it for a bit using three tests on release: allocating a large array, a large struct and a large enum. All of these have been defined to be too large to keep on the stack without causing a panic because of stack overflow.

I have found many good solutions that keep most of these scenarios off the stack, thus avoiding the panic. However I have not yet been able to find a solution that passes all three tests.

Would you still be interested in an API that is not guaranteed -- and indeed have known failure modes?

TethysSvensson · 2019-03-23T09:05:56Z

By tweaking the code a bit, I managed to find a version that passes all of my tests. I would consider it good enough for my own uses.

I am still of the opinion that it is impossible to guarantee the optimization without language support such as something similar to placement new . If we want an API that is guaranteed to work, we should return a *mut T and make the user write to it manually -- and even this is not without pitfalls.

My implementation adds a single new method alloc_with. This method takes closure instead of a value, but otherwise it is mostly identical to the old alloc. The only real difference is that the ptr::write and closure invocation has been move into a separate function to make the optimization as consistent as possible. I've then re-written alloc to simple delegate to alloc_with.

If you are still interested let me know, and I will prepare it for a pull request.

fitzgen · 2019-03-25T16:57:18Z

I am still of the opinion that it is impossible to guarantee the optimization without language support such as something similar to placement new . If we want an API that is guaranteed to work, we should return a *mut T and make the user write to it manually -- and even this is not without pitfalls.

Agreed, that it can't be guaranteed without changes to the language. I was hoping that we could do a best-effort solution where we run memcpy-find or grep through llvm-ir or something. This way we could at least verify that we are on the happy path now, and check whether future PRs take us off the happy path. Perhaps the copyless crate has some infra we can reuse for this?

My implementation adds a single new method alloc_with. This method takes closure instead of a value, but otherwise it is mostly identical to the old alloc. The only real difference is that the ptr::write and closure invocation has been move into a separate function to make the optimization as consistent as possible. I've then re-written alloc to simple delegate to alloc_with.

This sounds great, exactly the kind of light maintenance burden I was hoping for.

If you are still interested let me know, and I will prepare it for a pull request.

With some way to (at least manually) verify whether a change takes us off LLVM's happy path or not (either a test or a script I could run manually to check) I would love to merge this functionality.

TethysSvensson · 2019-03-25T21:30:39Z

Agreed, that it can't be guaranteed without changes to the language. I was hoping that we could do a best-effort solution where we run memcpy-find or grep through llvm-ir or something. This way we could at least verify that we are on the happy path now, and check whether future PRs take us off the happy path. Perhaps the copyless crate has some infra we can reuse for this?

My current implementation has a few tests that causes fatal runtime error: stack overflow unless LLVM manages to optimize it correctly. They have been disabled when debug_assertions are set (as I don't know of another way of checking if it is a release build).

With some way to (at least manually) verify whether a change takes us off LLVM's happy path or not (either a test or a script I could run manually to check) I would love to merge this functionality.

I'll prepare the PR then.

TethysSvensson mentioned this issue Mar 25, 2019

Add an alloc_with function #11

Merged

fitzgen closed this as completed in #11 Mar 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider offering an API similar to copyless #10

Consider offering an API similar to copyless #10

TethysSvensson commented Mar 20, 2019

fitzgen commented Mar 21, 2019

TethysSvensson commented Mar 22, 2019

fitzgen commented Mar 22, 2019

TethysSvensson commented Mar 22, 2019

TethysSvensson commented Mar 23, 2019

fitzgen commented Mar 25, 2019

TethysSvensson commented Mar 25, 2019

Consider offering an API similar to copyless #10

Consider offering an API similar to copyless #10

Comments

TethysSvensson commented Mar 20, 2019

fitzgen commented Mar 21, 2019

TethysSvensson commented Mar 22, 2019

fitzgen commented Mar 22, 2019

TethysSvensson commented Mar 22, 2019

TethysSvensson commented Mar 23, 2019

fitzgen commented Mar 25, 2019

TethysSvensson commented Mar 25, 2019