Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider offering an API similar to copyless #10

Closed
TethysSvensson opened this issue Mar 20, 2019 · 7 comments
Closed

Consider offering an API similar to copyless #10

TethysSvensson opened this issue Mar 20, 2019 · 7 comments

Comments

@TethysSvensson
Copy link
Contributor

The copyless crate allows one to allocate a large struct on the heap without first creating it on the stack and then copying it. One should be able to do this with bumpalo too.

If you try to run this code, it will cause a stack overflow exception (even though we want the data on the heap):

fn main() {
    let bump = bumpalo::Bump::new();
    bump.alloc([0; 1_000_000_000]);
}

The copyless crate solves this by using on the fact that llvm seems to be able to optimize x = FOO; memcpy(ptr, &x, sizeof(x)); into *ptr = FOO.

On the other hand, llvm does not seem to be able to optimize x = FOO; ptr = alloc(sizeof(x)); memcpy(ptr, &x, sizeof(x)); in the same way, and instead relies on the stack to store the value of FOO.

@fitzgen
Copy link
Owner

fitzgen commented Mar 21, 2019

Don't have the time at this very moment, but it certainly seems worth considering!

@TethysSvensson
Copy link
Contributor Author

If I were to write a PR with such an API, would you be interested in reviewing and maintaining it as part of the bumpalo crate?

@fitzgen
Copy link
Owner

fitzgen commented Mar 22, 2019

Probably?

My concerns are basically:

  • Can we verify that the API is avoiding the memcpys, not just now, but continues to do so in the future as we merge PRs and bug fixes, etc.

  • How large will this API be? Both in terms of stuff that crate users will have to learn/consider and in terms of maintenance. If it is essentially one new type and method that is a wrapper over our existing stuff, with a little unsafe and uninitialized memory, then that seems reasonable. If it is pretty invasive, maybe not...

Those are the things that I would look for when investigating copyless and its API. Maybe you can help do some of that investigation before jumping straight to a PR?

@TethysSvensson
Copy link
Contributor Author

  • The API offered by copyless is not guaranteed to always avoid memcpys. It is a best-effort solution that seems to work well on release mode, but does not work on debug mode. I do not think it is possible to offer such a guarantee in current rust (at least until we get something similar to placement new).

  • It is indeed not very invasive.

I've played around with it for a bit using three tests on release: allocating a large array, a large struct and a large enum. All of these have been defined to be too large to keep on the stack without causing a panic because of stack overflow.

I have found many good solutions that keep most of these scenarios off the stack, thus avoiding the panic. However I have not yet been able to find a solution that passes all three tests.

Would you still be interested in an API that is not guaranteed -- and indeed have known failure modes?

@TethysSvensson
Copy link
Contributor Author

By tweaking the code a bit, I managed to find a version that passes all of my tests. I would consider it good enough for my own uses.

I am still of the opinion that it is impossible to guarantee the optimization without language support such as something similar to placement new . If we want an API that is guaranteed to work, we should return a *mut T and make the user write to it manually -- and even this is not without pitfalls.

My implementation adds a single new method alloc_with. This method takes closure instead of a value, but otherwise it is mostly identical to the old alloc. The only real difference is that the ptr::write and closure invocation has been move into a separate function to make the optimization as consistent as possible. I've then re-written alloc to simple delegate to alloc_with.

If you are still interested let me know, and I will prepare it for a pull request.

@fitzgen
Copy link
Owner

fitzgen commented Mar 25, 2019

I am still of the opinion that it is impossible to guarantee the optimization without language support such as something similar to placement new . If we want an API that is guaranteed to work, we should return a *mut T and make the user write to it manually -- and even this is not without pitfalls.

Agreed, that it can't be guaranteed without changes to the language. I was hoping that we could do a best-effort solution where we run memcpy-find or grep through llvm-ir or something. This way we could at least verify that we are on the happy path now, and check whether future PRs take us off the happy path. Perhaps the copyless crate has some infra we can reuse for this?

My implementation adds a single new method alloc_with. This method takes closure instead of a value, but otherwise it is mostly identical to the old alloc. The only real difference is that the ptr::write and closure invocation has been move into a separate function to make the optimization as consistent as possible. I've then re-written alloc to simple delegate to alloc_with.

This sounds great, exactly the kind of light maintenance burden I was hoping for.

If you are still interested let me know, and I will prepare it for a pull request.

With some way to (at least manually) verify whether a change takes us off LLVM's happy path or not (either a test or a script I could run manually to check) I would love to merge this functionality.

@TethysSvensson
Copy link
Contributor Author

Agreed, that it can't be guaranteed without changes to the language. I was hoping that we could do a best-effort solution where we run memcpy-find or grep through llvm-ir or something. This way we could at least verify that we are on the happy path now, and check whether future PRs take us off the happy path. Perhaps the copyless crate has some infra we can reuse for this?

My current implementation has a few tests that causes fatal runtime error: stack overflow unless LLVM manages to optimize it correctly. They have been disabled when debug_assertions are set (as I don't know of another way of checking if it is a release build).

With some way to (at least manually) verify whether a change takes us off LLVM's happy path or not (either a test or a script I could run manually to check) I would love to merge this functionality.

I'll prepare the PR then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants