Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provides implementation of Vec::extend_from_slice optimized for T: Copy #236

Merged
merged 8 commits into from
Feb 21, 2024

Conversation

zslayton
Copy link
Contributor

@zslayton zslayton commented Feb 18, 2024

Addresses #235.

This PR is meant to be a jumping-off point for discussion. It adds a new inherent impl method to Vec:

impl<'bump, T: 'bump + Copy> Vec<'bump, T> {
    /// ...
    pub fn extend_from_slice_copy(&mut self, other: &[T]) {
    // ...
    }
}

It moves the logic added in #229 from String to Vec and then re-implements push_str by calling the new extend_from_slice_copy.

I've duplicated the push_str benchmarks but for Vec<'_, u8> and the performance improves as expected:
image

@zslayton
Copy link
Contributor Author

@overlookmotel expressed a concern that unconditionally using copy_nonoverlapping might result in small performance regressions in the case of a small slices/strings.

I extended the benchmarks in this PR to test a variety of sizes between 4 bytes and 16KB. The results--in this microbenchmark, running on my machine--indicated that copy_nonoverlapping is a win for slices of any size.

image
image

@overlookmotel
Copy link
Contributor

That's interesting! I wonder why we saw a slow-down in OXC (though an extremely small one) with the other change.

The other explanations I can think of are:

  1. criterion::black_box is not doing its job properly in your benchmarks, which would allow the compiler to convert the copy_nonoverlapping calls to inlined copies.
  2. It performs better with slice lengths which are multiples of 8 / 16 etc (seems plausible). Maybe try a "weird" length like 5 or 7 and see how it does with that?

Sorry, not trying to pick holes in your work, just trying to understand.

Copy link
Owner

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good to me modulo nitpick below.

Regarding short strings and possible perf regressions: nothing I've read seems serious enough to warrant reverting that optimization. I'm happy to receive follow up PRs that further improve things here, if there is anything further to improve.

.gitignore Outdated
Comment on lines 3 to 5

# JetBrains IDE files (e.g. RustRover)
.idea
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like something that is more appropriate for your home directory's .gitignore than every project you contribute to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 TIL that you can put a .gitignore in your home directory. I'll remove this, thanks.

I also noticed that Vec's impl of std::io::Write can be updated to use this method; I'll push another commit with that change tomorrow.

@overlookmotel
Copy link
Contributor

overlookmotel commented Feb 20, 2024

Regarding short strings and possible perf regressions: nothing I've read seems serious enough to warrant reverting that optimization. I'm happy to receive follow up PRs that further improve things here, if there is anything further to improve.

Yes absolutely, the performance "regression" we saw was absolutely tiny, and for general purpose use 80x faster for long strings outweighs 0.1% slower for very short strings. I just wanted to put it on the record that there was some downside to the PR I made, rather than leave it unsaid, in case it affects anyone else.

@zslayton
Copy link
Contributor Author

It performs better with slice lengths which are multiples of 8 / 16 etc (seems plausible). Maybe try a "weird" length like 5 or 7 and see how it does with that?

This seemed plausible to me too! I added a few non-power-of-2 cases to the benchmark to check it out, but it looks like the improvement holds for those as well.

image
image

Copy link
Owner

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@fitzgen fitzgen merged commit 54c88f0 into fitzgen:main Feb 21, 2024
8 checks passed
@overlookmotel
Copy link
Contributor

@zslayton Thanks for tolerating my nitty comments, and for benching shorter byte ranges. I remain curious about why we saw the slight slow-down we did on similar PR for strings, but probably I should stop worrying!

@zslayton
Copy link
Contributor Author

@zslayton Thanks for tolerating my nitty comments, and for benching shorter byte ranges.

Sure thing! I appreciate having another set of eyes on this, I'd rather spend some time double-checking things than accidentally cause a regression for all of these folks 🙃😬.

I remain curious about why we saw the slight slow-down we did on similar PR for strings, but probably I should stop worrying!

I'm curious too!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants