Fix vectorized ranges::find with unreachable_sentinel to properly mask the beginning and handle unaligned pointers #4450

Merged: 4 commits, Mar 8, 2024


@StephanTLavavej (Member) commented Mar 5, 2024

Fixes #4449.

For unsized ranges::find, vector_algorithms.cpp reads elements "before the beginning", although (hopefully) not in a way that annoys memory page protection or ASAN. Then we "mask out matches that don't belong to the range", pretending that they weren't there. This allows us to use vectorized loads for the entire unbounded loop, instead of starting with classic scalar code until we reach a nicely aligned boundary.
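To make the over-read concrete, here is a minimal scalar sketch of the first "load; mask" step, assuming 16-byte chunks. It emulates the SSE2 compare-and-movemask with a plain loop so it runs on any target; kChunk, match_mask, and first_chunk_bits are hypothetical names, not the STL's internals. Rounding a pointer down to a 16-byte boundary never leaves the 4 KiB page that contains it (page sizes are multiples of 16), which is why the real over-read is not expected to fault. In portable C++, reading outside an array is undefined, so the sketch should only be called on a 16-aligned buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Width of the emulated vector register in bytes (16 for SSE, 32 for AVX2).
constexpr std::size_t kChunk = 16;

// Scalar stand-in for _mm_movemask_epi8(_mm_cmpeq_epi8(load(chunk), splat(value))):
// bit i of the result is set iff chunk[i] == value.
unsigned int match_mask(const unsigned char* chunk, unsigned char value) {
    unsigned int mask = 0;
    for (std::size_t i = 0; i < kChunk; ++i) {
        if (chunk[i] == value) {
            mask |= 1u << i;
        }
    }
    return mask;
}

// "load; mask" for the first chunk: round `first` down to a chunk boundary,
// compare the whole chunk (including bytes before `first`), then clear the
// bits for bytes that precede `first`. The aligned chunk start goes to *chunk_out.
unsigned int first_chunk_bits(const unsigned char* first, unsigned char value,
                              const unsigned char** chunk_out) {
    assert(chunk_out != nullptr);
    const std::size_t skip = reinterpret_cast<std::uintptr_t>(first) % kChunk;
    const unsigned char* chunk = first - skip; // may start before the range
    *chunk_out = chunk;
    return match_mask(chunk, value) & (~0u << skip); // keep only bits >= skip
}
```

On a 16-aligned buffer, a call starting at offset 5 reports only matches at offsets 5 through 15 of that chunk; a match at offset 2 is masked away.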

However, we had a control flow bug. We started with "load" (a vector chunk), "mask" (away matches before the beginning), and "check" (whether we found anything, returning if so). Then we started our infinite loop: for (;;) { load; check; advance; }. But this repeated the initial load and check without the mask! It was also unnecessary extra work.
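The bug can be reproduced with the same kind of scalar emulation (hypothetical helper names, 16-byte chunks assumed, and a 16-aligned buffer so that no read leaves the array). Because the loop reloads the first chunk without masking it, a match just before first leaks through:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kChunk = 16; // emulated vector width in bytes

// Scalar stand-in for compare + movemask: bit i is set iff chunk[i] == value.
unsigned int match_mask(const unsigned char* chunk, unsigned char value) {
    unsigned int mask = 0;
    for (std::size_t i = 0; i < kChunk; ++i) {
        if (chunk[i] == value) {
            mask |= 1u << i;
        }
    }
    return mask;
}

// Index of the lowest set bit (what _tzcnt_u32 computes); requires bits != 0.
unsigned int lowest_bit(unsigned int bits) {
    assert(bits != 0);
    unsigned int index = 0;
    while ((bits & 1u) == 0) {
        bits >>= 1;
        ++index;
    }
    return index;
}

// The buggy shape: load; mask; check; for (;;) { load; check; advance; }
const unsigned char* buggy_find(const unsigned char* first, unsigned char value) {
    const std::size_t skip = reinterpret_cast<std::uintptr_t>(first) % kChunk;
    const unsigned char* chunk = first - skip;
    unsigned int bingo = match_mask(chunk, value) & (~0u << skip); // load; mask
    if (bingo != 0) {                                              // check
        return chunk + lowest_bit(bingo);
    }
    for (;;) {
        bingo = match_mask(chunk, value); // load: reloads the first chunk, unmasked!
        if (bingo != 0) {                 // check
            return chunk + lowest_bit(bingo);
        }
        chunk += kChunk; // advance
    }
}
```

With a match at offset 2 and the range starting at offset 5, the masked first check correctly finds nothing, but the first loop iteration reloads the same chunk and returns the pointer at offset 2, which lies before first.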

A minimal correctness fix would be to cycle around the "advance" step: load; mask; check; for (;;) { advance; load; check; }. But we can do even better: the "check" steps are exactly identical between the first part and the infinite loop. So we can cycle the loop a bit more: load; mask; for (;;) { check; advance; load; } avoids having to repeat the "check". (The "load" steps are also exactly identical, but there's no easy way to fuse them, given that we want to mask only the first one.)
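Here is a sketch of the rotated loop, load; mask; for (;;) { check; advance; load; }, under the same assumptions as before: scalar emulation, hypothetical names, a 16-aligned buffer, and the unreachable-sentinel precondition that value occurs at or after first. The single "check" now serves both the masked first chunk and every later chunk:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kChunk = 16; // emulated vector width in bytes

// Scalar stand-in for compare + movemask: bit i is set iff chunk[i] == value.
unsigned int match_mask(const unsigned char* chunk, unsigned char value) {
    unsigned int mask = 0;
    for (std::size_t i = 0; i < kChunk; ++i) {
        if (chunk[i] == value) {
            mask |= 1u << i;
        }
    }
    return mask;
}

// Index of the lowest set bit (what _tzcnt_u32 computes); requires bits != 0.
unsigned int lowest_bit(unsigned int bits) {
    assert(bits != 0);
    unsigned int index = 0;
    while ((bits & 1u) == 0) {
        bits >>= 1;
        ++index;
    }
    return index;
}

// Fixed shape: load; mask; for (;;) { check; advance; load; }
// Precondition (unreachable sentinel): `value` occurs at or after `first`.
const unsigned char* find_unsized(const unsigned char* first, unsigned char value) {
    const std::size_t skip = reinterpret_cast<std::uintptr_t>(first) % kChunk;
    const unsigned char* chunk = first - skip;
    unsigned int bingo = match_mask(chunk, value) & (~0u << skip); // load; mask
    for (;;) {
        if (bingo != 0) { // check: shared by the first chunk and all later ones
            return chunk + lowest_bit(bingo);
        }
        chunk += kChunk;                  // advance
        bingo = match_mask(chunk, value); // load: only the first chunk needs masking
    }
}
```

The first chunk is never reloaded, so the masked-away match at offset 2 cannot reappear.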

Also fixes #4454.

Unlike the other vectorized algorithms, find-unsized uses aligned loads, so it requires that its N-byte elements are N-aligned. This is notoriously untrue on x86, where 8-byte elements can appear on a 4-aligned stack. Packed structs can also subvert this assumption.
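Why N-alignment matters: with aligned loads, the vector's N-byte lanes start at N-aligned addresses. An 8-byte element at an address that is only 4-aligned lands in the high half of one lane and the low half of the next, so a lane-wise compare never sees it as a whole. A hypothetical alignment test, plus a helper that counts how many lanes an element touches, might look like this (lanes_line_up and lanes_touched are illustrative names, not the STL's):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical alignment test: the lanes match the elements only if `first`
// is a multiple of the element size.
bool lanes_line_up(const void* first, std::size_t elem_size) {
    assert(elem_size != 0);
    return reinterpret_cast<std::uintptr_t>(first) % elem_size == 0;
}

// Number of N-byte lanes touched by an N-byte element starting at byte
// `offset` of an aligned chunk: 1 when aligned, 2 when it straddles lanes.
std::size_t lanes_touched(std::size_t offset, std::size_t elem_size) {
    assert(elem_size != 0);
    const std::size_t first_lane = offset / elem_size;
    const std::size_t last_lane = (offset + elem_size - 1) / elem_size;
    return last_lane - first_lane + 1;
}
```

For example, an 8-byte element at chunk offset 4 (as on a 4-aligned x86 stack) touches two lanes, while one at offset 8 fills exactly one.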

We can simply test whether the pointer is properly aligned. I was able to fix this with the following control flow:

#ifndef _M_ARM64EC
if (unaligned) {
    // use the scalar fallback below
} else if (_Use_avx2()) {
    always return AVX2 result;
} else if (_Traits::_Sse_available()) {
    always return SSE-n result;
}
#endif // !_M_ARM64EC
return scalar fallback;

Note that (unlike some other vectorized algorithms) our AVX2 and SSE-n codepaths here are always-return, so chaining them with else if is fine. I thought this was less disruptive than increasing the level of control flow nesting.
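The dispatch above can be made runnable if the CPU feature queries are replaced by plain bool parameters (an assumption for illustration; the real code calls _Use_avx2() and _Traits::_Sse_available()). The empty first branch deliberately falls through to the scalar return, and because the AVX2 and SSE branches always return, the else if chain never runs two vector paths:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Path { scalar, avx2, sse };

// Sketch of the control flow only: reports which path would run.
Path choose_path(const void* first, std::size_t elem_size, bool has_avx2, bool has_sse) {
    assert(elem_size != 0);
    const bool unaligned = reinterpret_cast<std::uintptr_t>(first) % elem_size != 0;
    if (unaligned) {
        // use the scalar fallback below
    } else if (has_avx2) {
        return Path::avx2; // always returns, so control never reaches the SSE branch
    } else if (has_sse) {
        return Path::sse; // always returns as well
    }
    return Path::scalar;
}
```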

@StephanTLavavej added the labels bug (Something isn't working) and ranges (C++20/23 ranges) on Mar 5, 2024.
@StephanTLavavej requested a review from a team as a code owner on March 5, 2024 at 05:06.
@StephanTLavavej self-assigned this on Mar 6, 2024.
@StephanTLavavej (Member, Author) commented:

I'm speculatively mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej changed the title from "Fix vectorized ranges::find with unreachable_sentinel to properly mask the beginning" to "Fix vectorized ranges::find with unreachable_sentinel to properly mask the beginning and handle unaligned pointers" on Mar 6, 2024.

for (;;) {
    _Data = _mm256_load_si256(static_cast<const __m256i*>(_First));
    _Bingo = static_cast<unsigned int>(_mm256_movemask_epi8(_Traits::_Cmp_avx(_Data, _Comparand)));

    if (_Bingo != 0) {
        unsigned long _Offset = _tzcnt_u32(_Bingo);
        _Advance_bytes(_First, _Offset);
        return _First;
Review comment from a Contributor:

I observe that since we have only one return from the AVX2 code path, we can remove _Zeroupper_on_exit _Guard; and put an explicit _mm256_zeroupper(); here. Even if it is worth doing, it can certainly be done as a follow-up.
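For comparison, the two shapes the review contrasts might look like the following. fake_zeroupper and ZeroupperOnExit are hypothetical stand-ins (a call counter instead of the real intrinsic) so the sketch runs without AVX; the point is only that an RAII guard fires on every exit, while an explicit call is equivalent only when there is exactly one return.

```cpp
#include <cassert>

// Hypothetical stand-in for _mm256_zeroupper(): counts calls instead.
int zeroupper_calls = 0;
void fake_zeroupper() { ++zeroupper_calls; }

// RAII shape: the destructor runs on every exit path, including every return.
struct ZeroupperOnExit {
    ~ZeroupperOnExit() { fake_zeroupper(); }
};

int with_guard(int x) {
    ZeroupperOnExit guard;
    return x * 2; // guard's destructor runs after the return value is computed
}

// Explicit shape: correct here only because this is the single return path.
int with_explicit_call(int x) {
    const int result = x * 2;
    fake_zeroupper();
    return result;
}
```

If a second return were later added to the AVX2 path, the explicit version would need another call before it, whereas the guard would cover it automatically.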

@StephanTLavavej merged commit 9c40b48 into microsoft:main on Mar 8, 2024 (35 checks passed).
@StephanTLavavej deleted the it-unreaches-out branch on March 8, 2024 at 05:29.
Labels: bug (Something isn't working), ranges (C++20/23 ranges)
Projects: Archived in project
Participants: 3