vector_algorithms.cpp, *minmax*: invert the condition to improve *_element cases a bit more #4401

Merged on Feb 27, 2024 (2 commits)

@AlexGuteniev (Contributor) commented Feb 17, 2024

I noticed this opportunity while working on #4384.

I observed that the compiler swaps the branches to make the _Minmax loop more local.
Applying the same change by hand to the _Minmax_element loop gives a significant improvement: about 1.6 to 1.8 times faster in some cases.
I also made the same change to the _Minmax loop for consistency, both with its counterpart and with how it is already being compiled.
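
For readers who have not seen the diff, here is a minimal scalar sketch of what "inverting the condition" can look like. It is an illustration only, not the code from vector_algorithms.cpp: the function names, `portion_size`, and the portion-merging logic are all hypothetical, and a compiler may well emit identical code for both shapes in a scalar loop like this one.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration; NOT the actual _Minmax_element implementation.
// Both functions return the index of the first minimum (0 for an empty input)
// and process the data in portions, merging a per-portion result into the
// overall result at the end of each portion.

constexpr std::size_t portion_size = 256; // assumed value for this sketch

// Shape 1: the common case sits inside the `if`, the rare case in the `else`.
std::size_t min_index_original(const std::vector<int>& v) {
    if (v.empty()) return 0;
    std::size_t global_best = 0, local_best = 0, i = 0;
    std::size_t stop_at = std::min(portion_size, v.size());
    int cur = v[0];
    for (;;) {
        if (cur < v[local_best]) local_best = i;
        ++i;
        if (i != stop_at) {
            cur = v[i]; // common case: load the next element
        } else {
            // rare case: end of portion, merge and possibly finish
            if (v[local_best] < v[global_best]) global_best = local_best;
            if (i == v.size()) return global_best;
            local_best = i;
            stop_at = std::min(i + portion_size, v.size());
            cur = v[i];
        }
    }
}

// Shape 2: condition inverted. The rare case is the `if` body and the
// common case falls straight through to the next load.
std::size_t min_index_inverted(const std::vector<int>& v) {
    if (v.empty()) return 0;
    std::size_t global_best = 0, local_best = 0, i = 0;
    std::size_t stop_at = std::min(portion_size, v.size());
    int cur = v[0];
    for (;;) {
        if (cur < v[local_best]) local_best = i;
        ++i;
        if (i == stop_at) {
            // rare case: end of portion, merge and possibly finish
            if (v[local_best] < v[global_best]) global_best = local_best;
            if (i == v.size()) return global_best;
            local_best = i;
            stop_at = std::min(i + portion_size, v.size());
        }
        cur = v[i]; // common case: load the next element
    }
}
```

Both functions give the same result for any input; only the layout of the branch differs. The improvement reported here comes from the real vectorized loop, so this sketch should not be expected to reproduce the benchmark numbers.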

Benchmark results
--------------------------------------------------------------------------------------------
Benchmark                              Old Time        Old CPU    New Time         New CPU  
--------------------------------------------------------------------------------------------
bm<uint8_t, 8021, Op::Min>              647 ns          642 ns      361 ns          361 ns  
bm<uint8_t, 8021, Op::Max>              598 ns          586 ns      368 ns          369 ns  
bm<uint8_t, 8021, Op::Both>             679 ns          684 ns      556 ns          562 ns  
bm<uint8_t, 8021, Op::Min_val>          278 ns          279 ns      272 ns          273 ns  
bm<uint8_t, 8021, Op::Max_val>          285 ns          285 ns      279 ns          276 ns  
bm<uint8_t, 8021, Op::Both_val>       20891 ns        20996 ns    20642 ns        20403 ns  
bm<uint16_t, 8021, Op::Min>            1093 ns         1099 ns      791 ns          785 ns  
bm<uint16_t, 8021, Op::Max>            1104 ns         1099 ns      659 ns          670 ns  
bm<uint16_t, 8021, Op::Both>           1595 ns         1604 ns     1050 ns         1050 ns  
bm<uint16_t, 8021, Op::Min_val>         332 ns          330 ns      326 ns          322 ns  
bm<uint16_t, 8021, Op::Max_val>         331 ns          337 ns      324 ns          322 ns  
bm<uint16_t, 8021, Op::Both_val>      12045 ns        12207 ns    11841 ns        11719 ns  
bm<uint32_t, 8021, Op::Min>            2135 ns         2093 ns     1294 ns         1287 ns  
bm<uint32_t, 8021, Op::Max>            1356 ns         1350 ns     1304 ns         1311 ns  
bm<uint32_t, 8021, Op::Both>           2117 ns         2131 ns     2092 ns         2086 ns  
bm<uint32_t, 8021, Op::Min_val>         627 ns          641 ns      618 ns          625 ns  
bm<uint32_t, 8021, Op::Max_val>        1041 ns         1050 ns     1049 ns         1050 ns  
bm<uint32_t, 8021, Op::Both_val>        666 ns          670 ns      693 ns          698 ns  
bm<uint64_t, 8021, Op::Min>            4629 ns         4604 ns     4615 ns         4604 ns  
bm<uint64_t, 8021, Op::Max>            4745 ns         4757 ns     4735 ns         4708 ns  
bm<uint64_t, 8021, Op::Both>           5065 ns         5156 ns     4925 ns         5000 ns  
bm<uint64_t, 8021, Op::Min_val>        4068 ns         4081 ns     4068 ns         4081 ns  
bm<uint64_t, 8021, Op::Max_val>        4068 ns         4081 ns     4058 ns         3990 ns  
bm<uint64_t, 8021, Op::Both_val>       4103 ns         4081 ns     4082 ns         4081 ns  
bm<int8_t, 8021, Op::Min>               551 ns          547 ns      358 ns          357 ns  
bm<int8_t, 8021, Op::Max>               578 ns          578 ns      364 ns          368 ns  
bm<int8_t, 8021, Op::Both>              708 ns          711 ns      553 ns          562 ns  
bm<int8_t, 8021, Op::Min_val>           275 ns          279 ns      268 ns          267 ns  
bm<int8_t, 8021, Op::Max_val>           274 ns          273 ns      269 ns          264 ns  
bm<int8_t, 8021, Op::Both_val>        20376 ns        20403 ns    20147 ns        19950 ns  
bm<int16_t, 8021, Op::Min>             1046 ns         1050 ns      783 ns          785 ns  
bm<int16_t, 8021, Op::Max>             1041 ns         1050 ns      675 ns          684 ns  
bm<int16_t, 8021, Op::Both>            1609 ns         1604 ns     1084 ns         1088 ns  
bm<int16_t, 8021, Op::Min_val>          535 ns          544 ns      530 ns          531 ns  
bm<int16_t, 8021, Op::Max_val>          571 ns          572 ns      545 ns          544 ns  
bm<int16_t, 8021, Op::Both_val>       13254 ns        13393 ns    13141 ns        13114 ns  
bm<int32_t, 8021, Op::Min>             2071 ns         2086 ns     1292 ns         1287 ns  
bm<int32_t, 8021, Op::Max>             1356 ns         1350 ns     1312 ns         1311 ns  
bm<int32_t, 8021, Op::Both>            2153 ns         2148 ns     2066 ns         2086 ns  
bm<int32_t, 8021, Op::Min_val>         1052 ns         1067 ns     1033 ns         1025 ns  
bm<int32_t, 8021, Op::Max_val>          622 ns          628 ns      620 ns          614 ns  
bm<int32_t, 8021, Op::Both_val>        1030 ns         1046 ns     1045 ns         1050 ns  
bm<int64_t, 8021, Op::Min>             4628 ns         4604 ns     4672 ns         4604 ns  
bm<int64_t, 8021, Op::Max>             4746 ns         4708 ns     4763 ns         4743 ns  
bm<int64_t, 8021, Op::Both>            5013 ns         5000 ns     5013 ns         5000 ns  
bm<int64_t, 8021, Op::Min_val>         4065 ns         4081 ns     4157 ns         4171 ns  
bm<int64_t, 8021, Op::Max_val>         4071 ns         4098 ns     4136 ns         4143 ns  
bm<int64_t, 8021, Op::Both_val>        4090 ns         4081 ns     4159 ns         4143 ns  
bm<float, 8021, Op::Min>               2219 ns         2197 ns     2423 ns         2459 ns  
bm<float, 8021, Op::Max>               2469 ns         2455 ns     2409 ns         2407 ns  
bm<float, 8021, Op::Both>              2558 ns         2567 ns     3202 ns         3209 ns  
bm<float, 8021, Op::Min_val>           2001 ns         1995 ns     2031 ns         1995 ns  
bm<float, 8021, Op::Max_val>           1998 ns         1995 ns     2038 ns         2040 ns  
bm<float, 8021, Op::Both_val>          2034 ns         2040 ns     2039 ns         2040 ns  
bm<double, 8021, Op::Min>              4206 ns         4262 ns     4084 ns         4081 ns  
bm<double, 8021, Op::Max>              4378 ns         4395 ns     4347 ns         4395 ns  
bm<double, 8021, Op::Both>             5433 ns         5301 ns     5139 ns         5162 ns  
bm<double, 8021, Op::Min_val>          4027 ns         3990 ns     4028 ns         3990 ns  
bm<double, 8021, Op::Max_val>          4032 ns         4081 ns     4037 ns         4081 ns  
bm<double, 8021, Op::Both_val>         4113 ns         4081 ns     4065 ns         4081 ns  
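
As a rough check of the "1.6 to 1.8 times" figure, the ratio Old Time / New Time can be computed for a few rows copied from the table above. The `Row` struct and the `speedup` and `print_speedups` helpers are hypothetical names introduced just for this sketch; the numbers are the ones in the table.

```cpp
#include <cstdio>

// Hypothetical helper type for this sketch; values are copied from the
// "Old Time" and "New Time" columns of the benchmark table.
struct Row {
    const char* name;
    double old_ns;
    double new_ns;
};

constexpr Row rows[] = {
    {"bm<uint8_t, 8021, Op::Min>", 647.0, 361.0},
    {"bm<uint16_t, 8021, Op::Max>", 1104.0, 659.0},
    {"bm<uint32_t, 8021, Op::Min>", 2135.0, 1294.0},
    {"bm<float, 8021, Op::Both>", 2558.0, 3202.0},
};

// A ratio above 1 means the new code is faster; below 1 means slower.
constexpr double speedup(const Row& r) {
    return r.old_ns / r.new_ns;
}

void print_speedups() {
    for (const Row& r : rows) {
        std::printf("%-32s %.2fx\n", r.name, speedup(r));
    }
}
```

The first three rows come out at roughly 1.79, 1.68, and 1.65. The `float` `Op::Both` row comes out below 1 in this run, and the 64-bit and `*_val` rows are close to 1, consistent with the change mainly targeting the `*_element` cases.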

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner February 17, 2024 08:17
@StephanTLavavej StephanTLavavej added the performance Must go faster label Feb 17, 2024
@StephanTLavavej StephanTLavavej self-assigned this Feb 17, 2024
@StephanTLavavej StephanTLavavej changed the title vector_algorithm.cpp, *minmax*: inverse the condition to improve *_element cases a bit more vector_algorithms.cpp, *minmax*: invert the condition to improve *_element cases a bit more Feb 17, 2024
@StephanTLavavej StephanTLavavej removed their assignment Feb 22, 2024
@StephanTLavavej StephanTLavavej self-assigned this Feb 23, 2024
@StephanTLavavej (Member) commented:

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 5897d98 into microsoft:main Feb 27, 2024
35 checks passed
@StephanTLavavej (Member) commented:

Maximum warp! 🛸 🚀 ⚡

@AlexGuteniev AlexGuteniev deleted the flow branch February 27, 2024 06:07