Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancements for <bitset> #3838

Merged
merged 9 commits into from
Jul 14, 2023
Merged

Conversation

achabense
Copy link
Contributor

@achabense achabense commented Jun 29, 2023

  • simplifies bool operator[](size_t) const.
  • makes some public _Members private.
  • optimizes to_string.
Benchmark
#include <bitset>

#include "benchmark/benchmark.h"

template <class _Char = char, size_t _N>
auto _to_string_bef(const std::bitset<_N>& _Bset, const _Char _Elem0 = '0',
                       const _Char _Elem1 = '1') {
  return _Bset.to_string(_Elem0, _Elem1);
}

template <class _Char = char, size_t _N>
auto _to_string_now(const std::bitset<_N>& _Bset, const _Char _Elem0 = '0',
                    const _Char _Elem1 = '1') {
  std::basic_string<_Char> _Str(_N, _Elem0);
  for (size_t _Idx = 0; _Idx < _N; ++_Idx) {
    if (_Bset[_N - 1 - _Idx]) {
      _Str[_Idx] = _Elem1;
    }
  }
  return _Str;
}

template <class _Char, size_t N>
void BM_bef(benchmark::State& state) {
  std::bitset<N> bs{static_cast<uint64_t>(state.range(0))};
  for (auto _ : state) {
    benchmark::DoNotOptimize(_to_string_bef(bs));
  }
}

template <class _Char, size_t N>
void BM_now(benchmark::State& state) {
  std::bitset<N> bs{static_cast<uint64_t>(state.range(0))};
  for (auto _ : state) {
    benchmark::DoNotOptimize(_to_string_now(bs));
  }
}

BENCHMARK(BM_bef<char, 15>)->Arg(0x3333333333333333);
BENCHMARK(BM_now<char, 15>)->Arg(0x3333333333333333);
BENCHMARK(BM_bef<char, 64>)->Arg(0x3333333333333333);
BENCHMARK(BM_now<char, 64>)->Arg(0x3333333333333333);

BENCHMARK(BM_bef<char, 15>)->Arg(0xffffffffffffffff);
BENCHMARK(BM_now<char, 15>)->Arg(0xffffffffffffffff);
BENCHMARK(BM_bef<char, 64>)->Arg(0xffffffffffffffff);
BENCHMARK(BM_now<char, 64>)->Arg(0xffffffffffffffff);

BENCHMARK(BM_bef<wchar_t, 7>)->Arg(0x3333333333333333);
BENCHMARK(BM_now<wchar_t, 7>)->Arg(0x3333333333333333);
BENCHMARK(BM_bef<wchar_t, 64>)->Arg(0x3333333333333333);
BENCHMARK(BM_now<wchar_t, 64>)->Arg(0x3333333333333333);

BENCHMARK(BM_bef<wchar_t, 7>)->Arg(0xffffffffffffffff);
BENCHMARK(BM_now<wchar_t, 7>)->Arg(0xffffffffffffffff);
BENCHMARK(BM_bef<wchar_t, 64>)->Arg(0xffffffffffffffff);
BENCHMARK(BM_now<wchar_t, 64>)->Arg(0xffffffffffffffff);

BENCHMARK_MAIN();
Result(release mode)

image

↓(Updated benchmark)

Benchmark
#include <bitset>
#include <random>
#include <array>
#include "benchmark/benchmark.h"

auto random_bits = [] {
  std::mt19937_64 rnd(12345);
  std::array<uint64_t, 100> data;
  for (auto& d : data) {
  d = rnd();
  }
  return data;
}();

template <class _Char = char, size_t _N>
auto _to_string_bef(const std::bitset<_N>& _Bset, const _Char _Elem0 = '0',
                       const _Char _Elem1 = '1') {
  return _Bset.to_string(_Elem0, _Elem1);
}

template <class _Char = char, size_t _N>
auto _to_string_now(const std::bitset<_N>& _Bset, const _Char _Elem0 = '0',
                    const _Char _Elem1 = '1') {
  std::basic_string<_Char> _Str(_N, _Elem0);
  for (size_t _Idx = 0; _Idx < _N; ++_Idx) {
    if (_Bset[_N - 1 - _Idx]) {
      _Str[_Idx] = _Elem1;
    }
  }
  return _Str;
}

template <class _Char, size_t N>
void BM_bef(benchmark::State& state) {
  for (auto _ : state) {
    for (auto bits:random_bits) {
      std::bitset<N> bs{bits};
      benchmark::DoNotOptimize(_to_string_bef(bs));
    }
  }
}

template <class _Char, size_t N>
void BM_now(benchmark::State& state) {
  for (auto _ : state) {
    for (auto bits : random_bits) {
      std::bitset<N> bs{bits};
      benchmark::DoNotOptimize(_to_string_now(bs));
    }
  }
}

BENCHMARK(BM_bef<char, 15>);
BENCHMARK(BM_now<char, 15>);
BENCHMARK(BM_bef<char, 64>);
BENCHMARK(BM_now<char, 64>);

BENCHMARK(BM_bef<wchar_t, 7>);
BENCHMARK(BM_now<wchar_t, 7>);
BENCHMARK(BM_bef<wchar_t, 64>);
BENCHMARK(BM_now<wchar_t, 64>);

BENCHMARK_MAIN();
Result(release mode)

image

@achabense achabense requested a review from a team as a code owner June 29, 2023 09:02
stl/inc/bitset Outdated Show resolved Hide resolved
@StephanTLavavej StephanTLavavej added the performance Must go faster label Jun 30, 2023
Copy link
Contributor

@AlexGuteniev AlexGuteniev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmark uses very predictable pattern. That would favor branchy code.

The branchy code would be somewhat better on a predictable pattern but a lot worse on a random pattern. So for a general purpose, branchless is better.

I didn't look into compiler output but I guess this change made code branchy. Please:

  • Make the benchmark random
  • If the random benchmark shows regression, get back to branchless code.

stl/inc/bitset Show resolved Hide resolved
stl/inc/bitset Show resolved Hide resolved
@AlexGuteniev

This comment was marked as resolved.

@achabense

This comment was marked as resolved.

@AlexGuteniev

This comment was marked as resolved.

@achabense

This comment was marked as resolved.

@StephanTLavavej StephanTLavavej added decision needed We need to choose something before working on this and removed decision needed We need to choose something before working on this labels Jul 4, 2023
@StephanTLavavej

This comment was marked as resolved.

@StephanTLavavej StephanTLavavej self-assigned this Jul 7, 2023
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll validate and push changes! One moment...

benchmarks/src/bitset_to_string.cpp Outdated Show resolved Hide resolved
benchmarks/src/bitset_to_string.cpp Outdated Show resolved Hide resolved
benchmarks/src/bitset_to_string.cpp Show resolved Hide resolved
benchmarks/src/bitset_to_string.cpp Outdated Show resolved Hide resolved
benchmarks/src/bitset_to_string.cpp Outdated Show resolved Hide resolved

template <class charT>
void BM_bitset_to_string_large_single(benchmark::State& state) {
const auto large_bitset = bit_cast<bitset<CHAR_BIT * sizeof random_bits>>(random_bits);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No change requested: Technically, I don't believe that this is Standard. [bit.cast] is constrained for trivially copyable types, but [template.bitset] doesn't guarantee that for bitset. That said, bit_cast would refuse to compile otherwise, so I think we can get away with it.

stl/inc/bitset Show resolved Hide resolved
stl/inc/bitset Show resolved Hide resolved
stl/inc/bitset Show resolved Hide resolved
stl/inc/bitset Show resolved Hide resolved
@StephanTLavavej
Copy link
Member

Thanks! I also verified the benchmark results locally. 😻

@StephanTLavavej StephanTLavavej removed their assignment Jul 11, 2023
@StephanTLavavej StephanTLavavej self-assigned this Jul 13, 2023
@StephanTLavavej
Copy link
Member

I'm speculatively mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 750ec42 into microsoft:main Jul 14, 2023
35 checks passed
@StephanTLavavej
Copy link
Member

Thanks for implementing these perf improvements and benchmarks! 📉 🎉 😺

@achabense achabense deleted the _For_bitset branch July 14, 2023 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants