Optimize output vector adapter write #3569

Merged

romainreignier
Contributor

The first commit of this pull request adds a benchmark for CBOR serialization, so that the change introduced by the second commit can be measured.

In the output_vector_adapter class, replace the use of std::copy + std::back_inserter with the method std::vector::insert.
std::back_inserter turns the copy into one std::vector::push_back call per element, which means a capacity check for every element and possibly several reallocations as the vector grows on the fly.
For large payloads, usually binary data, this overhead is significant.
Resizing the vector first and then copying the data helps a lot.
However, my benchmarks show that calling .insert() on the vector is almost as fast and shorter to write.
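A minimal sketch of the two write strategies, using a hypothetical simplified adapter (`simple_vector_adapter` is an illustrative name, not the library's actual `output_vector_adapter`). It keeps the `write_character` / `write_characters` shape and shows the old body (std::copy + std::back_inserter) next to the new one (std::vector::insert).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical, simplified adapter appending to a caller-owned vector.
// Not the library's real class; it only mirrors the write methods.
template <typename CharType>
class simple_vector_adapter {
  public:
    explicit simple_vector_adapter(std::vector<CharType>& vec) noexcept : v(vec) {}

    // A single character: push_back is the natural choice here.
    void write_character(CharType c) { v.push_back(c); }

    // Old approach: std::back_inserter calls push_back once per element,
    // so every element pays a capacity check and the vector may
    // reallocate several times while it grows.
    void write_characters_back_inserter(const CharType* s, std::size_t length) {
        std::copy(s, s + length, std::back_inserter(v));
    }

    // New approach: with pointer (random-access) iterators, range insert
    // can compute the element count up front, so implementations typically
    // reallocate at most once and copy the whole block in one pass.
    void write_characters(const CharType* s, std::size_t length) {
        v.insert(v.end(), s, s + length);
    }

  private:
    std::vector<CharType>& v;
};
```

Both methods append the same bytes; only the number of push_back calls and potential reallocations differs.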

I have added a benchmark similar to the existing .dump() benchmark for JSON, but using json::to_cbor(); for that benchmark, the performance change is not significant.
However, the application in which I noticed the bottleneck in std::vector::push_back() serializes a lot of binary data to CBOR, so I have also added a benchmark that serializes vectors of bytes. For this case, the proposed change makes a significant difference.
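Why binary payloads hit this path so hard: in CBOR (RFC 8949), a byte string is a short header (major type 2 plus the length) followed by the raw bytes, so the payload can be handed to the output adapter as one contiguous block. The sketch below is a simplified, hypothetical standalone encoder (`encode_cbor_byte_string` is an illustrative name, not the library's binary writer, and it ignores binary subtypes/tags); as far as the library's design suggests, the payload part corresponds to a single `write_characters` call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of CBOR byte-string encoding (major type 2).
// Not the library's binary writer: no subtype tags, no adapter layer.
std::vector<std::uint8_t> encode_cbor_byte_string(const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> out;
    const std::uint64_t n = payload.size();

    // Append `value` in big-endian order using `bytes` bytes.
    auto put_be = [&out](std::uint64_t value, int bytes) {
        for (int i = bytes - 1; i >= 0; --i) {
            out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
        }
    };

    // Header: lengths below 24 fit in the initial byte; larger lengths
    // use 0x58..0x5B followed by a 1-, 2-, 4- or 8-byte length.
    if (n < 24) {
        out.push_back(static_cast<std::uint8_t>(0x40 + n));
    } else if (n <= 0xFF) {
        out.push_back(0x58);
        put_be(n, 1);
    } else if (n <= 0xFFFF) {
        out.push_back(0x59);
        put_be(n, 2);
    } else if (n <= 0xFFFFFFFFull) {
        out.push_back(0x5A);
        put_be(n, 4);
    } else {
        out.push_back(0x5B);
        put_be(n, 8);
    }

    // Payload: one bulk range insert instead of one push_back per byte.
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```

For a 32768-byte payload the header is only 3 bytes (0x59 0x80 0x00), so almost all of the work is the bulk copy, which is where the change to std::vector::insert pays off.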

Original (std::copy + std::back_inserter):

2022-07-04T14:43:24+02:00
Running ./json_benchmarks
Run on (8 X 3603.48 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 6144 KiB (x1)
Load Average: 0.43, 0.72, 0.85
-----------------------------------------------------------------------------
Benchmark                   Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------
BinaryToCbor/8            239 ns          239 ns      2949279 bytes_per_second=107.695M/s
BinaryToCbor/16           264 ns          264 ns      2701934 bytes_per_second=126.646M/s
BinaryToCbor/32           289 ns          289 ns      2419848 bytes_per_second=171.441M/s
BinaryToCbor/64           368 ns          368 ns      1915804 bytes_per_second=217.621M/s
BinaryToCbor/128          502 ns          502 ns      1000000 bytes_per_second=281.356M/s
BinaryToCbor/256          773 ns          773 ns       901424 bytes_per_second=341.569M/s
BinaryToCbor/512         1285 ns         1285 ns       546804 bytes_per_second=395.719M/s
BinaryToCbor/1024        2358 ns         2358 ns       302891 bytes_per_second=422.628M/s
BinaryToCbor/2048        4328 ns         4328 ns       162237 bytes_per_second=455.93M/s
BinaryToCbor/4096        8252 ns         8252 ns        84487 bytes_per_second=475.822M/s
BinaryToCbor/8192       15994 ns        15993 ns        43813 bytes_per_second=489.738M/s
BinaryToCbor/16384      31647 ns        31645 ns        22234 bytes_per_second=494.39M/s
BinaryToCbor/32768      63097 ns        63095 ns        11096 bytes_per_second=495.604M/s

New version with std::vector::insert():

2022-07-04T14:42:37+02:00
Running ./json_benchmarks
Run on (8 X 3786.38 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 6144 KiB (x1)
Load Average: 0.34, 0.76, 0.86
-----------------------------------------------------------------------------
Benchmark                   Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------
BinaryToCbor/8            232 ns          232 ns      3067496 bytes_per_second=111.04M/s
BinaryToCbor/16           229 ns          229 ns      3011982 bytes_per_second=145.559M/s
BinaryToCbor/32           230 ns          230 ns      3022206 bytes_per_second=215.89M/s
BinaryToCbor/64           231 ns          231 ns      3037105 bytes_per_second=346.82M/s
BinaryToCbor/128          235 ns          235 ns      2979490 bytes_per_second=601.678M/s
BinaryToCbor/256          236 ns          236 ns      2935424 bytes_per_second=1118.63M/s
BinaryToCbor/512          259 ns          259 ns      2712207 bytes_per_second=1.91596G/s
BinaryToCbor/1024         285 ns          285 ns      2489110 bytes_per_second=3.41833G/s
BinaryToCbor/2048         295 ns          295 ns      2364411 bytes_per_second=6.53373G/s
BinaryToCbor/4096         366 ns          366 ns      1925247 bytes_per_second=10.4859G/s
BinaryToCbor/8192         531 ns          531 ns      1319473 bytes_per_second=14.3938G/s
BinaryToCbor/16384       1042 ns         1042 ns       672042 bytes_per_second=14.6641G/s
BinaryToCbor/32768       1853 ns         1853 ns       380673 bytes_per_second=16.4827G/s
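Dividing the original per-iteration time by the new one gives the speedup for each payload size. The values below are copied from the two BinaryToCbor tables; note that the two runs report slightly different CPU clocks (3603.48 MHz vs 3786.38 MHz), so the ratios are approximate.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Per-iteration times in ns from the two BinaryToCbor tables,
// for payload sizes 8, 16, 32, ..., 32768 bytes.
constexpr std::array<double, 13> time_before_ns = {239, 264, 289, 368, 502, 773, 1285,
                                                   2358, 4328, 8252, 15994, 31647, 63097};
constexpr std::array<double, 13> time_after_ns = {232, 229, 230, 231, 235, 236, 259,
                                                  285, 295, 366, 531, 1042, 1853};

// Speedup for the i-th payload size: old time divided by new time.
double speedup(std::size_t i) {
    return time_before_ns[i] / time_after_ns[i];
}
```

This gives roughly 1.03x at 8 bytes and about 34x at 32768 bytes (63097 / 1853).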

If you prefer an implementation with std::vector::resize() + std::memcpy() or std::copy(), the performance is equivalent.
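A minimal sketch of that resize + memcpy alternative, written as a hypothetical free function (`append_resize_memcpy` is an illustrative name, not part of the library). It assumes a trivially copyable `CharType`, which holds for `char` and `std::uint8_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical alternative to range insert: grow once, then bulk-copy.
template <typename CharType>
void append_resize_memcpy(std::vector<CharType>& v, const CharType* s, std::size_t length) {
    static_assert(std::is_trivially_copyable<CharType>::value,
                  "memcpy requires a trivially copyable element type");
    if (length == 0) {
        return;  // nothing to copy; also avoids memcpy with a null source
    }
    const std::size_t old_size = v.size();
    v.resize(old_size + length);  // single size change; new slots are value-initialized
    std::memcpy(v.data() + old_size, s, length * sizeof(CharType));
}
```

One design difference: resize() zero-fills the new elements before memcpy overwrites them, while insert() copies straight into the new storage, so insert() skips that extra pass and is shorter to write.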

…adapter::write_characters

This change greatly increases performance when writing large amounts of binary data.
@coveralls

Coverage remained the same at 100.0% when pulling bb1abe9 on romainreignier:optimize_output_vector_adapter_write into 954b10a on nlohmann:develop.

nlohmann self-assigned this Jul 8, 2022
Owner

nlohmann left a comment

Looks good to me.

nlohmann added this to the Release 3.11.0 milestone Jul 8, 2022
nlohmann merged commit d4daaa8 into nlohmann:develop Jul 8, 2022
@nlohmann
Owner

nlohmann commented Jul 8, 2022

Thanks!

1r0b1n0 pushed a commit to ixblue/rosbridge_server_cpp that referenced this pull request Jul 3, 2023
Apply the change made in the PR#3569 of the json lib
nlohmann/json#3569
4 participants