AVRO-2717: Fix UB in ZigZag encoding (pre-C++20) #2744

     }
     AVRO_DECL constexpr int64_t decodeZigzag64(uint64_t input) noexcept {
         return static_cast<int64_t>(((input >> 1) ^ -(static_cast<int64_t>(input) & 1)));
     }

     AVRO_DECL constexpr uint32_t encodeZigzag32(int32_t input) noexcept {
         // cppcheck-suppress shiftTooManyBitsSigned
    -    return ((input << 1) ^ (input >> 31));
    +    return (static_cast<uint32_t>(input) << 1) ^ (input >> 31);
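A minimal, self-contained sketch of the patched 32-bit encoder, paired with an inverse so the ZigZag mapping (0 to 0, -1 to 1, 1 to 2, -2 to 3, ...) can be checked at compile time. `zigzagEncode32` and `zigzagDecode32` are hypothetical names, not Avro's API, and the decoder is an illustrative assumption rather than code from this PR. Converting to `uint32_t` before the left shift is what removes the undefined behaviour: left-shifting a negative signed value is undefined before C++20, while left-shifting an unsigned value is always well-defined and simply discards the bits shifted out.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical standalone copy of the patched encoder.
// The left shift operates on uint32_t, so it is well-defined for negative input.
// (input >> 31) is 0 for non-negative input and -1 (all bits set) for negative
// input, assuming an arithmetic right shift (mandated since C++20).
constexpr uint32_t zigzagEncode32(int32_t input) noexcept {
    return (static_cast<uint32_t>(input) << 1) ^ (input >> 31);
}

// Hypothetical inverse (illustration only): drop the low bit and, if it was
// set, flip every bit. (0u - (input & 1u)) is 0 or 0xFFFFFFFF computed in
// unsigned arithmetic, so no signed overflow is involved.
constexpr int32_t zigzagDecode32(uint32_t input) noexcept {
    return static_cast<int32_t>((input >> 1) ^ (0u - (input & 1u)));
}

static_assert(zigzagEncode32(0) == 0u, "0 maps to 0");
static_assert(zigzagEncode32(-1) == 1u, "-1 maps to 1");
static_assert(zigzagEncode32(1) == 2u, "1 maps to 2");
static_assert(zigzagEncode32(-2) == 3u, "-2 maps to 3");
static_assert(zigzagEncode32(std::numeric_limits<int32_t>::max()) == 0xFFFFFFFEu,
              "INT32_MAX maps to 0xFFFFFFFE");
static_assert(zigzagEncode32(std::numeric_limits<int32_t>::min()) == 0xFFFFFFFFu,
              "INT32_MIN maps to 0xFFFFFFFF");
static_assert(zigzagDecode32(zigzagEncode32(-123456)) == -123456, "round trip");
```

Because the functions are `constexpr`, the `static_assert`s run at compile time, so a mismatch would break the build rather than surface at runtime.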
`input >> 31` is still implementation-defined before C++20, but it is not undefined, and I'm not aware of any implementation that defines it differently from what C++20 requires.
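The reviewer's point can be made concrete: `input >> 31` acts as a sign mask, yielding 0 for non-negative values and -1 (all bits set) for negative ones when the compiler performs an arithmetic right shift. The sketch below pins that assumption down with `static_assert`; the helper names `signMask32` and `applyMask` are hypothetical. On a compiler that shifted negative values differently, the build would fail instead of silently producing wrong encodings.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper isolating the right shift used by the ZigZag encoder.
// Before C++20, right-shifting a negative signed value is implementation-defined
// (not undefined); C++20 requires an arithmetic shift that replicates the sign bit.
constexpr int32_t signMask32(int32_t x) noexcept {
    return x >> 31;
}

// Compile-time checks of the arithmetic-shift assumption.
static_assert(signMask32(0) == 0, "zero gives 0");
static_assert(signMask32(12345) == 0, "positive input gives 0");
static_assert(signMask32(-1) == -1, "negative input gives all bits set");
static_assert(signMask32(std::numeric_limits<int32_t>::min()) == -1,
              "most negative input gives all bits set");

// XOR with the mask leaves the bits unchanged for non-negative x and flips
// every bit (bitwise NOT) for negative x.
constexpr uint32_t applyMask(uint32_t bits, int32_t x) noexcept {
    return bits ^ static_cast<uint32_t>(signMask32(x));
}
static_assert(applyMask(0x0Fu, 5) == 0x0Fu, "mask 0 leaves bits unchanged");
static_assert(applyMask(0x0Fu, -5) == 0xFFFFFFF0u, "mask -1 flips every bit");
```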
Hmmm, yeah, true. The same would apply to `input >> 63` in `encodeZigzag64`, though, right? Would it be worth casting them to `uint32_t`/`uint64_t` respectively?
Also, FWIW, the original PR was a backport from the ClickHouse fork of Avro, and their patch just left these right shifts as is.
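The same reasoning carries over to 64 bits: casting to `uint64_t` removes the undefined left shift, and `input >> 63` is the 64-bit sign mask, implementation-defined before C++20 in the same way as `input >> 31`. A sketch assuming the 64-bit encoder follows the same pattern as the patched 32-bit one; the names `zigzagEncode64` and `zigzagDecode64` are hypothetical, and the decoder body reuses the expression from `decodeZigzag64` in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical 64-bit encoder written the same way as the patched 32-bit one.
constexpr uint64_t zigzagEncode64(int64_t input) noexcept {
    return (static_cast<uint64_t>(input) << 1) ^ (input >> 63);
}

// Same expression as decodeZigzag64 in the diff: the low bit (0 or 1) is
// negated as int64_t to give 0 or -1, which becomes 0 or all ones when it is
// converted to uint64_t for the XOR.
constexpr int64_t zigzagDecode64(uint64_t input) noexcept {
    return static_cast<int64_t>(((input >> 1) ^ -(static_cast<int64_t>(input) & 1)));
}

static_assert(zigzagEncode64(-1) == 1u, "-1 maps to 1");
static_assert(zigzagEncode64(1) == 2u, "1 maps to 2");
static_assert(zigzagEncode64(std::numeric_limits<int64_t>::min()) ==
                  std::numeric_limits<uint64_t>::max(),
              "INT64_MIN maps to the largest code");
static_assert(zigzagDecode64(zigzagEncode64(-5)) == -5, "round trip");
```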
> Would it be worth casting them to `uint32_t`/`uint64_t` respectively?

No, `return (static_cast<uint32_t>(input) << 1) ^ (static_cast<uint32_t>(input) >> 31);` would return the wrong value. It would just be a bitwise rotation rather than ZigZag.
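Why the all-unsigned variant is a rotation: with an unsigned right shift, `x >> 31` is just the top bit (0 or 1), so XOR-ing it into the always-zero low bit of `x << 1` moves the old top bit to the bottom, which is a rotate-left by one. The sketch below compares that variant with a plain rotation and with ZigZag; all three function names are hypothetical. The variants agree only for non-negative inputs.

```cpp
#include <cassert>
#include <cstdint>

// The "cast both operands" variant from the question (hypothetical name).
constexpr uint32_t allUnsigned32(int32_t input) noexcept {
    return (static_cast<uint32_t>(input) << 1) ^ (static_cast<uint32_t>(input) >> 31);
}

// Plain rotate-left by one bit, for comparison.
constexpr uint32_t rotl1(uint32_t x) noexcept {
    return (x << 1) | (x >> 31);
}

// ZigZag as written in the PR.
constexpr uint32_t zigzag32(int32_t input) noexcept {
    return (static_cast<uint32_t>(input) << 1) ^ (input >> 31);
}

// The all-unsigned variant is exactly a rotation...
static_assert(allUnsigned32(-2) == rotl1(0xFFFFFFFEu), "same as rotate-left by 1");
static_assert(allUnsigned32(-1) == 0xFFFFFFFFu, "rotation leaves all-ones unchanged");
// ...while ZigZag folds small negative numbers onto small odd codes.
static_assert(zigzag32(-1) == 1u, "ZigZag maps -1 to 1");
static_assert(zigzag32(-2) == 3u, "ZigZag maps -2 to 3");
// For non-negative input the top bit is 0, so both variants coincide.
static_assert(allUnsigned32(5) == zigzag32(5), "agree on non-negative input");
```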
If you want to make it fully portable, you can replace `(input >> 31)` with `(input < 0 ? -1 : 0)`. I expect current compilers will emit the same code for both (try it in Compiler Explorer). But I think it's OK to keep using `(input >> 31)` until somebody reports that it doesn't work with some compiler.
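A sketch of the fully portable replacement suggested in this comment, next to the shift-based form kept in the PR; the function names are hypothetical. `(input < 0 ? -1 : 0)` yields the same sign mask without right-shifting a negative value, and converting -1 to an unsigned type gives all bits set by the standard's modular conversion rules. The sketch only checks that the two forms compute the same values; whether a compiler emits identical machine code is what the Compiler Explorer comparison would show.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Shift-based form, as kept in the PR: relies on an arithmetic right shift.
constexpr uint32_t encodeShift32(int32_t input) noexcept {
    return (static_cast<uint32_t>(input) << 1) ^ (input >> 31);
}

// Portable form: the sign mask comes from a comparison, so no negative value
// is ever right-shifted. static_cast<uint32_t>(-1) is 0xFFFFFFFF.
constexpr uint32_t encodePortable32(int32_t input) noexcept {
    return (static_cast<uint32_t>(input) << 1) ^ static_cast<uint32_t>(input < 0 ? -1 : 0);
}

// 64-bit counterpart of the portable form.
constexpr uint64_t encodePortable64(int64_t input) noexcept {
    return (static_cast<uint64_t>(input) << 1) ^ static_cast<uint64_t>(input < 0 ? -1 : 0);
}

static_assert(encodePortable32(-1) == 1u, "-1 maps to 1");
static_assert(encodePortable32(std::numeric_limits<int32_t>::min()) ==
                  encodeShift32(std::numeric_limits<int32_t>::min()),
              "both forms agree on INT32_MIN");
static_assert(encodePortable64(-3) == 5u, "-3 maps to 5");
```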
Let's keep it, then.
Thank you, @mkmkme!
What is the purpose of the change
This PR fixes the undefined behaviour in ZigZag encoding reported in AVRO-2717 (left-shifting a negative signed integer, which is undefined before C++20). It supersedes #920.
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Documentation