AVRO-2717: Fix UB in ZigZag encoding (pre C++-20) #2744

Merged: 1 commit merged into apache:main on Feb 19, 2024


@mkmkme (Contributor) commented Feb 17, 2024

What is the purpose of the change

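This PR fixes undefined behaviour in ZigZag encoding, as reported in AVRO-2717: before C++20, left-shifting a negative signed integer (which `input << 1` does when `input` is negative) is undefined behaviour. It supersedes #920.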

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Documentation

  • Does this pull request introduce a new feature? no

@github-actions bot added the C++ label (Pull Requests for C++ binding) on Feb 17, 2024
 }
 AVRO_DECL constexpr int64_t decodeZigzag64(uint64_t input) noexcept {
     return static_cast<int64_t>(((input >> 1) ^ -(static_cast<int64_t>(input) & 1)));
 }

 AVRO_DECL constexpr uint32_t encodeZigzag32(int32_t input) noexcept {
     // cppcheck-suppress shiftTooManyBitsSigned
-    return ((input << 1) ^ (input >> 31));
+    return (static_cast<uint32_t>(input) << 1) ^ (input >> 31);
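For reference, here is a minimal standalone sketch of the technique in the diff: perform the left shift on the unsigned representation (well-defined modulo 2^N in every C++ version) and keep the signed right shift as the sign mask. The function names mirror the ones in the diff, but this is not the Avro header: `AVRO_DECL` is dropped, and `decodeZigzag32` here is an assumed all-unsigned decoder written for illustration, not copied from Avro.

```cpp
#include <cstdint>
#include <limits>

// Standalone sketch, not the Avro header (AVRO_DECL omitted).
// The left shift happens on the unsigned value, so it is well-defined
// (modulo 2^32) in every C++ version. `input >> 31` is 0 for non-negative
// input and -1 (all ones) for negative input on arithmetic-shift
// implementations; it is converted to uint32_t before the XOR.
constexpr std::uint32_t encodeZigzag32(std::int32_t input) noexcept {
    return (static_cast<std::uint32_t>(input) << 1) ^ (input >> 31);
}

// Same pattern for 64 bits (the `input >> 63` case discussed in the review).
constexpr std::uint64_t encodeZigzag64(std::int64_t input) noexcept {
    return (static_cast<std::uint64_t>(input) << 1) ^ (input >> 63);
}

// Assumed decoder for illustration: all arithmetic stays unsigned; only the
// final conversion back to int32_t is implementation-defined before C++20
// (modular on mainstream compilers, required to be modular in C++20).
constexpr std::int32_t decodeZigzag32(std::uint32_t input) noexcept {
    return static_cast<std::int32_t>((input >> 1) ^ (0u - (input & 1u)));
}

// ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
static_assert(encodeZigzag32(0) == 0u, "zigzag(0)");
static_assert(encodeZigzag32(-1) == 1u, "zigzag(-1)");
static_assert(encodeZigzag32(1) == 2u, "zigzag(1)");
static_assert(encodeZigzag32(-2) == 3u, "zigzag(-2)");

// The extremes no longer involve left-shifting a negative signed value.
static_assert(encodeZigzag32(std::numeric_limits<std::int32_t>::min()) == 0xFFFFFFFFu,
              "zigzag(INT32_MIN)");
static_assert(encodeZigzag64(std::numeric_limits<std::int64_t>::min()) ==
                  std::numeric_limits<std::uint64_t>::max(),
              "zigzag(INT64_MIN)");
static_assert(decodeZigzag32(encodeZigzag32(std::numeric_limits<std::int32_t>::min())) ==
                  std::numeric_limits<std::int32_t>::min(),
              "round trip of INT32_MIN");
```

Because the functions are `constexpr`, the `static_assert`s double as usage examples: they are checked when the file compiles.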
Contributor:

`input >> 31` is still implementation-defined before C++20 (for negative `input`), but it is not undefined, and I'm not aware of any implementation that defines it differently from what C++20 requires (an arithmetic shift).
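The assumption described above can be made explicit at compile time. The sketch below is hypothetical and not part of this PR; `signMask32` is an illustrative helper name. Because a right shift of a negative value is implementation-defined rather than undefined before C++20, it is allowed in a constant expression, so `static_assert` checks the compiler's actual behaviour and stops the build on an implementation that does not shift arithmetically.

```cpp
#include <cstdint>

// Hypothetical helper: the sign mask that the ZigZag encoder XORs in.
// Before C++20 the result for negative x is implementation-defined (not UB);
// C++20 requires an arithmetic shift, which yields -1 for any negative x.
constexpr std::int32_t signMask32(std::int32_t x) noexcept {
    return x >> 31;
}

// Compile-time guards: fail the build on an implementation whose signed
// right shift is not arithmetic, instead of silently producing wrong
// ZigZag values at run time.
static_assert(signMask32(-1) == -1, "ZigZag assumes arithmetic right shift");
static_assert(signMask32(0) == 0, "ZigZag assumes arithmetic right shift");
static_assert((std::int64_t{-1} >> 63) == -1, "ZigZag assumes arithmetic right shift");
```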

Contributor (author):

Hmmm, yeah, true. The same would apply to `input >> 63` in `encodeZigzag64` though, right?
Would it be worth casting them to `uint32_t`/`uint64_t` respectively?

Also, FWIW, the original PR was a backport from the ClickHouse fork of Avro, and their patch left these right shifts as-is.

Contributor:

> Would it be worth casting them to `uint32_t`/`uint64_t` respectively?

No: `return (static_cast<uint32_t>(input) << 1) ^ (static_cast<uint32_t>(input) >> 31);` would return the wrong value. It would just be a bitwise rotation rather than ZigZag.
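The rotation claim can be checked directly. In the sketch below (all function names are hypothetical), the all-unsigned variant from the question is compared with a rotate-left-by-one and with the ZigZag expression from the PR. Since the low bit of `u << 1` is always 0, XOR-ing in `u >> 31` (which is 0 or 1) just copies the old top bit into bit 0.

```cpp
#include <cstdint>

// The variant from the question: both shifts on the unsigned value.
constexpr std::uint32_t allUnsignedVariant(std::int32_t input) noexcept {
    return (static_cast<std::uint32_t>(input) << 1) ^
           (static_cast<std::uint32_t>(input) >> 31);
}

// Rotate left by one bit, for comparison.
constexpr std::uint32_t rotl1(std::uint32_t u) noexcept {
    return (u << 1) | (u >> 31);
}

// ZigZag as in the PR: the sign mask comes from a signed (arithmetic) shift.
constexpr std::uint32_t zigzag32(std::int32_t input) noexcept {
    return (static_cast<std::uint32_t>(input) << 1) ^ (input >> 31);
}

// For -1 the all-unsigned variant returns 0xFFFFFFFF (a rotation of all
// ones), while ZigZag returns 1.
static_assert(allUnsignedVariant(-1) == rotl1(0xFFFFFFFFu), "rotation");
static_assert(allUnsignedVariant(-1) == 0xFFFFFFFFu, "rotation of -1");
static_assert(zigzag32(-1) == 1u, "zigzag(-1)");

// For non-negative input the sign term is 0 either way, so they agree.
static_assert(allUnsignedVariant(3) == zigzag32(3), "non-negative input");
```

Because the two expressions coincide for every non-negative input, a test suite that only exercised non-negative values would not notice the difference.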

If you want to make it fully portable, you can replace `(input >> 31)` with `(input < 0 ? -1 : 0)`. I expect current compilers will emit the same code for both (try it in Compiler Explorer). But I think it's OK to keep using `(input >> 31)` until somebody reports that it doesn't work with some compiler.
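A sketch of the fully portable form suggested above, using hypothetical function names. The sign mask comes from a comparison, and the `int` value `-1` is converted to all ones by the usual arithmetic conversions (signed-to-unsigned conversion is always well-defined and modular), so no step depends on implementation-defined shift behaviour. The shift-based version is included only for comparison; whether compilers emit identical code for both is the reviewer's expectation and is not something this snippet checks.

```cpp
#include <cstdint>
#include <limits>

// Shift-based form from the PR (relies on an arithmetic right shift pre-C++20).
constexpr std::uint32_t zigzagShift32(std::int32_t input) noexcept {
    return (static_cast<std::uint32_t>(input) << 1) ^ (input >> 31);
}

// Portable form: (input < 0 ? -1 : 0) is an int; in the XOR it is converted
// to std::uint32_t, turning -1 into 0xFFFFFFFF (always well-defined).
constexpr std::uint32_t zigzagPortable32(std::int32_t input) noexcept {
    return (static_cast<std::uint32_t>(input) << 1) ^ (input < 0 ? -1 : 0);
}

// 64-bit counterpart: -1 converts to std::uint64_t's maximum value.
constexpr std::uint64_t zigzagPortable64(std::int64_t input) noexcept {
    return (static_cast<std::uint64_t>(input) << 1) ^ (input < 0 ? -1 : 0);
}

static_assert(zigzagPortable32(-1) == zigzagShift32(-1), "same result for -1");
static_assert(zigzagPortable32(std::numeric_limits<std::int32_t>::min()) == 0xFFFFFFFFu,
              "zigzag(INT32_MIN)");
static_assert(zigzagPortable64(std::numeric_limits<std::int64_t>::min()) ==
                  std::numeric_limits<std::uint64_t>::max(),
              "zigzag(INT64_MIN)");
```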

Contributor (author):

Let's keep it, then.

@martin-g merged commit 071db19 into apache:main on Feb 19, 2024
4 checks passed
martin-g pushed a commit that referenced this pull request Feb 19, 2024
@martin-g (Member):

Thank you, @mkmkme !

@mkmkme deleted the mkmkme/avro-2717-fix-ub branch on February 19, 2024 09:27
RanbirK pushed a commit to RanbirK/avro that referenced this pull request May 13, 2024
Labels: C++ (Pull Requests for C++ binding)

3 participants