Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-2397: [c++] Add support for type and field aliases #2270

Merged
merged 2 commits into from
Feb 22, 2024

Conversation

mparry
Copy link
Contributor

@mparry mparry commented Jun 5, 2023

What is the purpose of the change

This adds support to the C++ library for type and field aliases. It addresses AVRO-2397 and also AVRO-1496.

It is an evolution of the old PR #522. That sought to handle support by overriding the Name equality operator and also using Name for field names too. This struck me as too broad a change (do we want that notion of equality everywhere?) and not really appropriate for field aliases (where there is no namespace).

The handling is now in two places:

  1. In ResolvingGrammarGenerator, writer types are now matched via the new equalOrAliasedBy() method
  2. Field aliases are handled simply by allowing them to be looked up via nameIndex() - i.e. aliases are added there too, as well as the original names.

Verifying this change

This change added tests and can be verified by running the C++ unit tests, specifically CodecTests.

Documentation

  • Does this pull request introduce a new feature? Yes.
  • If yes, how is the feature documented? Currently, I have made no documentation changes, as I'm not sure that it's stated anywhere that aliases are not supported. If that's the case, it should be removed / reversed.

@github-actions github-actions bot added the C++ Pull Requests for C++ binding label Jun 5, 2023
@@ -263,7 +266,7 @@ static GenericDatum makeGenericDatum(NodePtr n,
static const std::unordered_set<std::string>& getKnownFields() {
// return known fields
static const std::unordered_set<std::string> kKnownFields =
{"name", "type", "default", "doc", "size", "logicalType",
{"name", "type", "aliases", "default", "doc", "size", "logicalType",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As well as being necessary for this change, adding "aliases" to this list fixes what is - I would argue - a regression in master vs 1.11 where any type or field aliases would now result in an error. This is because the custom attribute handling would not recognise it as a known field and would then try and read a string value. This fails because the aliases value is, of course, an array of strings. I don't know if there are any other features in the specification that are currently unsupported and would be affected similarly. I think that you should make this change even if you don't accept the rest of the PR.

std::string ns_;
std::string simpleName_;
std::unique_ptr<Aliases> aliases_;
Copy link
Contributor Author

@mparry mparry Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, I simply had

std::vector<std::string> aliases_;
std::set<std::string> fqAliases_;

here but there is something peculiar going on when built under GCC 11 (as used in the GitHub action runner). Any addition of a container member here, with no other code change, causes some lifetime issues with the weak_ptr reference in NodeSymbolic that breaks certain avrogencpp tests. I have to suspect some issue with libstdc++ 11.x because it's fine when built with Clang or if I take this new approach instead.

Arguably, this is nicer anyway since it minimises overhead for the majority of Names that have no aliases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-g - if you could approve the workflows again, the tests should pass this time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@mparry mparry changed the title AVRO-2397: [c++] Added support for type and field aliases AVRO-2397: [c++] Add support for type and field aliases Jun 6, 2023
@KepptnKool
Copy link

What's the state of this pull request. Is the feature planned for one of the next releases?

@martin-g
Copy link
Member

martin-g commented Feb 5, 2024

@KepptnKool There is no active maintainer of the C/C++ SDKs :-/

Have you reviewed and tested the proposed changes ?
I could merge the PR if at least two users/contributors approve it!

lang/c++/api/Node.hh Outdated Show resolved Hide resolved
@martin-g
Copy link
Member

@mkmkme Could you please review this PR when you have time ? Thank you!

Copy link
Contributor

@mkmkme mkmkme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, added one nit.

@@ -1264,6 +1264,46 @@ static const TestData3 data3[] = {

{R"(["boolean", "int"])", "U1I", R"(["boolean", "long"])", "U1L", 1},
{R"(["boolean", "int"])", "U1I", R"(["long", "boolean"])", "U0L", 1},

// Aliases
{"{\"type\":\"record\", \"name\":\"r\", \"fields\":["
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use raw string literals to improve readability of the test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use raw string literals to improve readability of the test?

I am going to merge the PR!
@mkmkme You could make this improvement in a follow-up!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, will do!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-g martin-g merged commit a462498 into apache:main Feb 22, 2024
4 checks passed
@martin-g
Copy link
Member

Thank you all!

RanbirK pushed a commit to RanbirK/avro that referenced this pull request May 13, 2024
* Add support for type and field aliases

* AVRO-2397: Improve code style consistency

---------

Co-authored-by: Martin Grigorov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ Pull Requests for C++ binding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants