AVRO-1830 [Perl] Support containers without codec #2965

Merged 1 commit into apache:main on Jun 24, 2024

Conversation

jjatria
Contributor

@jjatria jjatria commented Jun 21, 2024

As the specification on Object Container Files states (emphasis added):

All metadata properties that start with "avro." are reserved. The following file metadata properties are currently used:

  • avro.schema contains the schema of objects stored in the file, as JSON data (required).
  • avro.codec the name of the compression codec used to compress blocks, as a string. Implementations are required to support the following codecs: "null" and "deflate". _If codec is absent, it is assumed to be "null"_. The codecs are described with more detail below.

With this change, the Perl implementation no longer dies when opening a container whose metadata does not contain an explicit codec. It assumes "null" instead, as the specification requires.
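For readers unfamiliar with the container layout, here is a minimal sketch of the fallback the specification describes. It is written in Python rather than Perl and is not the code in this pull request: the helper names (`write_header`, `read_header`, and so on) are hypothetical, and it handles only the file header (magic bytes, metadata map, sync marker), not data blocks.

```python
import io
import json

MAGIC = b"Obj\x01"  # 4-byte magic that starts every object container file


def write_long(out, n):
    """Write an Avro long: zigzag-encode, then emit base-128 varint bytes."""
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(inp):
    """Read an Avro long written by write_long."""
    shift = 0
    z = 0
    while True:
        raw = inp.read(1)
        if not raw:
            raise EOFError("truncated long")
        z |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def write_bytes(out, data):
    """Write a length-prefixed byte string (Avro 'bytes' and 'string')."""
    write_long(out, len(data))
    out.write(data)


def read_bytes(inp):
    """Read a length-prefixed byte string."""
    n = read_long(inp)
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated bytes")
    return data


def write_header(out, meta, sync):
    """Write magic, the metadata map (str -> bytes), and a 16-byte sync marker."""
    out.write(MAGIC)
    if meta:
        write_long(out, len(meta))  # one map block holding every entry
        for key, value in meta.items():
            write_bytes(out, key.encode("utf-8"))
            write_bytes(out, value)
    write_long(out, 0)  # a zero count terminates the map
    out.write(sync)


def read_header(inp):
    """Return (metadata, codec, sync); codec falls back to "null" if absent."""
    if inp.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(inp)
        if count == 0:
            break
        if count < 0:
            read_long(inp)  # negative count: a block byte size follows; unused here
            count = -count
        for _ in range(count):
            key = read_bytes(inp).decode("utf-8")
            meta[key] = read_bytes(inp)
    sync = inp.read(16)
    # The fallback in question: a missing avro.codec means "null".
    codec = meta.get("avro.codec", b"null").decode("utf-8")
    return meta, codec, sync


# A header that deliberately carries no avro.codec entry.
schema = json.dumps({"type": "string"}).encode("utf-8")
buf = io.BytesIO()
write_header(buf, {"avro.schema": schema}, b"\x00" * 16)
buf.seek(0)
meta, codec, sync = read_header(buf)
print(codec)  # prints "null"
```

The point of the sketch is the single `meta.get("avro.codec", b"null")` lookup: a reader that indexes the metadata unconditionally fails on such a file, while one that supplies the default treats it the same as a file that names "null" explicitly.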

This change is inspired by one originally submitted in 2016 by SK Liew and tracked in https://issues.apache.org/jira/browse/AVRO-1830.

Verifying this change

This change adds a new test case to t/04_datafile.t, which runs as part of the normal test suite. The test reads data from a manually crafted container whose metadata deliberately omits the codec. The file was also confirmed by hand to be readable by utilities such as avrocat, which shows that it is valid outside this specific test case.

Documentation

  • Does this pull request introduce a new feature? Yes: the ability to read containers without a codec.
  • If yes, how is the feature documented? It is mentioned in the change log.

Unfortunately, the Perl library is severely underdocumented, so there is nothing to add this to.

@martin-g
Member

There are conflicts in the CHANGES file but I cannot resolve them because we (the maintainers) do not have permissions to push to your branch.

As [the specification on Object Container Files][spec] states
(emphasis added):

> All metadata properties that start with "avro." are reserved.
> The following file metadata properties are currently used:
>
>   * **avro.schema** contains the schema of objects stored in
>     the file, as JSON data (required).
>   * **avro.codec** the name of the compression codec used to
>     compress blocks, as a string. Implementations are required
>     to support the following codecs: "null" and "deflate". _If
>     codec is absent, it is assumed to be "null"_. The codecs are
>     described with more detail below.

This change makes it so that the Perl implementation does not die
when opening a container that does not contain an explicit codec
in its metadata.

This change is inspired by one originally submitted in 2016 by
SK Liew.

[spec]: https://avro.apache.org/docs/1.11.1/specification/#object-container-files
@jjatria
Contributor Author

jjatria commented Jun 24, 2024

@martin-g: I've rebased this on main and force pushed without the conflict 👍

@martin-g martin-g merged commit e62c8ee into apache:main Jun 24, 2024
6 checks passed
@martin-g
Member

Thank you, @jjatria !

@jjatria jjatria deleted the avro-1830-default-codec branch June 24, 2024 13:04
2 participants