Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ntuple] Multiple column representations: foundation #16054

Merged
merged 18 commits into from
Jul 25, 2024

Conversation

jblomer
Copy link
Contributor

@jblomer jblomer commented Jul 18, 2024

Provides the foundations for multiple, alternative column representations of a single field. Extends the binary format to store multiple sets of columns per field and handles (de-)serialization and the descriptor representation (backwards compatible).

A follow-up PR will use this capability from the RField/RColumn classes.

Copy link

github-actions bot commented Jul 18, 2024

Test Results

    13 files      13 suites   2d 20h 43m 6s ⏱️
 2 677 tests  2 677 ✅ 0 💤 0 ❌
32 708 runs  32 708 ✅ 0 💤 0 ❌

Results for commit 6205bac.

♻️ This comment has been updated with latest results.

@jblomer jblomer force-pushed the ntuple-multi-column-reps-v3 branch 2 times, most recently from c305ec2 to ea017ad Compare July 19, 2024 14:23
Copy link
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall design LGTM, some comments on details of the implementation

tree/ntuple/v7/doc/specifications.md Show resolved Hide resolved
Comment on lines 894 to 976
if (fDescriptor.FindLogicalColumnId(fieldId, 0, representationIndex - 1) == kInvalidDescriptorId)
return R__FAIL("out of bounds representation index");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we maybe want to check here that the previous column representations has all of its columns added? ie loop over column = 0 .. GetColumnCardinality()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the next comment: I slightly patched the sanity checks. The result should be the same: when the descriptor is built, we ensure that all column representations have the same number of columns, and that the column index and the representation index are consecutive, starting at 0.

Comment on lines 739 to 802
for (auto firstColumnIdx = 0u; firstColumnIdx < nColumns; firstColumnIdx += columnCardinality) {
const auto &columnDesc = fDescriptor.GetColumnDescriptor(logicalColumnIds[firstColumnIdx]);
if (columnDesc.GetRepresentationIndex() != firstColumnIdx / columnCardinality) {
return R__FAIL("field with id '" + std::to_string(fieldId) + "' has irregular column representations");
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably just check this for all columns, not only the first of each representation.

tree/ntuple/v7/src/RNTupleDescriptor.cxx Outdated Show resolved Hide resolved
tree/ntuple/v7/inc/ROOT/RNTupleSerialize.hxx Show resolved Hide resolved
Comment on lines 184 to 188
std::uint64_t GetFirstElementIndex() const { return fFirstElementIndex; }
std::uint64_t GetFirstElementIndex() const { return std::abs(fFirstElementIndex); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect to call this function for suppressed columns? Maybe add an assertion of not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is called also for suppressed columns. E.g. in the RNTupleSerializer.

tree/ntuple/v7/inc/ROOT/RNTupleDescriptor.hxx Show resolved Hide resolved
@jblomer jblomer force-pushed the ntuple-multi-column-reps-v3 branch 3 times, most recently from 8a7f155 to 6205bac Compare July 23, 2024 14:14
@jblomer jblomer requested a review from hahnjo July 23, 2024 14:18
Copy link
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from my point of view; one minor comment, but I may be missing why that case is important...

Comment on lines 794 to 795
if (nColumns <= columnCardinality)
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this an error as well because it means the field doesn't have all columns? On the other hand, the previous condition allowed nColumns == 0, but shouldn't this already be taken care of by columnCardinality == 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition triggers if we have only a single column representation. I this case, by construction we have nColumns + 1 == columnCardinality. Perhaps not the clearest way to express that in the current if

@jblomer jblomer force-pushed the ntuple-multi-column-reps-v3 branch from 6205bac to 29cab72 Compare July 25, 2024 09:32
@jblomer jblomer merged commit b7bce81 into root-project:master Jul 25, 2024
17 checks passed
@jblomer jblomer deleted the ntuple-multi-column-reps-v3 branch July 25, 2024 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants