Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add group by column stage #1699

Merged

Conversation

dagardner-nv
Copy link
Contributor

@dagardner-nv dagardner-nv commented May 13, 2024

Description

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

@dagardner-nv dagardner-nv added non-breaking Non-breaking change improvement Improvement to existing functionality skip-ci Optionally Skip CI for this PR labels May 13, 2024
@dagardner-nv dagardner-nv self-assigned this May 13, 2024
@dagardner-nv dagardner-nv requested a review from a team as a code owner May 13, 2024 22:25
@dagardner-nv dagardner-nv marked this pull request as draft May 13, 2024 22:25
@dagardner-nv dagardner-nv removed the skip-ci Optionally Skip CI for this PR label May 16, 2024
@dagardner-nv dagardner-nv marked this pull request as ready for review May 16, 2024 19:41
Copy link
Contributor

@AnuradhaKaruppiah AnuradhaKaruppiah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an issue# for this change? I would like to understand why it is being done.

@dagardner-nv
Copy link
Contributor Author

Is there an issue# for this change? I would like to understand why it is being done.

The purpose of this stage is to ensure that the rest of the downstream stages in the pipeline get a unique message for each unique value in the column being grouped-by.

The idea is that this would improve performance. In addition to this, if we know that each value in column 'A' has the same value, we can drop the column and store that value in the ControlMessage.metadata field.

I opened issue #1709 for this, but unfortunately I didn't create an issue prior to working on this feature.

@AnuradhaKaruppiah
Copy link
Contributor

LGTM, some minor suggestions.

@AnuradhaKaruppiah
Copy link
Contributor

/merge

@dagardner-nv
Copy link
Contributor Author

test

@AnuradhaKaruppiah
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 61ed7c3 into nv-morpheus:branch-24.06 Jun 4, 2024
13 checks passed
@dagardner-nv dagardner-nv deleted the david-group-by-column_stage branch June 4, 2024 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement to existing functionality non-breaking Non-breaking change
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA]: Add a group by stage
2 participants