Preprocessed distinct grouping mode #1342

Merged: 7 commits, May 4, 2022

mathemancer
Contributor

Fixes #414
Fixes #411
Related to #429

This adds a preprocessor argument that can be given with the distinct grouping mode. When given, this argument specifies a preprocessing function to apply to each grouping column before distinct grouping is applied. In particular, this allows:

  • grouping by email domain
  • grouping by URI authority
  • grouping by URI scheme
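As a rough illustration of the idea (not the project's implementation), the following Python sketch applies an optional preprocessing function to a column's values and then groups rows by the distinct results. The helpers email_domain, uri_authority, uri_scheme, and distinct_groups are hypothetical names; the URI helpers rely on the standard library's urllib.parse.urlsplit, whose netloc attribute holds the URI authority.

```python
from collections import defaultdict
from urllib.parse import urlsplit


# Hypothetical preprocessors. In the PR, real preprocessors are referenced
# by function id strings, not by Python callables like these.
def email_domain(value):
    # Keep everything after the last "@".
    return value.rsplit("@", 1)[-1]


def uri_authority(value):
    # urlsplit() exposes the URI authority component as .netloc.
    return urlsplit(value).netloc


def uri_scheme(value):
    return urlsplit(value).scheme


def distinct_groups(rows, column, preprocessor=None):
    """Group rows by the distinct value of `column`, preprocessed first if given."""
    groups = defaultdict(list)
    for row in rows:
        value = row[column]
        key = preprocessor(value) if preprocessor is not None else value
        groups[key].append(row)
    return dict(groups)


if __name__ == "__main__":
    rows = [
        {"email": "alice@example.com", "site": "https://example.com/a"},
        {"email": "bob@example.com", "site": "http://example.org/docs"},
        {"email": "carol@test.org", "site": "https://example.com/b"},
    ]
    print(sorted(distinct_groups(rows, "email", email_domain)))  # ['example.com', 'test.org']
    print(sorted(distinct_groups(rows, "site", uri_authority)))  # ['example.com', 'example.org']
    by_scheme = distinct_groups(rows, "site", uri_scheme)
    print({k: len(v) for k, v in by_scheme.items()})  # {'https': 2, 'http': 1}
```

Grouping the same rows with no preprocessor would instead yield one group per distinct raw value.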

Technical details

This paradigm (i.e., preprocessing before applying a given grouping) can also produce other, more exotic grouping modes, for example grouping by email domain prefix.
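The PR doesn't define "email domain prefix" further. Assuming it means extracting the email domain and then grouping by the first few characters of that domain, the sketch below composes a hypothetical email_domain preprocessor with a hypothetical prefix_groups step. It only illustrates how one preprocessing function can feed a grouping other than distinct.

```python
from collections import defaultdict


def email_domain(value):
    # Hypothetical preprocessor: keep everything after the last "@".
    return value.rsplit("@", 1)[-1]


def prefix_groups(values, length, preprocessor=None):
    """Group values by the first `length` characters, after optional preprocessing."""
    groups = defaultdict(list)
    for value in values:
        processed = preprocessor(value) if preprocessor is not None else value
        # The grouping step only ever sees the preprocessed value.
        groups[processed[:length]].append(value)
    return dict(groups)


if __name__ == "__main__":
    emails = ["a@mail.example.com", "b@mail.example.org", "c@shop.example.com"]
    print(prefix_groups(emails, 4, email_domain))
    # {'mail': ['a@mail.example.com', 'b@mail.example.org'], 'shop': ['c@shop.example.com']}
```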

The preprocessor list argument (preproc) must be a list of function ID strings of the same length as the list of grouping columns. If no preprocessing is desired for a given column, the corresponding entry in the list should be null.
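The sketch below shows one way the preproc contract could be enforced: one entry per grouping column, each either a function ID string or null (None in Python), with null meaning the column is grouped by its raw value. The PREPROCESSORS registry, its ID strings, and the helpers resolve_preprocessors and distinct_groups are all hypothetical, as is treating an omitted preproc as a list of nulls.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical registry mapping function id strings to callables.
# These ids are illustrative only, not the project's actual ids.
PREPROCESSORS = {
    "extract_email_domain": lambda v: v.rsplit("@", 1)[-1],
    "extract_uri_authority": lambda v: urlsplit(v).netloc,
    "extract_uri_scheme": lambda v: urlsplit(v).scheme,
}


def resolve_preprocessors(columns, preproc):
    """Check `preproc` against `columns`; map each id (or None) to a callable (or None)."""
    if preproc is None:
        # Assumption: an omitted preproc behaves like all-null entries.
        return [None] * len(columns)
    if len(preproc) != len(columns):
        raise ValueError(
            f"preproc has {len(preproc)} entries but there are {len(columns)} grouping columns"
        )
    resolved = []
    for fn_id in preproc:
        if fn_id is None:
            resolved.append(None)  # null entry: group this column by its raw value
        elif fn_id in PREPROCESSORS:
            resolved.append(PREPROCESSORS[fn_id])
        else:
            raise ValueError(f"unknown preprocessing function id: {fn_id!r}")
    return resolved


def distinct_groups(rows, columns, preproc=None):
    """Group rows by the distinct tuple of (optionally preprocessed) column values."""
    funcs = resolve_preprocessors(columns, preproc)
    groups = defaultdict(list)
    for row in rows:
        key = tuple(
            f(row[col]) if f is not None else row[col]
            for col, f in zip(columns, funcs)
        )
        groups[key].append(row)
    return dict(groups)


if __name__ == "__main__":
    rows = [
        {"email": "alice@example.com", "team": "red"},
        {"email": "bob@example.com", "team": "red"},
        {"email": "carol@test.org", "team": "blue"},
    ]
    groups = distinct_groups(rows, ["email", "team"], ["extract_email_domain", None])
    print(sorted(groups))  # [('example.com', 'red'), ('test.org', 'blue')]
```

Checking the length up front means a mismatched preproc list fails loudly instead of silently skipping columns, which zip() alone would do.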

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the master branch of the repository.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@mathemancer
Contributor Author

This is blocked by #1312 and #1314. It's marked as a draft until those are merged.

@dmos62 self-assigned this May 4, 2022
@dmos62 marked this pull request as ready for review May 4, 2022 15:39
@dmos62 requested review from a team and kgodey and removed request for a team May 4, 2022 15:39
@kgodey requested review from dmos62 and removed request for kgodey May 4, 2022 16:24
@kgodey added the pr-status: review label (A PR awaiting review) May 4, 2022
@mathemancer merged commit 290bb16 into master May 4, 2022
@mathemancer deleted the preproc_grouping branch May 4, 2022 17:07
Labels
pr-status: review (A PR awaiting review)
Projects
No open projects
Development

Successfully merging this pull request may close these issues:

  • Implement grouping options for the URI type
  • Implement grouping options for the email type

3 participants