Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve support for finding duplicate rows #1028

Closed
Tracked by #442
kgodey opened this issue Jan 26, 2022 · 4 comments
Closed
Tracked by #442

Improve support for finding duplicate rows #1028

kgodey opened this issue Jan 26, 2022 · 4 comments
Labels
needs: requirements The problem is clear and worth solving, but we're not yet sure of the best solution needs: ux design type: enhancement New feature or request work: frontend Related to frontend code in the mathesar_ui directory
Milestone

Comments

@kgodey
Copy link
Contributor

kgodey commented Jan 26, 2022

Problem

We have implemented a filter to identify duplicate rows. However, after implementing it, we've realized that it doesn't make sense as a filter like our other filters.

Our other filters work the same regardless of the order in which they are applied. This is not the case for "has duplicates". Depending on the other filters applied, we may end up returning different results and confusing the user. We don't face this problem with any other filter because other filters only rely on the data in a specific row. "Has duplicates" relies on all of the rows visible.

As an example, here's a table:

ID Title Release Year Favorite
1 Dante's Peak 1997 FALSE
2 Forrest Gump 1994 TRUE
3 The Karate Kid 1984 TRUE
4 Dante's Peak 1997 TRUE
5 The Karate Kid 1984 TRUE

Order 1

Imagine the user applies filters in this order: "Year" > 1993, "Favorite" is TRUE, "Title" has duplicates

Filter 1: "Year" > 1993

ID Title Release Year Favorite
1 Dante's Peak 1997 FALSE
2 Forrest Gump 1994 TRUE
4 Dante's Peak 1997 TRUE

Filter 2: "Favorite" is TRUE

ID Title Release Year Favorite
2 Forrest Gump 1994 TRUE
4 Dante's Peak 1997 TRUE

Filter 3:"Title" has duplicates

0 results.

Order 2

But instead if the user applies filter in this order "Year" > 1993,"Title" has duplicates and "Favorite" is TRUE

Filter 1: "Year" > 1993

ID Title Release Year Favorite
1 Dante's Peak 1997 FALSE
2 Forrest Gump 1994 TRUE
4 Dante's Peak 1997 TRUE

Filter 2:"Title" has duplicates

ID Title Release Year Favorite
1 Dante's Peak 1997 FALSE
4 Dante's Peak 1997 TRUE

Filter 3: "Favorite" is TRUE

ID Title Release Year Favorite
4 Dante's Peak 1997 TRUE

Proposed solution

A few ideas:

  • If we want "has duplicates" to be a filter, we need to introduce the concept of ordering and reordering filters and explaining to the user how their previous filters affect future filters. This seems like a complicated concept to introduce for the alpha release so I think it would be better to avoid this solution.
  • We could move the functionality to check for duplicates to some other part of the table/view. This could be extended to other functionality in the future – we could support more complicated database queries through this interface.
  • We could move the functionality to the Query Builder (see Design for visual query builder ("Data Explorer") #1065)
    • Please note that this needs to be ordered with other steps in the query builder to work properly.

Design notes

  • Please note that this also involves updating the user flow when the user tries to set a UNIQUE constraint on a column and they have non-unique values to be consistent with the new design.

Additional context

@kgodey kgodey added type: bug Something isn't working status: triage labels Jan 26, 2022
@kgodey kgodey added this to the [09] Working with Views milestone Jan 26, 2022
@kgodey kgodey added restricted: design team ready Ready for implementation and removed status: triage labels Jan 26, 2022
@ghislaineguerin ghislaineguerin self-assigned this Jan 27, 2022
@dmos62
Copy link
Contributor

dmos62 commented Jan 28, 2022

I'll reiterate how I see a transformation like duplicates-only working in the distant future.

In general, to see only duplicates, you have to perform these steps:

  • Look at which rows are unique on some column
  • Group them together
  • Introduce a new column containing the group-size of that row
  • Filter based on whether the group-size is more than 1

At the moment, we are encapsulating all of the above steps into a single monolithic transformation and calling it duplicates_only.

What I'd like to see happening in the future is that we give the user the general tools to perform above listed operations and he can compose them into the exact above sequence and get the same result. That does require/imply this pipeline-type interface that we've been talking about lately.

@kgodey kgodey added needs: unblocking Blocked by other work and removed ready Ready for implementation labels Feb 15, 2022
@kgodey
Copy link
Contributor Author

kgodey commented Feb 15, 2022

This is blocked by #1065 since we'll probably be using the query builder to identify duplicate rows.

@seancolsen
Copy link
Contributor

I'm marking this as unblocked now that #1065 is completed.

@seancolsen seancolsen added ready Ready for implementation type: enhancement New feature or request and removed needs: unblocking Blocked by other work type: bug Something isn't working labels Sep 1, 2022
@github-actions github-actions bot added the stale label Apr 12, 2023
@rajatvijay rajatvijay removed the stale label Apr 17, 2023
@seancolsen seancolsen added needs: ux design work: frontend Related to frontend code in the mathesar_ui directory and removed ready Ready for implementation work: design labels Dec 5, 2023
@seancolsen seancolsen changed the title Update design for checking for duplicate rows. Improve support for finding duplicate rows Dec 6, 2023
@seancolsen seancolsen added the needs: requirements The problem is clear and worth solving, but we're not yet sure of the best solution label Dec 6, 2023
@mathesar-foundation mathesar-foundation deleted a comment from github-actions bot Dec 6, 2023
@kgodey
Copy link
Contributor Author

kgodey commented Feb 2, 2024

Closing this, too old, requirements are likely out of date. We can create a new issue if we need this functionality in the future.

@kgodey kgodey closed this as not planned Won't fix, can't repro, duplicate, stale Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs: requirements The problem is clear and worth solving, but we're not yet sure of the best solution needs: ux design type: enhancement New feature or request work: frontend Related to frontend code in the mathesar_ui directory
Projects
No open projects
Development

No branches or pull requests

5 participants