Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to only act on duplicate messages #204

Closed
wants to merge 1 commit into from
Closed

Conversation

djwf
Copy link

@djwf djwf commented Mar 4, 2021

Summary

This PR fixes/implements the following bugs/features:

  • Adds an option to ignore single messages during execution, but only when selected. The default behavior is the same as before this change.

This PR fixes #203.

Preliminary checks

New Features Submissions:

  1. Does your submission pass tests?
  2. Have you lint your code locally prior to submission?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

I'm not sure what the tests should look like, and will take any suggestions for those.

@codecov
Copy link

codecov bot commented Mar 4, 2021

Codecov Report

Merging #204 (df445a9) into main (7ba6bc3) will decrease coverage by 0.32%.
The diff coverage is 25.00%.

❗ Current head df445a9 differs from pull request most recent head 73629a4. Consider uploading reports for the commit 73629a4 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #204      +/-   ##
==========================================
- Coverage   77.79%   77.47%   -0.33%     
==========================================
  Files           8        8              
  Lines         725      728       +3     
  Branches      117      119       +2     
==========================================
  Hits          564      564              
- Misses        126      128       +2     
- Partials       35       36       +1     
Impacted Files Coverage Δ
mail_deduplicate/__init__.py 93.33% <ø> (ø)
mail_deduplicate/deduplicate.py 73.54% <14.28%> (-1.19%) ⬇️
mail_deduplicate/cli.py 86.04% <100.00%> (+0.16%) ⬆️
mail_deduplicate/mail.py 69.65% <0.00%> (-0.21%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ba6bc3...73629a4. Read the comment docs.

@djwf djwf mentioned this pull request Mar 4, 2021
3 tasks
@kdeldycke
Copy link
Owner

Thanks @djwf for the contribution. This change is welcome and I'm ready to merge it upstream.

But in light of upcoming extensive refactors (I'm thinking about local cache implementation from #87), it's better to protect this change by a set of unittest. That way your intention in implementing this option flag will not be lost in the future.

As for the tests, you can start by mimicking the usage of the CLI from #203 and #146. As your contribution is proposing to solve these issues, you can simulate the cases users have reported there.

@djwf
Copy link
Author

djwf commented Mar 15, 2021

Got it - I'll get some written up and submit so you can check them over: thanks!

@DuBistKomisch
Copy link

Happy to report that I successfully deleted duplicates with this by running mdedup -o -s select-all-but-one -a delete-selected, although I got an AssertionError at the end, I guess since the stats become a bit wonky:

Details
● Phase #4 - Report and statistics
╒════════════╤══════════╤══════════════════════════════════════════════════════════════╕
│ Mails      │   Metric │ Description                                                  │
╞════════════╪══════════╪══════════════════════════════════════════════════════════════╡
│ Found      │     6995 │ Total number of mails encountered from all mail sources.     │
├────────────┼──────────┼──────────────────────────────────────────────────────────────┤
│ Rejected   │        0 │ Number of mails rejected individually because they were      │
│            │          │ unparseable or did not have enough metadata to compute       │
│            │          │ hashes.                                                      │
├────────────┼──────────┼──────────────────────────────────────────────────────────────┤
│ Retained   │     6995 │ Number of valid mails parsed and retained for deduplication. │
├────────────┼──────────┼──────────────────────────────────────────────────────────────┤
│ Hashes     │     4814 │ Number of unique hashes.                                     │
├────────────┼──────────┼──────────────────────────────────────────────────────────────┤
│ Unique     │     3612 │ Number of unique mails (which where automatically added to   │
│            │          │ selection unless --only-act-on-duplicates was passed on the  │
│            │          │ command line).                                               │
├────────────┼──────────┼──────────────────────────────────────────────────────────────┤
│ Duplicates │     3383 │ Number of duplicate mails (sum of mails in all duplicate     │
│            │          │ sets with at least 2 mails).                                 │
├────────────┼──────────┼──────────────────────────────────────────────────────────────┤
│ Skipped    │       23 │ Number of mails ignored in the selection phase because the   │
│            │          │ whole set they belong to was skipped.                        │
├────────────┼──────────┼──────────────────────────────────────────────────────────────┤
│ Discarded  │     1194 │ Number of mails discarded from the final selection.          │
├────────────┼──────────┼──────────────────────────────────────────────────────────────┤
│ Selected   │     2166 │ Number of mails kept in the final selection on which the     │
│            │          │ action will be performed.                                    │
├────────────┼──────────┼──────────────────────────────────────────────────────────────┤
│ Copied     │        0 │ Number of mails copied from their original mailbox to        │
│            │          │ another.                                                     │
├────────────┼──────────┼──────────────────────────────────────────────────────────────┤
│ Moved      │        0 │ Number of mails moved from their original mailbox to         │
│            │          │ another.                                                     │
├────────────┼──────────┼──────────────────────────────────────────────────────────────┤
│ Deleted    │     2166 │ Number of mails deleted from their mailbox in-place.         │
╘════════════╧══════════╧══════════════════════════════════════════════════════════════╛
╒════════════════════╤══════════╤════════════════════════════════════════════════════════════╕
│ Duplicate sets     │   Metric │ Description                                                │
╞════════════════════╪══════════╪════════════════════════════════════════════════════════════╡
│ Total              │     4814 │ Total number of duplicate sets.                            │
├────────────────────┼──────────┼────────────────────────────────────────────────────────────┤
│ Single             │        0 │ Total number of sets containing only a single mail with no │
│                    │          │ applicable strategy. They were automatically kept in the   │
│                    │          │ final selection.                                           │
├────────────────────┼──────────┼────────────────────────────────────────────────────────────┤
│ Skipped - Encoding │        0 │ Number of sets skipped from the selection process because  │
│                    │          │ they had encoding issues.                                  │
├────────────────────┼──────────┼────────────────────────────────────────────────────────────┤
│ Skipped - Size     │        6 │ Number of sets skipped from the selection process because  │
│                    │          │ they were too dissimilar in size.                          │
├────────────────────┼──────────┼────────────────────────────────────────────────────────────┤
│ Skipped - Content  │        2 │ Number of sets skipped from the selection process because  │
│                    │          │ they were too dissimilar in content.                       │
├────────────────────┼──────────┼────────────────────────────────────────────────────────────┤
│ Skipped - Strategy │        0 │ Number of sets skipped from the selection process because  │
│                    │          │ the strategy could not be applied.                         │
├────────────────────┼──────────┼────────────────────────────────────────────────────────────┤
│ Deduplicated       │     1194 │ Number of valid sets on which the selection strategy was   │
│                    │          │ successfully applied.                                      │
╘════════════════════╧══════════╧════════════════════════════════════════════════════════════╛

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/jake/.cache/pypoetry/virtualenvs/mail-deduplicate-H7EXcUqJ-py3.9/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/jake/.cache/pypoetry/virtualenvs/mail-deduplicate-H7EXcUqJ-py3.9/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/jake/.cache/pypoetry/virtualenvs/mail-deduplicate-H7EXcUqJ-py3.9/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jake/.cache/pypoetry/virtualenvs/mail-deduplicate-H7EXcUqJ-py3.9/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/jake/.cache/pypoetry/virtualenvs/mail-deduplicate-H7EXcUqJ-py3.9/lib/python3.9/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/jake/src/mail-deduplicate/mail_deduplicate/cli.py", line 403, in mdedup
    dedup.check_stats()
  File "/home/jake/src/mail-deduplicate/mail_deduplicate/deduplicate.py", line 463, in check_stats
    assert self.stats["mail_retained"] == self.stats["mail_duplicates"]
AssertionError

@kdeldycke kdeldycke mentioned this pull request Sep 12, 2021
8 tasks
@kdeldycke
Copy link
Owner

This PR has been superseded by a combination of #286 and #290. The problem this PR try to solve should be definitively fixed by the code upstream. A new release of mail-deduplicate is going to be pushed soon to PyPi.

@kdeldycke kdeldycke closed this Sep 12, 2021
@kdeldycke kdeldycke added bug and removed bug labels Sep 12, 2021
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 13, 2021
@kdeldycke kdeldycke added 🧑‍🤝‍🧑 duplicate This issue or pull request already exists and removed duplicate labels Nov 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
🧑‍🤝‍🧑 duplicate This issue or pull request already exists
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to ignore single messages when performing any actions
3 participants