Subfolders are not processed #123
Comments
Oh yes, you're right. That is a legitimate issue. Users expect all mails in all subfolders to be part of the initial deduplication pool by default.
I hacked something together in b82accc. It is available in the brand new 6.0.2 release. Can you try it out and share the results here, please?
Seeing the changes you made, I was a bit expecting the following result
So for this bug report, I take it fixed and closed! What I didn't expect
On my system (AL212 1.4GHz, 1GB RAM) it processed about 1/3 of all emails before it was killed (probably for using all the memory). I don't know how big the mailboxes are that others are deduplicating, but ours are huge (100+ GB, 10+ million emails), and after a f*cked-up migration we have a plethora of duplicates, so some incremental disk offloading may help (if my assumption is right). As a developer myself, I can offer some help, but it would take some time to jump in (I am more of a C++/C#/TypeScript guy), and I don't currently have any. The only help I can offer for now is testing and "light debugging". May I open another bug report for that, or would you rather not bother with it?
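The "incremental disk offloading" idea mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions, not how mdedup works: the helper names (`open_index`, `header_digest`, `record`) and the choice of SQLite and SHA-256 are hypothetical. The point is that the set of already-seen digests lives in a database file instead of in RAM.

```python
import hashlib
import sqlite3


def open_index(db_path):
    """Open (or create) an on-disk index of digests seen so far.

    Hypothetical helper: the table layout is an assumption for illustration.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "digest TEXT PRIMARY KEY, first_path TEXT NOT NULL)"
    )
    return conn


def header_digest(raw_headers: bytes) -> str:
    """Hash the bytes that identify a mail (which headers to use is up to the caller)."""
    return hashlib.sha256(raw_headers).hexdigest()


def record(conn, digest, path):
    """Return the path of the first mail with this digest, or None if it is new."""
    row = conn.execute(
        "SELECT first_path FROM seen WHERE digest = ?", (digest,)
    ).fetchone()
    if row is not None:
        return row[0]  # duplicate: report where the first copy lives
    conn.execute("INSERT INTO seen VALUES (?, ?)", (digest, path))
    return None
```

With this shape, memory use stays roughly constant regardless of mailbox size, at the cost of one database lookup per mail; calling `conn.commit()` every few thousand mails would keep the index durable between runs.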
Ah yeah, I'm not surprised at all. I know there's lots of low-hanging fruit lying around (like mail's double copies). And there's also #87, so no need to create new tickets. To be honest, I'm past my personal need for this tool. I don't need it anymore. These last few weeks are probably the last effort I will invest to get the tool and the project into good shape (stable features, good enough unit tests). Now we both need a strong contributor to step in if we need more big stuff.
That being said, and now that I think about it, you can hire me to implement better performance! 👨‍💻
My boss doesn't like the idea of spending money on such things (bloody him), and that is exactly the reason why I am here and not running some paid tool already (with all due respect to the work you have done). I even lifted my *ss off the chair to ask all over again, but unluckily for both of us, he didn't change his mind :/. Anyway, thank you, and good luck with whatever else you are doing right now!
@tichaczech I'd suggest some massive swap partition while you are weeding out those dupes. That worked/works for me. As a data point, I'm using about 3GB of memory for 50,000 mails.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Only 69 emails are processed, although I have 30k+ in my .Maildir. Those 69 are just in my INBOX folder; the rest are in subfolders, which leads to my assumption that subfolders are not processed.
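One way to check the assumption independently of mdedup is to count messages per folder with Python's standard `mailbox` module. The snippet below is a minimal sketch, not mdedup's own code; `iter_folders` and `count_messages` are hypothetical helper names, and it assumes subfolders follow the usual Maildir++ convention of dot-prefixed directories, which is what `Maildir.list_folders()` looks for.

```python
import mailbox


def iter_folders(box, prefix=""):
    """Yield (name, Maildir) for a mailbox and, recursively, each subfolder."""
    yield (prefix or "INBOX"), box
    for name in sorted(box.list_folders()):
        child_name = f"{prefix}/{name}" if prefix else name
        yield from iter_folders(box.get_folder(name), child_name)


def count_messages(root):
    """Return a {folder name: message count} mapping for a Maildir tree."""
    box = mailbox.Maildir(root, create=False)
    return {name: len(folder) for name, folder in iter_folders(box)}
```

If `count_messages` on the `.Maildir` path reports only a handful of mails under `INBOX` and the bulk under other folder names, that matches the behaviour described here: only the top-level folder is being read.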
Deduplication command on .Maildir
File count in .Maildir
All data on execution context as provided by
$ mdedup --version