Add option to ignore file content #160
base: master
Conversation
Force-pushed from 3b980bd to bbf5537
Comparing files by size is not at all a reliable way of detecting duplicates. Such a feature would fall outside of fdupes' intended purpose.
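For illustration: two files of the same size but different contents, which a size-only comparison would wrongly group as duplicates:

```sh
# Two 4-byte files with different data: equal size, unequal content.
printf 'AAAA' > a.bin
printf 'BBBB' > b.bin
ls -l a.bin b.bin   # sizes match
cmp a.bin b.bin     # exits non-zero: contents differ
```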
Totally agree that checksums are better. But I've written something for myself that does this and keeps the hashes saved, to make it easier to compare with other folders on other file systems. Perhaps an option to save the checksum metadata between runs?
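Something close to this can be approximated outside fdupes today. A minimal sketch using GNU coreutils and bash (the paths and cache file names here are hypothetical):

```sh
# Hash each tree once and keep the results as a reusable cache.
find /data/tree-a -type f -exec md5sum {} + | sort > tree-a.md5
find /mnt/tree-b  -type f -exec md5sum {} + | sort > tree-b.md5

# Print hashes present in both caches, i.e. likely cross-filesystem duplicates.
comm -12 <(cut -d' ' -f1 tree-a.md5 | sort -u) \
         <(cut -d' ' -f1 tree-b.md5 | sort -u)
```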
I tried to use fdupes to find duplicate films on a 5T drive full of films with many duplicates, and it was useless for that purpose, unless you were in a very cold place and wanted some warmth from your CPU, because fdupes just went on and on forever at 100% CPU. Now with -c, fdupes worked superbly well and took less than a minute to find all the many duplicates, and as far as I could determine it made no mistakes, so for this use case fdupes -c is excellent. Prior to this I almost paid good money for some proprietary code: God forbid, now I can spend it on beer instead.
Commits:
- update --avoid-content
- change naming
- update manual page
Would it be possible to log all the MD5 sums generated during a run, as an option?
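As far as I know, fdupes has no such flag today; a rough external equivalent with GNU coreutils (a sketch, with an illustrative log file name) would be:

```sh
# Hash every file, save the full log, and print only entries whose
# 32-character MD5 prefix repeats, i.e. the candidate duplicates.
find . -type f -exec md5sum {} + | tee all-hashes.log \
  | sort | uniq -w32 --all-repeated=separate
```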
I have been using fdupes with the -c option for some time now. I find it particularly useful for quickly finding duplicates in my very large film and music collections. To the best of my knowledge, there is no other free option available which does the same thing as quickly as fdupes -c can. With the -c option I can search 5T of data in just a few seconds to find duplicate names, which is perfect for finding duplicate films etc., where exact content is not so important. There is a paid-for tool that can do the same, but it is quite expensive. I really do think the -c option ought to be included; otherwise many people who just wish to search their film/music collections for duplicate names will have to pay for a proprietary tool. Including option -c will not detract from any of the other options available in fdupes, so there is everything to gain and nothing to lose. Unless we are just being purists for no good reason, I see no reason not to have the -c option.
I agree - I would love to see this added, as my use case is exactly what was described. I don't care much about the actual content - I care about fast comparisons, and I also do not see the harm in giving the user this option if he or she wants it.
You could basically use a find/sort/awk pipeline, as sketched below. This will print a list of files with equal sizes.
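One way to do that with GNU findutils, sort, and awk (a sketch; the exact command posted may have differed):

```sh
# Print every file whose size matches at least one other file's size.
# The first block counts sizes; END prints only lines whose size repeats.
find . -type f -printf '%s\t%p\n' | sort -n |
awk -F'\t' '
  { count[$1]++; line[NR] = $0; size[NR] = $1 }
  END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print line[i] }
'
```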
This is a fantastic way to lose a lot of data quickly. Don't do this unless you know your data quite intimately. |
How can you lose your data this way? |
I have been using this option for many months, sorting my vast film and music collection, and have lost nothing. It works really fast, is easy to use, and as far as I am concerned is reliable for what I use it for. If you want to be a die-hard purist then go ahead and try :-)
How can you lose data by assuming identical size equals identical contents and then taking potentially destructive actions based on that assumption? Are you seriously asking me this question? |
Okay, are we even talking about the same thing? The command I posted just lists file names with equal sizes, nothing more, nothing less. No destructive actions; in fact, no actions at all. And as people mentioned above, there are use cases where you might want that list of files with equal file sizes. What you want to do with that information is a different story.
This tool is used primarily to delete duplicate files.
https://en.wikipedia.org/wiki/Principle_of_least_astonishment
https://www.jjinux.com/2021/05/add-another-entry-to-unix-haters.html
Also, my response was primarily against the idea in general, not your code in particular.
Hello.
When the files in a directory are very large, fdupes can take a very long time to run.
I think it would be great if fdupes had a special option to ignore the content of files and compare only by size.
I have added a new "-c --ignore-content" option, a new compare function, and a small hack in checkmatch to avoid reading the whole file.
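Assuming the patch applies as described, usage would look like this (the -c/--ignore-content flag is from this PR, not upstream fdupes; -r is the existing recurse option):

```sh
# Size-only duplicate scan: group files by size without reading contents.
fdupes -rc /large/media/collection

# Long-form equivalent of the new flag.
fdupes -r --ignore-content /large/media/collection
```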