Bill Bumgarner’s useful Dupinator script, for removing duplicate files, recently hit Python-URL. However, it has a logic bug that ends up deleting too many files.
If you have several sets of duplicates that happen to share the same file size, all but one of the sets will be wiped out completely. The problem is that within each group of files of identical size, there’s at most a single generated “duplicates” list. The first file on the list is spared; the rest are deleted.
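The fix is to keep one duplicates list per distinct content hash rather than one per file size. The following is only a minimal sketch of that idea, not the script itself: the function name `find_duplicates` is hypothetical, SHA-256 is my own choice of hash, and each file is read whole into memory for simplicity.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(paths):
    """Return a list of duplicate sets, each a list of paths with identical content."""
    # First pass: group by size, since files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    duplicate_sets = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Second pass: key on the content hash, so unrelated files that merely
        # share a size end up in separate lists instead of one shared list.
        by_hash = defaultdict(list)
        for path in same_size:
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            by_hash[digest].append(path)
        duplicate_sets.extend(g for g in by_hash.values() if len(g) > 1)
    return duplicate_sets
```

With this grouping, two unrelated pairs of duplicates that happen to have the same size come back as two separate sets, and a unique file of that size is left out entirely.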
The net effect, when I tested the script on a large corpus of text files, was that the program reported it would delete many files that were clearly not identical. (I had commented out the os.remove call for testing.)
There was an additional problem with iPhoto: the posted script follows symbolic links. iPhoto stores its albums as collections of symbolic links, so all photos in albums are flagged as duplicates of the original photos. An islink() test fixes this.
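As a rough illustration of the islink() test, here is a sketch of a directory walk that skips symbolic links. It assumes `os.walk` for the traversal (the posted script may walk the tree differently), and `regular_files` is a hypothetical helper name.

```python
import os


def regular_files(root):
    """Yield paths of files under root, skipping symbolic links."""
    # os.walk does not descend into symlinked directories by default,
    # but symlinks to files still show up in filenames, hence the check.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            yield path
```

Only the original photos are then considered; the album entries that point at them are never hashed or compared.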
Here’s a modified version of the script. It has only been lightly tested, though the changes did successfully eliminate the false positives. Uncomment the os.remove() line only when you are satisfied with the list of redundant files generated.
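An alternative to commenting the call in and out is a dry-run switch. This is a sketch of the idea only, not part of the posted script: `remove_redundant` and its `dry_run` parameter are hypothetical names, and the input is assumed to be a list of duplicate sets where the first path in each set is the one to keep.

```python
import os


def remove_redundant(duplicate_sets, dry_run=True):
    """Keep the first path of each set; report, and optionally delete, the rest."""
    redundant = []
    for group in duplicate_sets:
        for path in group[1:]:
            redundant.append(path)
            print(("would delete " if dry_run else "deleting ") + path)
            if not dry_run:
                os.remove(path)
    return redundant
```

Calling it with the default `dry_run=True` only prints the list; passing `dry_run=False` performs the deletion.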
Minor optimizations: all files <= 1024 bytes go directly into the dupes list, not potentialDupes, since the whole file has already been checked. Also, Mac OS X’s pesky .DS_Store files are skipped.
(I haven’t heard back from Bill yet on incorporating the fixes into his code, so I’m posting here.)
View Source Code (dupinator.py)
January 14th, 2005 at 12:59 PM
I try to give any script like this a -n and/or a --simulate option, instead of relying on commenting.
June 28th, 2007 at 7:47 PM
I am not much of a programmer; I am just looking for a simple way to remove duplicate files from my Mac. I tried both your script and Bill Bumgarner’s, but neither one of them actually removes the files. Is there something that I am missing… an option that I have to pass to the command to actually make it remove the files? Please help.
Paul
March 12th, 2008 at 9:15 PM
Paul, it seems to require a folder to check. If you are already in the folder you want to check then run it like this:
dupinator.py .
That is dupinator.py (space) (dot)