Duplicate Annihilator

May 31st, 2010

If you operate anything like me, then it’s likely that you’ve got a massive amount of duplicate images tucked away in your photo library.  Through benign actions, technical ineptitude, or simple hoarding, it’s actually quite easy to amass a huge amount of unnecessary files in your image library.  While Aperture is superb at managing a library, it somehow lacks any internal method to identify and root out these doppelganger images.

My desire to nuke these duplicate images led me to Duplicate Annihilator from Bratoo Software.  Finding no software reviews on the interwebs, I decided to take the plunge and see if it could help my workflow.

Malignant Imaging

Let’s establish a baseline here.  At the start of this endeavor, my Aperture library contains 24,403 images in 357 projects, with 213 of these images already in the trash.  My vault size is 275.9 GB; however, the actual active library size is 128.55 GB.  By my own estimation, I’ve got somewhere around 10k duplicate images, since I only had about 17k images in my library before cleaning up my hard disks, upgrading to Aperture 3 and consolidating all stray images into a single library.  Before we can start deleting supposed duplicates, we have to come to some understanding about how they get there in the first place.

First and most obvious would be that images are copied into your library multiple times.  Aperture doesn’t really prevent you from doing this, so it’s the easiest way to add the same image into your library as a duplicate.
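For this simplest case, where the very same file has been copied in more than once, a content hash is enough to spot the repeats.  Here is a minimal sketch, assuming the image files sit in an ordinary folder tree (for example an export or a set of referenced masters) rather than being read out of the Aperture library itself.  The function names are my own and have nothing to do with any particular utility; the sketch only catches byte-identical copies, not edited versions or RAW+JPEG pairs.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under root by content hash; return groups of 2 or more."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling find_exact_duplicates on a folder returns a list of groups, each holding the paths that share identical contents.  Nothing is deleted, so the groups can be reviewed by hand first.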

If you’ve transitioned from an iPhoto library to Aperture, it’s highly likely that you’ve got duplicates sitting throughout your disks anyway.  The Aperture import function does not delete any photos from your iPhoto library even if you force it to copy the files instead of referencing them.  This results in two copies of each image residing on your disk, one in each of your photo libraries.  Additionally, depending on the version of iPhoto you started with, it is also likely that you’ve got multiple copies of any image that you’ve performed edits upon.

Perhaps less obvious is the retention of JPEG and RAW pairs.  For me, I couldn’t care less about retaining any JPEGs I’ve shot if I’ve also retained a RAW master file, but a few years of shooting in paired mode has bloated my library with extra JPEGs that aren’t needed.

Finally, there’s a setting in Aperture’s preferences under the “Advanced” tab that will create a new copy of each image every time you perform an adjustment.  In my opinion, this is stupid to leave enabled as it will create unnecessary bloat in your library.  Since Aperture is performing “soft” edits, there’s no reason to create separate instances of images just to see a contrast adjustment.  A right click can quickly create a new image from the original master, so by NOT forcing Aperture to create copies, you will keep a cleaner library while retaining flexibility to revisit all your images.

Duplicate Annihilator – Aperture Edition

The product page for Duplicate Annihilator focuses mainly on the iPhoto version of the program, and while it certainly looks impressive and well equipped, several of the iPhoto functions are not available in the Aperture Edition of the utility.  It’s important to keep in mind that these two products function somewhat differently, as I kept looking for features touted on the website only to find that they’re not available in the Aperture Edition.  It’s also important to ensure that you’re downloading the correct version, as the iPhoto version is not compatible with the Aperture library.

The utility is self-contained and must run while Aperture is closed.  There are vague warnings against opening Aperture while the program is searching the library, so it’s probably safe to assume that the utility has the power to blow up your Aperture database if you don’t take heed.

The trial version is a quick download and allows you to scan up to 500 images.  For a person like me with almost 25k images, the utility doesn’t really flex its muscles, but you can get a good feel for the general speed of the utility.  With a price of only $7.95, I decided to just cut the trial run short and buy a full registered version of the program.

Magic Mode – not so magical

After registering Duplicate Annihilator, I attempted to run the program but I found myself perplexed at the results.  The utility located 10,224 duplicates after a four-hour process, totaling 39.04 GB.  The Annihilator doesn’t actually annihilate anything at all; instead, all the program does is add a “duplicate” text tag to the keywords of each image it believes to be a duplicate.  The utility will mark these duplicates running either forward or backward through your database in a crude attempt to determine which image might be the original.

When I launched Aperture, it only found 9,854 images tagged with the word “duplicate”.  I started to go through the process of deleting each image marked as a duplicate and noticed that the logic of Duplicate Annihilator wasn’t exactly as strong as I’d hoped.  In several instances, the program marked a RAW version of a file as the duplicate of a JPEG version.  Poking around the library further, I noticed that several images that were visually dissimilar but taken in a close timeframe were also marked as duplicates.

This seriously bothered me, as a person could easily delete thousands of images that are mistakenly marked as duplicates.  Hoping to get a better handle on how the utility was working, I wrote to the program’s developer for clarification.

The first thing he suggested was to enable the feature that assigns the “original” keyword to images the Annihilator believes to be original copies.  Then, by flagging all images marked original and creating a smart album to pull all images marked with either of the keywords “Original” and “Duplicate”, you can safely compare and eliminate duplicate images.  This seemed logical in principle, but after another four-hour run of the utility, I found myself looking through a smart album with almost 20k images.  Since Aperture can sort images in almost every possible way…with the EXCEPTION of visually sorting them (that’d be a neat trick huh?), there’s no way to place the duplicates alongside the corresponding originals.

Additionally, I found that the utility was still marking images in sequence (with different, sequential filenames) as duplicates while also randomly marking either RAW or JPG images as originals.  I asked the developer two questions in hopes of fine-tuning the search results, but the responses weren’t what I’d hoped.

How does the utility treat JPG and RAW versions of the same image?

“The Aperture masters database always point to one distinct image  file and that file will be the file that is compared no matter if it is a jpeg or a raw.”

There are two options to mark duplicates, walking forward or backward through the library; is that related more to the positioning in the library structure or to the EXIF/file dating?

“In the case with Aperture that is actually backwards or forward through the aperture masters database where backwards results in keeping the most recently imported copy.”

From these responses, two core problems emerge with the function of the utility.  There isn’t any weighting function to enable a person to favor a RAW file over a JPEG, and the process of identifying which file is original and which is a duplicate hinges entirely on the importing process of Aperture.  If Aperture imports the JPEG copy of an image before the RAW copy, the JPEG will be viewed as the original regardless of any other similarities.
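To make the difference concrete, here is a rough sketch of the two selection rules side by side.  It is only my reading of the developer’s answers, not the utility’s actual code: each candidate is a hypothetical (filename, import position) pair, the function names are made up, and the list of RAW extensions is an assumption you would adjust for your own cameras.

```python
from pathlib import PurePath

# Assumed RAW extensions for illustration; adjust for your own cameras.
RAW_EXTENSIONS = {".cr2", ".nef", ".arw", ".dng", ".orf", ".raf"}


def is_raw(filename):
    """True if the filename carries one of the assumed RAW extensions."""
    return PurePath(filename).suffix.lower() in RAW_EXTENSIONS


def pick_by_import_order(group, backwards=False):
    """Keep the earliest import, or the most recent one if backwards."""
    ordered = sorted(group, key=lambda item: item[1])
    return ordered[-1] if backwards else ordered[0]


def pick_with_raw_weight(group):
    """Prefer RAW files; fall back to earliest import only to break ties."""
    return min(group, key=lambda item: (not is_raw(item[0]), item[1]))


# Hypothetical pair: the JPEG happened to be imported before the RAW.
group = [("IMG_0042.JPG", 1), ("IMG_0042.CR2", 2)]
print(pick_by_import_order(group))  # ('IMG_0042.JPG', 1)
print(pick_with_raw_weight(group))  # ('IMG_0042.CR2', 2)
```

With import order alone, which file counts as the original depends on the accident of which one was imported first.  With even a crude weight on file type, the RAW is kept either way.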

Because of these core flaws, Duplicate Annihilator is essentially useless and I can’t recommend the Aperture version to anyone in its current state.  I would submit that there could certainly be some value in the iPhoto version of the utility, but since I don’t use iPhoto I can’t make a comparison.  Just from the product website alone, it seems like the iPhoto version is more advanced than the Aperture edition, so I’d still encourage people who are working in iPhoto to check it out.  At $7.95, it will only cost you two fancy Starbucks coffees to see how the program works.

Although the Duplicate Annihilator couldn’t do anything for me at all, I did manage to eliminate 9,882 duplicate images from my library.  After all the poking and prodding with Duplicate Annihilator, I found myself sorting my entire library by file name and then I went through all 24k images one by one to delete the JPG half of RAW+JPG pairs, the duplicate edits and accidental extra imports.
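For anyone who would rather script the RAW+JPG part of that filename comparison than do it by eye, here is a minimal sketch.  It assumes you have a plain list of master filenames (from an export or a folder listing) and that the two halves of a pair share the same base name.  The extension sets and the function name are my own assumptions, and the sketch only lists candidates, it deletes nothing.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed extension sets for illustration; adjust for your own cameras.
RAW_EXTENSIONS = {".cr2", ".nef", ".arw", ".dng", ".orf", ".raf"}
JPEG_EXTENSIONS = {".jpg", ".jpeg"}


def redundant_jpegs(filenames):
    """Return the JPEGs whose base name also appears as a RAW file."""
    by_stem = defaultdict(list)
    for name in filenames:
        by_stem[PurePath(name).stem.lower()].append(name)

    redundant = []
    for names in by_stem.values():
        suffixes = {PurePath(n).suffix.lower() for n in names}
        if suffixes & RAW_EXTENSIONS:
            redundant.extend(
                n for n in names
                if PurePath(n).suffix.lower() in JPEG_EXTENSIONS
            )
    return sorted(redundant)


names = ["IMG_0001.CR2", "IMG_0001.JPG", "IMG_0002.JPG", "IMG_0003.NEF"]
print(redundant_jpegs(names))  # ['IMG_0001.JPG']
```

Matching on the base name alone will produce false pairs if your camera has ever reset its file numbering, so the output is a list to review, not a list to delete blindly.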

My library is now sitting at a comfortable 14,792 images, for a size of 90.99 GB, and all I had to do was spend four days hunting through and comparing filenames.

Every second in front of a computer is a second I’m not shooting a picture of something…

You are currently reading Duplicate Annihilator at I Shot a Lot.
