For years, I've been taking pictures of all kinds of things with my smartphone. The photo above, which shows a wall in Malta, is just one example.

I had been storing all these pictures in a single folder without any kind of sorting. It had grown big enough that I no longer had any idea what it contained.

It bothered me. I like my stuff well organized. So I sorted them and thought I'd share the process and scripts I used. Maybe this can help you too.

Deleting duplicate images

I had a lot of duplicate files, so deleting duplicates was a good first step to make the sorting easier later on. On Linux, this is easily done with the fdupes command (-r recurses into subdirectories, -d deletes duplicates, -N keeps the first file of each duplicate set without prompting):

fdupes -rdN Pictures/

This will remove files that are byte-for-byte duplicates. If you want to go one step further and remove similar images (for instance, the same image in two different sizes), you may want to use something such as Geeqie. This tutorial explains how you can use Geeqie to quickly find similar images. It helped me remove a few dozen images that had been resized down but had the same content.
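Under the hood, exact-duplicate detection boils down to comparing content hashes. As a minimal sketch (not a replacement for fdupes, and run here on a throwaway directory), here is how you could spot byte-for-byte duplicates with standard tools:

```shell
#!/bin/sh
# Minimal sketch: files with the same content hash are duplicates.
dir="$(mktemp -d)"
echo "same content" > "$dir/a.jpg"
echo "same content" > "$dir/b.jpg"
echo "other content" > "$dir/c.jpg"

# sha1sum prints "<hash>  <path>"; sort groups identical hashes, and
# "uniq -w40 -D" prints every line whose first 40 chars (the hash) repeat
dupes="$(sha1sum "$dir"/* | sort | uniq -w40 -D)"
echo "$dupes"

rm -rf "$dir"
```

Unlike fdupes, this only lists the duplicates; deciding which copy to delete is still up to you.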

Sorting images

For me, sorting images by year is sufficient. I don't need to know where the images were taken or what they were about. If it's important enough, I'll remember anyway.

So I tried to sort them by year. The images' Exif data is helpful for that.

I created a script that sorts images by year. Given a directory containing only images (no subfolders), it creates a directory named after the year each photo was taken and moves the photo into it.

For instance, if you have one image from 2005 and one from 2018, it will create this directory structure:

2005
 -> one-image-from-2005.jpg
2018
 -> another-image-from-2018.jpg

Alright, let's get to the code:

#!/bin/bash

set -e

find Pictures/ -maxdepth 1 -type f -print0 | while IFS= read -r -d '' file
do
    # Skips any non-image files
    if ! file "$file" | grep -q image; then
        continue
    fi

    # Skips files that don't have Exif data ("|| true" stops "set -e"
    # from aborting the whole script when grep finds nothing)
    exif="$(identify -verbose "$file" | grep "exif:" || true)"
    if [ -z "$exif" ]; then
        continue
    fi

    exif_year="$(echo "$exif" | grep exif:DateTimeOriginal | cut -d' ' -f6 | cut -d':' -f1)"

    if [ -n "$exif_year" ]; then
        echo "$file" '->' "$exif_year"
        mkdir -p "$exif_year"
        mv "$file" "$exif_year"/
    fi
done

The -maxdepth 1 is important here: it stops find from descending into subdirectories, so the script only ever looks at the files sitting directly in Pictures/.
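The year extraction depends on the exact layout of identify's output. A quick sketch with a hypothetical Exif line shows how the cut chain works (note the leading spaces, which is why the date lands in field 6 of the space-split line):

```shell
#!/bin/sh
# Hypothetical line as printed by "identify -verbose" (4 leading spaces)
line="    exif:DateTimeOriginal: 2018:05:01 12:34:56"

# Field 6 of the space-split line is the date "2018:05:01", whose first
# colon-separated field is the year
year="$(echo "$line" | cut -d' ' -f6 | cut -d':' -f1)"
echo "$year"   # prints 2018
```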

A few photos did not have any Exif data, so I sorted these by hand.
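For those Exif-less photos, one possible fallback (an assumption on my part, not something the script above does) is to guess the year from the file's modification time, which GNU date can print with -r:

```shell
#!/bin/sh
# Fallback idea: when a photo has no Exif data, guess the year from the
# file's mtime. Demonstrated on a throwaway file with a forced timestamp.
file="$(mktemp)"
touch -d '2015-06-01 12:00:00' "$file"

year="$(date -r "$file" +%Y)"   # year of the file's modification time
echo "$year"

rm -f "$file"
```

The mtime may reflect when the file was copied rather than when the photo was taken, so a manual check is still wise.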

That was it. The photos were properly sorted!

Creating value from useless images

After sorting those images, I realized a lot of them had no value for me personally. The quality is not professional, and they don't show anything personal (they're pictures of public places, airports, etc.).

I briefly thought about deleting them. Then I thought:

What if I put my "useless" images on a stock photo website?

Those images may not be useful to me right now, but they may well be useful to someone else. Or maybe they'll be useful to me in a few years and having a personal bank of images I can pick from would be cool.

So I had a look at how I could remove private information from these images. They can contain Exif data, which sometimes includes sensitive details (GPS coordinates, for example). To avoid leaking any of it, I'm stripping all of it:

find Pictures/ -type f -exec exiv2 rm {} \;

Now, I still want the photos to describe me as the photographer and I want to include copyright data in those images. So I'm adding back some Exif tags:

exiv2 -M"set Exif.Image.Copyright Copyright Conrad Kleinespel" *

I also want to strip any potentially private information from the image names themselves. The file extensions are also somewhat inconsistent: there were jpg, JPG, jpeg and JPEG, all of which I'd like to normalize to jpg.

So I created this script to normalize image names:

#!/bin/bash

set -e

find Pictures/ -type f -print0 | while IFS= read -r -d '' file
do
    # Skips any non-image files
    if ! file "$file" | grep -q image; then
        continue
    fi

    sha1="$(sha1sum "$file" | cut -d' ' -f1)"
    # Extracts the extension (escaped dot, so extensionless files yield
    # an empty string) and lowercases it
    ext="$(echo "$file" | grep -o '\.[[:alnum:]]*$' | tr '[:upper:]' '[:lower:]')"

    if [ "$ext" = ".jpeg" ]; then
        ext=".jpg"
    fi

    new_name="$sha1$ext"
    mv "$file" "$new_name"
done

This creates image names like 3914678325556c51762c3c709c322d4357b2163d.jpg, without any personally identifiable information in the name.
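The extension handling can be checked in isolation. A quick sketch with a hypothetical camera filename:

```shell
#!/bin/sh
# Hypothetical camera filename with an uppercase extension
name="IMG_1234.JPEG"

# Grab the extension, lowercase it, then map .jpeg to .jpg
ext="$(echo "$name" | grep -o '\.[[:alnum:]]*$' | tr '[:upper:]' '[:lower:]')"
if [ "$ext" = ".jpeg" ]; then
    ext=".jpg"
fi
echo "$ext"   # prints .jpg
```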

I'm still trying to figure out the best way to put these images online. Ideally, I'd want automated image tagging, or anything else that makes the images searchable without manual work.