Edit
After a long process of roaming the web, re-running and troubleshooting the script with this wonderful community, the script is functional and does what it’s intended to do. It could probably be improved further in terms of efficiency/logic, but I lack the necessary skills/knowledge to do so. Feel free to copy, edit or even propose a more efficient way of doing the same thing.
I’m greatly thankful to @[email protected], @[email protected], @[email protected] and Phil Harvey (exiftool) for their help, time and all the great ideas (and for spoon-feeding me with simple and comprehensive examples ! )
How to use
Prerequisites: the parallel package installed on your distribution.
Copy/paste the script below into a file and make it executable. Change start_range/end_range to your needs, install the parallel package for your OS, and run the following command:
time find /path/to/your/image/directory/ -type f | parallel ./script-name.sh
This will sort only the pictures from your specified time range into the structure YEAR/MONTH in your current directory, based on 5 different date tags (DateTimeOriginal, CreateDate, FileModifyDate, ModifyDate, DateAcquired).
You may want to swap ModifyDate and FileModifyDate in the script: ModifyDate is more reliable, because FileModifyDate changes easily (as soon as you modify a picture, it is set to the current date). I needed the order shown in the script for my specific use case.
From:
'-directory<$DateAcquired/' '-directory<$ModifyDate/' '-directory<$FileModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/'
To:
'-directory<$DateAcquired/' '-directory<$FileModifyDate/' '-directory<$ModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/'
As per exiftool’s documentation:
ExifTool evaluates the command-line arguments left to right, and later assignments to the same tag override earlier ones.
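To make the override rule concrete, here is a small Python model of it (this is not exiftool itself, just a sketch). It assumes that an assignment whose source tag is missing from the file is simply skipped, so the last assignment that actually applies wins. The function name effective_tag and the sample tag values are made up for illustration.

```python
def effective_tag(assignment_order, file_tags):
    """Return the tag whose value ends up being used for the directory.

    assignment_order: tag names in the order they appear on the command line.
    file_tags: dict of tag name -> value for one file (missing tags are absent).
    """
    winner = None
    for tag in assignment_order:
        if tag in file_tags:
            winner = tag  # a later assignment overrides an earlier one
    return winner


# Order used in the script: FileModifyDate comes after ModifyDate.
order = ["DateAcquired", "ModifyDate", "FileModifyDate", "CreateDate", "DateTimeOriginal"]
# Swapped order from the "To:" line: ModifyDate comes after FileModifyDate.
swapped = ["DateAcquired", "FileModifyDate", "ModifyDate", "CreateDate", "DateTimeOriginal"]

# Hypothetical file that lost DateTimeOriginal and CreateDate.
tags = {"FileModifyDate": "2024:03:01 10:00:00", "ModifyDate": "2018:06:15 09:30:00"}

print(effective_tag(order, tags))    # FileModifyDate
print(effective_tag(swapped, tags))  # ModifyDate
```

In other words, the tag you trust most should come last on the command line.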
#!/bin/bash
if [ $# -eq 0 ]; then
echo "Usage: $0 <filename>"
exit 1
fi
# Concatenate all arguments into one string for the filename, so calling "./script.sh /path/with spaces.jpg" should work without quoting
filename="$*"
start_range=20170101
end_range=20201230
FIRST_DATE=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$filename" | tr -d '-' | awk '{print $1}')
if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then
exiftool -api QuickTimeUTC -d %Y/%B '-directory<$DateAcquired/' '-directory<$ModifyDate/' '-directory<$FileModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"
else
echo "Not in the specified time range"
fi
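For anyone who finds the FIRST_DATE pipeline hard to read, here is a rough Python equivalent of what the tr/awk part and the range test do. It is only a sketch: the helper names first_date and in_range and the sample output line are made up, and it assumes exiftool -T prints tab-separated values with "-" for a missing tag.

```python
def first_date(tab_line):
    """Return the first real value from one line of `exiftool -T` output, or None."""
    for field in tab_line.rstrip("\n").split("\t"):
        field = field.strip()
        if field and field != "-":  # "-" is the placeholder for a missing tag
            return field
    return None


def in_range(date_str, start, end):
    """Strict comparison, like -gt / -lt in the script (both bounds are excluded)."""
    return date_str is not None and date_str.isdigit() and start < int(date_str) < end


# Hypothetical output: DateTimeOriginal and CreateDate missing, FileModifyDate present.
line = "-\t-\t20180615\t-\t20180616\n"
date = first_date(line)
print(date, in_range(date, 20170101, 20201230))  # 20180615 True
```

One thing this makes visible: because the test is strict, a picture dated exactly start_range or end_range is not moved. Using -ge / -le in the bash script would include the boundary days.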
Hi everyone !
Please no bash-shaming, I did my utmost to somehow put everything together and make it work without any prior bash programming knowledge. It took me a lot of effort and time.
While I’m pretty happy with the result, I find the execution time very slow: 16min for 2288 files.
On a big folder with approximately 50,062 files, this would take over 6 hours !!!
If someone could have a look and give me some easy to understand hints, I would greatly appreciate it.
What Am I trying to achieve ?
Create a bash script that uses exiftool to strip the date from images in a readable format (20240101) and compare it with a start_range and an end_range, in order to sort only images from that specific date range (ex: 2020-01-01 -> 2020-12-30).
Also, some images lost some EXIF data, so I have to loop through specific time fields:
- DateTimeOriginal
- CreateDate
- FileModifyDate
- DateAcquired
The script in question
#!/bin/bash
shopt -s globstar
folder_name=/home/user/Pictures
start_range=20170101
end_range=20180130
for filename in $folder_name/**/*; do
if [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal "$filename") =~ ^[0-9]+$ ]]; then
DateTimeOriginal=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateTimeOriginal "$filename")
if [ "$DateTimeOriginal" -gt $start_range ] && [ "$DateTimeOriginal" -lt $end_range ]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 22)DateTimeOriginal$(tput sgr0)"
fi
elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -CreateDate "$filename") =~ ^[0-9]+$ ]]; then
CreateDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -CreateDate "$filename")
if [ "$CreateDate" -gt $start_range ] && [ "$CreateDate" -lt $end_range ]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$CreateDate/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 27)CreateDate$(tput sgr0)"
fi
elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -FileModifyDate "$filename") =~ ^[0-9]+$ ]]; then
FileModifyDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -FileModifyDate "$filename")
if [ "$FileModifyDate" -gt $start_range ] && [ "$FileModifyDate" -lt $end_range ]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$FileModifyDate/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 202)FileModifyDate$(tput sgr0)"
fi
elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateAcquired "$filename") =~ ^[0-9]+$ ]]; then
DateAcquired=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateAcquired "$filename")
if [ "$DateAcquired" -gt $start_range ] && [ "$DateAcquired" -lt $end_range ]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateAcquired/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 172)DateAcquired$(tput sgr0)"
fi
elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -ModifyDate "$filename") =~ ^[0-9]+$ ]]; then
ModifyDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -ModifyDate "$filename")
if [ "$ModifyDate" -gt $start_range ] && [ "$ModifyDate" -lt $end_range ]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$ModifyDate/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 135)ModifyDate$(tput sgr0)"
fi
else
echo "No EXIF field found"
fi
done
Things I have tried
- Reducing the number of if calls
But it didn’t improve the execution time much (maybe a few ms?). The syntax is far less readable, but what I did was chain the conditions with or ( || ) to reduce everything to a single if call. It’s not finished, I just gave it a test drive with 2 EXIF fields (DateTimeOriginal and CreateDate) to see if it could somehow improve the time. But meeeh :/.
#!/bin/bash
shopt -s globstar
folder_name=/home/user/Pictures
start_range=20170101
end_range=20201230
for filename in $folder_name/**/*; do
if [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal "$filename") =~ ^[0-9]+$ ]] || [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -CreateDate "$filename") =~ ^[0-9]+$ ]]; then
DateTimeOriginal=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateTimeOriginal "$filename")
CreateDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -CreateDate "$filename")
if [ "$DateTimeOriginal" -gt $start_range ] && [ "$DateTimeOriginal" -lt $end_range ] || [ "$CreateDate" -gt $start_range ] && [ "$CreateDate" -lt $end_range ]; then
/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateTimeOriginal/' '-directory<$CreateDate/' '-FileName=%f%-c.%e' "$filename"
echo "Found a value"
echo "Okay its $(tput setab 22)DateTimeOriginal$(tput sgr0)"
else
echo "FINISH YOUR SYNTAX !!"
fi
fi
done
- Playing around with find
To recursively find my image files in all my folders I first tried the find command, but that gave me a lot of headaches… When an image file name had spaces in it, the image path got broken up strangely… All the answers I found on the web were gibberish to me, and I couldn’t make it work properly in my script… I lost over 4 hours on that specific issue alone !
To overcome the hurdle, someone suggested using shopt -s globstar with for filename in $folder_name/**/* and this works perfectly. But I have no idea if this could be the culprit of the slow execution time?
- Changing all [ ] into [[ ]]
That also didn’t do the trick.
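On the file names with spaces from the find attempt: in bash this is most likely word splitting of an unquoted expansion. For comparison, here is a minimal Python sketch of a recursive file listing where the problem cannot occur, because every path stays a single object. The helper name list_files and the demo file names are made up.

```python
import tempfile
from pathlib import Path


def list_files(root):
    """Recursively list regular files under root (directories are skipped)."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())


# Small demo in a throwaway directory with spaces in the names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "holiday 2018").mkdir()
    (Path(tmp) / "holiday 2018" / "IMG 0001.jpg").touch()
    (Path(tmp) / "IMG_0002.jpg").touch()
    found = [str(p.relative_to(tmp)) for p in list_files(tmp)]
    print(found)
```

Listing the files is cheap either way, so the globstar loop itself is unlikely to be what makes the script slow.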
How to Improve the processing time ?
I have no idea whether it’s my script or the exiftool calls that make it so slow. This isn’t that complicated a script; I mean, it’s a comparison between 2 integers, not a hashing of complex numbers.
I hope someone could guide me in the right direction :)
Thanks !
Yeah, I think the fact that you need to capture the output and then use that as input to another exiftool command complicates things a lot; if you just need to run an exiftool command on each photo and not worry about the output I think the -stay_open approach would work, but I honestly have no idea how you would juggle the input and output in your case.
Regardless, I’m glad you were able to see some improvement! Honestly, I’m the wrong person to ask about bash scripts, since I only use them for really basic stuff. There are wizards who do all kinds of crazy stuff with bash, which is incredibly useful if you’re trying to create a portable tool with no dependencies beyond any binaries it may call. But personally, if I’m just hacking myself together something good enough to solve a one-off problem for myself I’d rather reach for a more powerful tool like Python which demands less from my puny brain (forgive my sacrilege for saying this in a Bash community!). Here’s an example of how I might accomplish a similar task in Python using a wrapper around exiftool which allows me to batch process all the files in one go and gives me nice structured data (dictionaries, in this case) without having to do any text manipulation:
import exiftool
import glob

files = glob.glob(r"/path/to/photos/**/*", recursive=True)
with exiftool.ExifToolHelper() as et:
    metadata = et.get_metadata(files)
    for d in metadata:
        for tag in ["EXIF:DateTimeOriginal", "EXIF:CreateDate", "File:FileCreateDate", "File:FileModifyDate", "EXIF:DateAcquired"]:
            if tag in d.keys():
                # Per file logic goes here
                print(f'{d["File:FileName"]} {d[tag]}')
                break
This outline of a script (which grabs the metadata from all files recursively and prints the filename and first date tag found for each) ran in 4.2 s for 831 photos on my machine (so ~5 ms per photo).
Since I’m not great in bash and not well versed in exiftool’s options, I just want to check my understanding: for each photo, you want to check if it’s in the specified date range, and then if it is you want to copy/move it to a directory of the format YYYYMMDD? I didn’t actually handle that logic in the script above, but I showed where you would put any arbitrary operations on each file. If you’re interested, I’d be happy to fill in the blank if you can describe your goal in a bit more detail!
Edit:
I’m leaving my comment in case you’re willing to read it through xD ! Another user proposed a hardware solution: using the parallel package to run my script on all my CPU cores. Fantastic solution ! Right now I get a processing time of 1min22s for 2200 files, that’s a huge improvement. It’s not the script’s efficiency itself that got improved, so if you’re inspired and still want to fill in the blanks, feel free ! Otherwise it’s alright and thank you for your help 😘 !
#!/bin/bash
if [ $# -eq 0 ]; then
    echo "Usage: $0 <filename>"
    exit 1
fi
# Concatenate all arguments into one string for the filename, so calling "./script.sh /path/with spaces.jpg" should work without quoting
filename="$*"
start_range=20170101
end_range=20201230
FIRST_DATE=$(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$filename" | tr -d '-' | awk '{print $1}')
#echo $FIRST_DATE
if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then
    /usr/bin/vendor_perl/exiftool -api QuickTimeUTC -d %Y/%B '-directory<$DateAcquired/' '-directory<$ModifyDate/' '-directory<$FileModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"
else
    echo "Error"
fi
time find /home/user/Pictures/ -type f | parallel ./exif-test.bash
Hey thank you for your answer :)
I also thought that with Python it could be faster and maybe easier? Python’s syntax is easier to understand and maybe easier to troubleshoot, but who am I to draw any conclusion, I’m only putting things together while roaming the web, so I have no idea what I’m talking about xD !
Yes, because my mother has a VERY large group of photos on different hard drives and I need to search for a specific date range (that’s the purpose of my start_range, end_range) and sort the matches into a directory of the format YEAR/MONTH. She’s looking for something specific and only somehow remembers the date, the problem is… A huge part of her photo metadata/folder is FUCKED UP… because of how she edited the files, extracted them into other folders (lots of duplicates…), moved them and even played around with the metadata, tags…
After a few attempts at searching through the hard drives I just thought I should make a script that automates everything. I like learning new things but I’m slowly giving up, because trying to make something work without the necessary skill set and knowledge is a real pain… I already did something similar with FFMPEG (but that was a 2 line bash script) so I thought it might be possible for this too.
As a bullet point what I’m trying to achieve is to:
I do have a working script that does exactly that (thanks to every person here trying to give me a hand), but it’s kinda slow. exiftool alone will process the same folder sample in ±2 minutes while my actual script takes 10 mins (quite an improvement over the first script I managed to put together, which took 16min !) for 2200 files.
#!/bin/bash
start_range=20160101
end_range=20221212
find /home/dany/Pictures -type f -print0 | while IFS= read -r -d '' file_path; do
    image_path=$file_path
    FIRST_DATE=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$image_path" | tr -d '-' | awk '{print $1}')
    if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then
        exiftool -api QuickTimeUTC -d %Y/%B '-directory<dateacquired' '-directory<modifydate' '-directory<filemodifydate' '-directory<createdate' '-directory<datetimeoriginal' '-FileName=%f%-c.%e' "$image_path"
    else
        echo "Error"
    fi
done
If you feel inspired and want to fill in the blanks, feel free :)
Thanks for your contribution !
Alright, here’s what I’ve got!
#!/usr/bin/env python3
import datetime
import glob
import os
import re
import shutil

import exiftool

files = glob.glob(r"/path/to/photos/**/*", recursive=True)
# Necessary to avoid duplicate files; if all photos have the same extension
# you could simply add that extension to the end of the glob path instead
files = [f for f in files if os.path.isfile(f)]

parent_dir = r'/path/to/sorted/photos'
start_date = datetime.datetime(2015, 1, 1)
end_date = datetime.datetime(2024, 12, 31)
date_extractor = re.compile(r'^(\d{4}):(\d{2}):(\d{2})')

with exiftool.ExifToolHelper() as et:
    metadata = et.get_metadata(files)
    for d in metadata:
        for tag in ["EXIF:DateTimeOriginal", "EXIF:CreateDate", "File:FileModifyDate", "EXIF:ModifyDate", "XMP:DateAcquired"]:
            if tag in d.keys():
                # Per file logic goes here
                year, month, day = [int(i) for i in date_extractor.match(d[tag]).group(1, 2, 3)]
                filedate = datetime.datetime(year, month, day)
                if filedate < start_date or filedate > end_date:
                    break

                # Can uncomment below line for debugging purposes
                # print(f'{d["File:FileName"]} {d[tag]} {year}/{month}')
                subdirectory = f'{parent_dir}/{year}/{month}'
                if not os.path.exists(subdirectory):
                    os.makedirs(subdirectory)
                shutil.move(d['SourceFile'], subdirectory)
                break
Other than PyExifTool, which will need to be installed using pip, all libraries used are part of the standard library. The basic flow of the script is to first grab metadata for all files using one exiftool command, then for each file to check for the existence of the desired tags in succession. If a tag is found and it’s within the specified date range, it creates the YYYY/MM subdirectory if necessary, moves the file, and then proceeds to process the next file.
In my preliminary testing, this seemed to work great! The filtering by date worked as expected, and when I ran it on my whole test set (831 files) it took ~6 seconds of wall time. My gut feeling is that once you’ve implemented the main optimization of handling everything with a single execution of exiftool, this script (regardless of programming language) is going to be heavily I/O bound because the logic itself is simple and the bulk of time is spent reading and moving files, meaning your drive’s speed will be the key limiting factor. Out of those 6 seconds, only half a second was actual CPU time. And it’s worth keeping in mind that I’m doing this on a speedy NVME SSD (6 GB/s sequential read/write, ~300K IOPS random read/write), so it’ll be slower on a traditional HDD.
There might be some unnecessary complexity for some people’s taste (e.g. using the datetime type instead of simple comparisons like in your bash script), but for something like this I’d prefer it to be brittle and break if there’s unexpected behavior because I parsed something wrong or put in nonsensical inputs rather than fail silently in a way I might not even notice.
One important caveat is that none of my photos had that XMP:DateAcquired tag, so I can’t be certain that that particular tag will work and I’m not entirely sure that will be the tag name on your photos. You may want to run this tiny script just to check the name and format of the tag to ensure that it’ll work with my script:
#!/usr/bin/env python3
import exiftool
import glob
import os

files = glob.glob(r"/path/to/photos/**/*", recursive=True)
# Necessary to avoid duplicate files; if all photos have the same extension
# you could simply add that extension to the end of the glob path instead
files = [f for f in files if os.path.isfile(f)]
with exiftool.ExifToolHelper() as et:
    metadata = et.get_metadata(files)
    for d in metadata:
        if "XMP:DateAcquired" in d.keys():
            print(f'{d["File:FileName"]} {d["XMP:DateAcquired"]}')
If you run this on a subset of your data which contains XMP-tagged files and it correctly spits out a list of files plus the date metadata which begins YYYY:MM:DD, you’re in the clear. If nothing shows up or the date format is different, I’d need to modify the script to account for that. In the former case, if you know of a specific file that does have the tag, it’d be helpful to get the exact tag name you see in the output from this script (I don’t need the whole output, just the name of the DateAcquired key):
#!/usr/bin/env python3
import exiftool
import json

with exiftool.ExifToolHelper() as et:
    metadata = et.get_metadata([r'path/to/dateacquired/file'])
    for d in metadata:
        print(json.dumps(d, indent=4))
If you do end up using this, I’ll be curious to know how it compares to the parallel solution! If the exiftool startup time ends up being negligible on your machine I’d expect it to be similar (since they’re both ultimately I/O bound, and parallel saves time by being able to have some threads executing while others are waiting for I/O), but if the exiftool spin-up time constitutes a significant portion of the execution time you may find it to be faster! If you don’t end up using it, no worries. It was a fun little exercise and I learned about a library that will definitely save me some time in the future if I need to do some EXIF batch processing!
Hello again :)
Sorry for pinging you ! But I somehow figured it out!
with exiftool.ExifToolHelper(common_args=None) as et:
    metadata = et.get_metadata(files)
    for d in metadata:
        for tag in ["DateTimeOriginal", "CreateDate", "FileModifyDate", "ModifyDate", "DateAcquired"]:
This seems to work as expected with only the tag names and without the need for the tag groups ([IFD0], [ExifIFD], [QuickTime]). The processing time is impressive !! Thank you !!
However, if I may ask one last thing… Is there any way to change the file name when the file is moved and already exists, to something like
/Pictures/sorted/2018/2/IMG_0993-1.JPG
/Pictures/sorted/2018/2/IMG_0993-2.JPG
/Pictures/sorted/2018/2/IMG_0993-3.JPG
....
I tried to wrap my head around os.move or os.rename but can’t make any sense out of it !
Thank you in advance !!
Wow, nice find! I was going to handle it by just arbitrarily picking the first tag which ended with CreateDate, FileModifyDate, etc., but this is a much better solution which relies on the native behavior of exiftool. I feel kind of silly for not looking at the documentation more carefully: I couldn’t find anything immediately useful when looking at the documentation for the class used in the script (ExifToolHelper) but with the benefit of hindsight I now see this crucial detail about its parameters:
And sure enough, that’s where the common_args parameter is detailed which handles this exact use case:
As for the renaming, you could handle this by using os.path.exists as with the directory creation and using a bit of logic (along with the utility functions os.path.basename and os.path.splitext) to generate a unique name before the move operation:
# Ensure uniqueness of path
basename = os.path.basename(d['SourceFile'])
filename, ext = os.path.splitext(basename)
count = 1
while os.path.exists(f'{subdirectory}/{basename}'):
    basename = f'{filename}-{count}{ext}'
    count += 1
shutil.move(d['SourceFile'], f'{subdirectory}/{basename}')
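If it helps to try the renaming logic on its own before touching real photos, here is the same idea wrapped in a function and run against a throwaway directory. The function name unique_destination and the demo setup are my own additions for illustration; an empty file stands in for the actual shutil.move.

```python
import os
import tempfile


def unique_destination(directory, basename):
    """Return a path in directory that does not exist yet,
    adding -1, -2, ... before the extension if needed."""
    stem, ext = os.path.splitext(basename)
    candidate = os.path.join(directory, basename)
    count = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}-{count}{ext}")
        count += 1
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    names = []
    for _ in range(3):
        target = unique_destination(tmp, "IMG_0993.JPG")
        open(target, "w").close()  # stand-in for shutil.move(source, target)
        names.append(os.path.basename(target))
    print(names)  # ['IMG_0993.JPG', 'IMG_0993-1.JPG', 'IMG_0993-2.JPG']
```

In the real script the call would sit right before the move, e.g. shutil.move(d['SourceFile'], unique_destination(subdirectory, os.path.basename(d['SourceFile']))).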
Hey ha :) !!
Yeah I know that feeling, I posted and added unnecessary noise to Phil Harvey’s forum about something I thought was a “bug” or odd behavior in exiftool, while it was just my lacking reading skills… I felt so dumb :/. Because I’m unable to build it up from the ground myself, like you did (great work, thanks again !!), I can only fiddle around and do my best reading the documentation to somehow find my way out. I was pretty happy and had a little surge of dopamine :D !
THAT did the trick ! Thank you. I somehow “wrote” something similar but don’t look at it, it’s nonfunctional and ugly XD but I gave it a try while roaming the web.
try:
    shutil.move(d['SourceFile'], subdirectory)
except:
    i = 0
    while os.path.exists(d['SourceFile']):
        i += 1
        base_name, extension = os.path.splitext(d['SourceFile'])
        new_filename = f"{base_name}-{i}{extension}"
        print (new_filename)
        os.rename(d['SourceFile'], new_filename)
        shutil.move(new_filename, subdirectory)
Final words
First, your script is the bomb ! Blazing fast and does everything I wanted ! And you were right with your first impression and the -stay_open switch. That’s what PyExifTool uses under the hood (I read it somewhere in the docs)! I tried to implement that switch with an arg file in my old/ugly/painful bash script, but it didn’t work as expected. I will give it another try sometime in the near future. Right now I’m exhausted from reading and all the re-runs to troubleshoot and test things, and more than happy with your script (thanks again for everything !!!).
Second, I hope you won’t be mad, but after a thorough re-reading of the exiftool documentation and playing around a bit, I even managed to get exiftool to do the same thing. It looks something like this:
exiftool -P -d %Y:%m:%d -if '$DateTimeOriginal gt "2018:01:01" and $DateTimeOriginal lt "2021:01:01"' -api QuickTimeUTC -r '-directory<${DateTimeOriginal#;DateFmt("%Y/%B")}/' '-FileName=%f%-c.%e' .
In plain English this translates to:
This was the first command I was working on before I started trying a bash script, but it somehow messed up the folder creation. Long story short: it was because of how my command formatted the date in the condition (-d %Y/%B):
2018/June gt "2018:01:01"
(yeah, this will cause some strange behavior xD). However, your script is faster !!! For the same batch:
2200 files
Exif-Tool: 24s
PyExifTool: 11s
Compared to my painful and ugly 11 minutes script… uuuhg !
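For the record, the strange behavior with -d %Y/%B in the condition can be reproduced without exiftool. As far as I understand it, gt and lt in an -if condition are plain string comparisons, so the little Python sketch below (function name in_window is made up) behaves the same way for these values.

```python
def in_window(date_text, start="2018:01:01", end="2021:01:01"):
    """Plain string comparison of the formatted date against both bounds."""
    return start < date_text < end


# Zero-padded YYYY:MM:DD sorts the same way as the calendar, so this works.
print(in_window("2018:06:15"))  # True

# With -d %Y/%B the tag is rendered as e.g. "2018/June". "/" sorts before ":",
# so every 2018 date compares as smaller than "2018:01:01" and gets skipped.
print(in_window("2018/June"))   # False

# Dates from 2019 still pass, because the comparison is decided by the year digits.
print(in_window("2019/June"))   # True
```

That is why the working command keeps -d %Y:%m:%d for the condition and only applies the %Y/%B format inside the -directory assignment.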
Again, thank you very much for sharing your knowledge, your help/time and staying with me. 👍 😁 I hope we will meet again and maybe/hopefully have a proper conversation on programming/scripting !
Thanks 🙏.