How do I add alternating strings to filenames and renumber them pairwise?
Using a high-throughput microscope, we produce thousands of images. Let's say our system names them:
ome0001.tif
ome0002.tif
ome0003.tif
ome0004.tif
ome0005.tif
ome0006.tif
ome0007.tif
ome0008.tif
ome0009.tif
ome0010.tif
ome0011.tif
ome0012.tif
...
We would like to alternately insert c1 and c2 according to the numerical value of the images, and then change the original numbering so that each successive c1 and c2 pair shares the same incremental number, respecting numerical order (1, then 2... then 9, then 10) rather than alphanumeric order (1, then 10, then 2...).
In my example, that would give:
ome0001c1.tif
ome0001c2.tif
ome0002c1.tif
ome0002c2.tif
ome0003c1.tif
ome0003c2.tif
ome0004c1.tif
ome0004c2.tif
ome0005c1.tif
ome0005c2.tif
ome0006c1.tif
ome0006c2.tif
...
We have not been able to do that from the terminal command line (biologist speaking...).
Any suggestion would be greatly appreciated!
Solution 1:
rename performs bulk renaming, and it can do the arithmetic you need.
Different GNU/Linux distributions have different commands called rename, with different syntax and capabilities. In Debian, Ubuntu, and some other OSes, rename is the Perl renaming utility prename. It is quite well suited to this task.
First I recommend telling rename to just show you what it would do, by running it with the -n flag:
rename -n 's/\d+/sprintf("%04dc%d", int(($& - 1) \/ 2) + 1, 2 - $& % 2)/e' ome????.tif
That should show you:
rename(ome0001.tif, ome0001c1.tif)
rename(ome0002.tif, ome0001c2.tif)
rename(ome0003.tif, ome0002c1.tif)
rename(ome0004.tif, ome0002c2.tif)
rename(ome0005.tif, ome0003c1.tif)
rename(ome0006.tif, ome0003c2.tif)
rename(ome0007.tif, ome0004c1.tif)
rename(ome0008.tif, ome0004c2.tif)
rename(ome0009.tif, ome0005c1.tif)
rename(ome0010.tif, ome0005c2.tif)
rename(ome0011.tif, ome0006c1.tif)
rename(ome0012.tif, ome0006c2.tif)
Assuming that's what you want, go ahead and run it without the -n flag (i.e., just remove -n):
rename 's/\d+/sprintf("%04dc%d", int(($& - 1) \/ 2) + 1, 2 - $& % 2)/e' ome????.tif
That command is somewhat ugly--though still more elegant than using a loop in your shell--and perhaps someone with more Perl experience than I have will post a prettier solution.
I highly recommend Oli's tutorial "Bulk renaming files in Ubuntu; the briefest of introductions to the rename command" for a gentle intro to writing rename commands.
How that specific rename command works:
Here's what s/\d+/sprintf("%04dc%d", int(($& - 1) \/ 2) + 1, 2 - $& % 2)/e does:
- The leading s means to search for text to replace.
- The regular expression /\d+/ matches one or more (+) digits (\d). This matches your 0001, 0002, and so forth.
- The command sprintf("%04dc%d", int(($& - 1) / 2) + 1, 2 - $& % 2) is built. $& represents the match. / normally ends the replacement text, but \/ makes a literal / (which is division, as detailed below).
- The trailing /e means to evaluate the replacement text as code. (Try running it with just / instead of /e at the end, but make sure to keep the -n flag!)
Thus your new filenames are the return values of sprintf("%04dc%d", int(($& - 1) \/ 2) + 1, 2 - $& % 2). So what's going on there?
- sprintf returns formatted text. Its first argument is the format string into which values are placed. %04d consumes the next argument (the first one after the format string) and formats it as an integer 4 characters wide. %4d would omit leading zeros, hence %04d is needed. Not being covered by any %, c means just a literal letter c. Then %d consumes the following argument and formats it as an integer (with default formatting).
- int(($& - 1) / 2) + 1 subtracts 1 from the number extracted from the original filename, divides it by 2, truncates the fractional portion (int does that), then adds 1. That arithmetic sends 0001 and 0002 to 0001, 0003 and 0004 to 0002, 0005 and 0006 to 0003, and so forth.
- 2 - $& % 2 takes the remainder of dividing the number extracted from the original filename by 2 (% does that), which is 0 if it's even and 1 if it's odd. It then subtracts that from 2. This arithmetic sends 0001 to 1, 0002 to 2, 0003 to 1, 0004 to 2, and so forth.
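The arithmetic described above can be checked in the shell before touching any files. This is only a minimal sketch, not part of the rename command itself: the function name pair_name is hypothetical, and the 10# prefix is an assumption added so that Bash reads zero-padded numbers as decimal rather than octal.

```shell
#!/bin/bash
# pair_name is a hypothetical helper: it maps an original image number
# to the new name using the same arithmetic as the rename expression.
pair_name() {
    local n=$((10#$1))                  # 10# forces base 10, so 0008 is not read as octal
    local pair=$(( (n - 1) / 2 + 1 ))   # 1,2 -> 1; 3,4 -> 2; 5,6 -> 3; ...
    local channel=$(( 2 - n % 2 ))      # odd -> 1, even -> 2
    printf 'ome%04dc%d.tif\n' "$pair" "$channel"
}

# Show the mapping for a few sample numbers
for n in 0001 0002 0003 0004 0009 0010; do
    printf 'ome%s.tif -> %s\n' "$n" "$(pair_name "$n")"
done
```

If the printed pairs match what rename -n reports for the same files, the expression is doing what you expect.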
Finally, ome????.tif is a glob that your shell expands to a list of all the filenames in the current directory that start with ome, end in .tif, and have exactly four of any characters in between.
This list is passed to the rename command, which will attempt to rename (or with -n, tell you how it would rename) all of the files whose names contain a match to the pattern \d+.
- From your description, it doesn't sound like you have any files in that directory named that way but with some of the characters not digits.
- But if you do, then you can replace \d+ with \d{4} in the regular expression appearing in the commands shown above, to ensure they aren't renamed, or just inspect the output produced with -n carefully, which you should be doing anyway.
- I wrote \d+ instead of \d{4} to avoid making the command more complex than necessary. (There are many different ways to write it.)
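If stray names with non-digit characters could be present, another option is to tighten the glob itself instead of the regular expression. This is a minimal sketch under that assumption; the scratch directory and the file ome12ab.tif are made up purely for the demonstration.

```shell
#!/bin/bash
# Demonstration in a throwaway directory; the file names are made up.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch ome0001.tif ome0002.tif ome12ab.tif

# ? matches any single character, so this also picks up ome12ab.tif
loose=(ome????.tif)
# [0-9] matches a single digit, so only fully numbered files match
strict=(ome[0-9][0-9][0-9][0-9].tif)

echo "loose:  ${loose[*]}"
echo "strict: ${strict[*]}"
```

The stricter pattern could then be passed to rename -n in place of ome????.tif, so that oddly named files never reach the command at all.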
Solution 2:
I used a way to do this in Bash based on the idea that if the number in the filename is even, we want to divide it by two and add c2, and if the number is odd, we want to add one to it, then divide by two, and add c1. Treating odd- and even-numbered files separately like this is much lengthier than Eliah Kagan's Bash method, and I agree that using rename as in this other answer by Eliah Kagan is the smart way, but this kind of approach might be useful in some situations.
A slight advantage of this over using a range like {0000..0012} is that it only tries to operate on existing files, so it won't complain if the files don't exist. However, you still get illogically numbered files if there are any gaps. See the second part of my answer for a way that doesn't have this problem.
In one line it looks awful:
for f in *; do g="${f%.tif}"; h="${g#ome}"; if [[ $(bc <<< "$h%2") == 0 ]]; then printf -v new "ome%04dc2.tif" "$(bc <<< "$h/2")" ; echo mv -vn -- "$f" "$new"; else printf -v new "ome%04dc1.tif" "$(bc <<< "($h+1)/2")"; echo mv -vn -- "$f" "$new"; fi; done
Here's that as a script:
#!/bin/bash
for f in *; do
    g="${f%.tif}"
    h="${g#ome}"
    if [[ $(bc <<< "$h%2") == 0 ]]; then
        printf -v new "ome%04dc2.tif" "$(bc <<< "$h/2")"
        echo mv -vn -- "$f" "$new"
    else
        printf -v new "ome%04dc1.tif" "$(bc <<< "($h+1)/2")"
        echo mv -vn -- "$f" "$new"
    fi
done
The echoes in front of the mv statements are just for testing. Once the output shows the renames you want, remove them to actually rename the files.
Notes
g="${f%.tif}" # strip off the extension
h="${g#ome}" # strip off the letters... now h contains the number
Test that the number is even (i.e. dividing by 2 gives no remainder)
if [[ $(bc <<< "$h%2") == 0 ]]; then
I've used bc, which won't try to treat numbers with leading zeroes as octal numbers, although I could have just stripped off the zeroes with another string expansion since I'm going to format the numbers fixed-width anyway.
Next construct the new name for the even-numbered files:
printf -v new "ome%04dc2.tif" "$(bc <<< "$h/2")"
%04d will be replaced by the number output by bc <<< "$h/2" in 4 digit format, padded with leading zeroes (so 0 = 0000, 10 = 0010, etc).
Rename the original file with the constructed new name
echo mv -vn -- "$f" "$new"
-v for verbose, -n for no-clobber (don't overwrite files that already have the intended name, if they exist) and -- to prevent errors from filenames beginning with - (but since the rest of my script expects your files to be named ome[somenumber].tif I guess I'm just adding it out of habit).
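As the notes mention, bc is only there to sidestep octal interpretation of leading zeroes. Here is a minimal sketch of one alternative that uses Bash's own arithmetic with a 10# base prefix. It is a substitute for the bc calls rather than what the script above does, and the function name new_name_for is hypothetical.

```shell
#!/bin/bash
# new_name_for is a hypothetical helper mirroring the even/odd logic above,
# using Bash arithmetic instead of bc.
new_name_for() {
    local f=$1
    local g=${f%.tif}        # strip off the extension
    local h=${g#ome}         # strip off the letters, leaving the number
    local n=$((10#$h))       # 10# reads 0008 as decimal 8, not as invalid octal
    if (( n % 2 == 0 )); then
        printf 'ome%04dc2.tif\n' $(( n / 2 ))
    else
        printf 'ome%04dc1.tif\n' $(( (n + 1) / 2 ))
    fi
}

# Print the new names for one even and one odd file
new_name_for ome0008.tif
new_name_for ome0009.tif
```

The trade-off is one less external program per file, at the cost of relying on a Bash-specific arithmetic feature.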
Filling the gaps
After some tinkering and more help from Eliah Kagan, I worked out a more succinct way to increment the names that has the advantage of filling the gaps. The problem with this way is that it only increments a number, does some simple arithmetic on that number, formats it, and puts it in the filename. Bash thinks (so to speak) "ok, here's the next file, I'll give it the next name", without paying any attention to the original filename. This means it creates new names that don't relate to the old names, so you will not be able to logically undo the renaming, and the files will be renamed in the correct order only if their names are already such that they will be processed in the right order. This is the case in your example, which has fixed-width zero-padded numbers, but if you had files named, say, 2, 8, 10, 45, they would be processed in the order 10, 2, 45, 8, which is probably not what you want.
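The ordering problem is easy to see without creating any files. In this minimal sketch, sort under the C locale stands in for the lexical order in which the shell lists glob results (an assumption made to keep the demonstration predictable), and sort -n shows the numeric order for comparison.

```shell
#!/bin/bash
# Lexical order, standing in for how the shell orders unpadded names
# (LC_ALL=C pins the collation so the result is predictable).
lexical=$(printf '%s\n' 2 8 10 45 | LC_ALL=C sort | tr '\n' ' ')
# Numeric order, which is what you would want for unpadded numbers
numeric=$(printf '%s\n' 2 8 10 45 | sort -n | tr '\n' ' ')

echo "lexical: $lexical"
echo "numeric: $numeric"
```

With fixed-width zero-padded numbers, as in the question, the two orders coincide, which is why the loop is safe for those names.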
If this approach is suitable for you given all that, you can do it like this:
i=0; for f in ome????.tif; do ((i++)); printf -v new "ome%04dc%d.tif" $(((i+1)/2)) $(((i+1)%2+1)); echo mv -vn "$f" "$new"; done
or
#!/bin/bash
i=0
for f in ome????.tif; do
    ((i++))
    printf -v new "ome%04dc%d.tif" $(((i+1)/2)) $(((i+1)%2+1))
    echo mv -vn "$f" "$new"
done
Notes
- i=0 initiates a variable
- ((i++)) increments the variable by one (this counts iterations of the loop)
- printf -v new puts the following statement into the variable new
- "ome%04dc%d.tif" is the new filename, with the number formats that will be replaced by the subsequently mentioned numbers
- $(((i+1)/2)) is the number of times the loop has been run plus one, divided by 2. This works on the basis that Bash only does integer division, so when we divide an odd number by 2, we get the same answer as we got when we divided the preceding even number by 2:
$ echo $((2/2))
1
$ echo $((3/2))
1
- $(((i+1)%2+1)) is the remainder after dividing the number of times the loop has been run plus one by two, plus one. This means that if the number of the iteration is odd (e.g. the first run), the output is 1, and if the number of the iteration is even (e.g. the second run), the output is 2, giving c1 or c2
- I used i=0 because then at any point during the run, the value of i will be the number of times the loop has been run, which might be useful for debugging as it will also be the ordinal number of the file being processed (i.e. when i=69, we are processing the 69th file). However, we can simplify the arithmetic by starting with a different i, for example:
i=2; for f in ome????.tif; do printf -v new "ome%04dc%d.tif" $((i/2)) $((i%2+1)); echo mv -vn "$f" "$new"; ((i++)); done
There are lots of ways to do this :)
- echo is just for testing - remove it if you see the result you want.
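To check that the two ways of counting described in the notes really produce the same names, the arithmetic can be run on its own, without any files. This is only a sketch: the function names names_from_zero and names_from_two are hypothetical, and i=$((i + 1)) is used in place of ((i++)) purely so the snippet also behaves under set -e.

```shell
#!/bin/bash
# Print the names each formulation would assign to the first six files.
names_from_zero() {          # hypothetical name; counter starts at 0
    local i=0 k
    for k in 1 2 3 4 5 6; do
        i=$((i + 1))         # increment first, as ((i++)) does in the loop above
        printf 'ome%04dc%d.tif\n' $(( (i + 1) / 2 )) $(( (i + 1) % 2 + 1 ))
    done
}
names_from_two() {           # hypothetical name; counter starts at 2
    local i=2 k
    for k in 1 2 3 4 5 6; do
        printf 'ome%04dc%d.tif\n' $(( i / 2 )) $(( i % 2 + 1 ))
        i=$((i + 1))         # increment after building the name
    done
}

names_from_zero
if [[ "$(names_from_zero)" == "$(names_from_two)" ]]; then
    echo "both formulations agree"
fi
```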
Here's an example of what this method does:
$ ls
ome0002.tif ome0004.tif ome0007.tif ome0009.tif ome0010.tif ome0012.tif ome0019.tif ome0100.tif ome2996.tif
$ i=0; for f in ome????.tif; do ((i++)); printf -v new "ome%04dc%d.tif" $(((i+1)/2)) $(((i+1)%2+1)); echo mv -vn "$f" "$new"; done
mv -vn ome0002.tif ome0001c1.tif
mv -vn ome0004.tif ome0001c2.tif
mv -vn ome0007.tif ome0002c1.tif
mv -vn ome0009.tif ome0002c2.tif
mv -vn ome0010.tif ome0003c1.tif
mv -vn ome0012.tif ome0003c2.tif
mv -vn ome0019.tif ome0004c1.tif
mv -vn ome0100.tif ome0004c2.tif
mv -vn ome2996.tif ome0005c1.tif