How do I add alternating strings to filenames and renumber them pairwise?

Using a high-throughput microscope, we produce thousands of images. Let's say our system names them:

ome0001.tif
ome0002.tif
ome0003.tif
ome0004.tif
ome0005.tif
ome0006.tif
ome0007.tif
ome0008.tif
ome0009.tif
ome0010.tif
ome0011.tif
ome0012.tif
...

We would like to alternately insert c1 and c2 following the numerical order of the images, and then change the original numbering so that each successive c1/c2 pair shares the same incrementing number, respecting numerical order (1, then 2... then 9, then 10) rather than alphanumeric order (1, then 10, then 2...).

In my example, that would give:

ome0001c1.tif
ome0001c2.tif
ome0002c1.tif
ome0002c2.tif
ome0003c1.tif
ome0003c2.tif
ome0004c1.tif
ome0004c2.tif
ome0005c1.tif
ome0005c2.tif
ome0006c1.tif
ome0006c2.tif
...

We have not been able to do that via terminal command-line (biologist speaking...).

Any suggestion would be greatly appreciated!


Solution 1:

rename performs bulk renaming, and it can do the arithmetic you need.

Different GNU/Linux distributions have different commands called rename, with different syntax and capabilities. In Debian, Ubuntu, and some other OSes, rename is the Perl renaming utility prename. It is quite well suited to this task.

First I recommend telling rename to just show you what it would do, by running it with the -n flag:

rename -n 's/\d+/sprintf("%04dc%d", int(($& - 1) \/ 2) + 1, 2 - $& % 2)/e' ome????.tif

That should show you:

rename(ome0001.tif, ome0001c1.tif)
rename(ome0002.tif, ome0001c2.tif)
rename(ome0003.tif, ome0002c1.tif)
rename(ome0004.tif, ome0002c2.tif)
rename(ome0005.tif, ome0003c1.tif)
rename(ome0006.tif, ome0003c2.tif)
rename(ome0007.tif, ome0004c1.tif)
rename(ome0008.tif, ome0004c2.tif)
rename(ome0009.tif, ome0005c1.tif)
rename(ome0010.tif, ome0005c2.tif)
rename(ome0011.tif, ome0006c1.tif)
rename(ome0012.tif, ome0006c2.tif)

Assuming that's what you want, go ahead and run it without the -n flag (i.e., just remove -n):

rename 's/\d+/sprintf("%04dc%d", int(($& - 1) \/ 2) + 1, 2 - $& % 2)/e' ome????.tif

That command is somewhat ugly (though still more elegant than using a loop in your shell), and perhaps someone with more Perl experience than I have will post a prettier solution.

I highly recommend Oli's tutorial Bulk renaming files in Ubuntu; the briefest of introductions to the rename command, for a gentle intro to writing rename commands.


How that specific rename command works:

Here's what s/\d+/sprintf("%04dc%d", int(($& - 1) \/ 2) + 1, 2 - $& % 2)/e does:

  • The leading s means to search for text to replace.
  • The regular expression /\d+/ matches one or more (+) digits (\d). This matches your 0001, 0002, and so forth.
  • The command sprintf("%04dc%d", int(($& - 1) / 2) + 1, 2 - $& % 2) is built. $& represents the match. / normally ends the replacement text, but \/ makes a literal / (which is division, as detailed below).
  • The trailing /e means to evaluate the replacement text as code.
    (Try running it with just / instead of /e at the end, but make sure to keep the -n flag!)

Thus your new filenames are the return values of sprintf("%04dc%d", int(($& - 1) \/ 2) + 1, 2 - $& % 2). So what's going on there?

  • sprintf returns formatted text. Its first argument is the format string into which the remaining values are placed. %04d consumes the first of those values and formats it as an integer 4 characters wide, padded with leading zeros. %4d would pad with spaces instead of zeros, hence %04d is needed. Not being covered by any %, c means just a literal letter c. Then %d consumes the second value and formats it as an integer (with default formatting).
  • int(($& - 1) / 2) + 1 subtracts 1 from the number extracted from the original filename, divides it by 2, truncates the fractional portion (int does that), then adds 1. That arithmetic sends 0001 and 0002 to 0001, 0003 and 0004 to 0002, 0005 and 0006 to 0003, and so forth.
  • 2 - $& % 2 takes the remainder of dividing the number extracted from the original filename by 2 (% does that), which is 0 if it's even and 1 if it's odd. It then subtracts that from 2. This arithmetic sends 0001 to 1, 0002 to 2, 0003 to 1, 0004 to 2, and so forth.
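
The same arithmetic can be checked in the shell before any file is renamed. This is only a sketch: pair_name is a hypothetical helper, not part of rename, and it strips the leading zeros by hand because shell arithmetic would otherwise read a number such as 0010 as octal.

```shell
# pair_name is a hypothetical helper that mirrors the sprintf arithmetic above.
# The ( ... ) body runs in a subshell, so n does not leak into the caller.
pair_name() (
    n=$1
    # Strip leading zeros so the shell does not read e.g. 0010 as octal.
    n=${n#"${n%%[!0]*}"}
    n=${n:-0}
    # (n - 1) / 2 + 1 is the pair number; 2 - n % 2 is the channel (1 or 2).
    printf 'ome%04dc%d.tif\n' $(( (n - 1) / 2 + 1 )) $(( 2 - n % 2 ))
)

# Print the mapping for a few sample numbers; nothing is renamed.
for n in 0001 0002 0003 0004 0009 0010; do
    printf 'ome%s.tif -> %s\n' "$n" "$(pair_name "$n")"
done
```

The pairs it prints should agree with the rename -n output shown earlier.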

Finally, ome????.tif is a glob that your shell expands to a list of all the filenames in the current directory that start with ome, end in .tif, and have exactly four of any characters in between.

This list is passed to the rename command, which will attempt to rename (or with -n, tell you how it would rename) all of the files whose names contain a match to the pattern \d+.

  • From your description, it doesn't sound like you have any files in that directory named that way but with some of the characters not digits.
  • But if you do then you can replace \d+ with \d{4} in the regular expression appearing in the commands shown above, to ensure they aren't renamed, or just inspect the output produced with -n carefully, which you should be doing anyway.
  • I wrote \d+ instead of \d{4} to avoid making the command more complex than necessary. (There are many different ways to write it.)
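
To see which names the glob really matches, it can be tried in a scratch directory first. This is a sketch, not part of the commands above: the directory comes from mktemp, omeabcd.tif and ome00001.tif are made-up names, and the digit-only glob is an alternative I am adding for comparison.

```shell
# Work in a throwaway directory so no real images are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch ome0001.tif ome0002.tif omeabcd.tif ome00001.tif

# ? matches any single character, so omeabcd.tif is included.
loose=$(echo ome????.tif)
# [0-9] matches exactly one digit, so only four-digit names are included.
strict=$(echo ome[0-9][0-9][0-9][0-9].tif)

echo "ome????.tif matches:     $loose"
echo "digit-only glob matches: $strict"

# Go back to where we started and clean up.
cd - >/dev/null && rm -r "$tmp"
```

Neither glob matches ome00001.tif, because it has five characters between ome and .tif.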

Solution 2:

I used a way to do this in Bash based on the idea that if the number in the filename is even, we want to divide it by two, and add c2, and if the number is odd, we want to add one to it and then divide by two, and add c1. Treating odd and even numbered files separately like this is much lengthier than Eliah Kagan's Bash method and I agree that using rename as in this other answer by Eliah Kagan is the smart way, but this kind of approach might be useful in some situations.

A slight advantage of this over using a range like {0000..0012} is that it only operates on existing files, so it won't complain if some files don't exist. However, you still get illogically numbered files if there are any gaps. See the second part of my answer for a way that doesn't have this problem.

In one line it looks awful:

for f in *; do g="${f%.tif}"; h="${g#ome}"; if [[ $(bc <<< "$h%2") == 0 ]]; then printf -v new "ome%04dc2.tif" "$(bc <<< "$h/2")" ; echo mv -vn -- "$f" "$new"; else printf -v new "ome%04dc1.tif" "$(bc <<< "($h+1)/2")"; echo mv -vn -- "$f" "$new"; fi; done

Here's that as a script:

#!/bin/bash

for f in *; do 
    g="${f%.tif}"
    h="${g#ome}"

    if [[ $(bc <<< "$h%2") == 0 ]]; then 
         printf -v new "ome%04dc2.tif" "$(bc <<< "$h/2")"
         echo mv -vn -- "$f" "$new"
    else
         printf -v new "ome%04dc1.tif" "$(bc <<< "($h+1)/2")"
         echo mv -vn -- "$f" "$new"
    fi
done

The echo commands in front of the mv statements are just for testing. Remove them to actually rename the files once the output shows what you want.

Notes

g="${f%.tif}"     # strip off the extension
h="${g#ome}"      # strip off the letters... now h contains the number

Test that the number is even (i.e. dividing by 2 gives no remainder):

if [[ $(bc <<< "$h%2") == 0 ]]; then 

I've used bc, which won't try to treat numbers with leading zeroes as octal numbers, although I could have just stripped off the zeroes with another string expansion since I'm going to format the numbers fixed-width anyway.
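
The octal problem is easy to reproduce, and the string-expansion alternative can be sketched as follows. The value 0010 stands for the number taken from a name like ome0010.tif, and the variable names are illustrative only. Note that numbers such as 0008 or 0009 are not valid octal at all, so Bash reports an error for them instead of giving a wrong answer.

```shell
h=0010

# A leading zero makes shell arithmetic read the constant as octal: 0010 is 8.
octal=$(( $h ))

# Strip the leading zeros with parameter expansions, then use plain arithmetic.
stripped=${h#"${h%%[!0]*}"}
stripped=${stripped:-0}          # a name like 0000 would otherwise become empty
remainder=$(( stripped % 2 ))

echo "read as octal: $octal"
echo "zeros stripped: $stripped (remainder when divided by 2: $remainder)"
```

In Bash, $((10#$h)) forces base 10 and does the same job without the string expansion.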

Next construct the new name for the even-numbered files:

printf -v new "ome%04dc2.tif" "$(bc <<< "$h/2")"

%04d will be replaced by the number output by bc <<< "$h/2" in 4 digit format, padded with leading zeroes (so 0 = 0000, 10 = 0010, etc).

Rename the original file with the constructed new name

echo mv -vn -- "$f" "$new"

-v for verbose, -n for no-clobber (don't overwrite files that already have the intended name, if they exist) and -- to prevent errors from filenames beginning with - (but since the rest of my script expects your files to be named ome[somenumber].tif I guess I'm just adding it out of habit).
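
What -n protects against can be seen in a scratch directory. This is a minimal sketch with made-up file contents; the directory comes from mktemp and nothing outside it is touched.

```shell
# Work in a throwaway directory so no real images are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
echo old > ome0002.tif
echo existing > ome0001c2.tif

# -n refuses to overwrite ome0001c2.tif, so both files are left as they were.
# || true keeps the sketch going whatever exit status mv reports for the skip.
mv -n -- ome0002.tif ome0001c2.tif || true

kept=$(cat ome0001c2.tif)
left=$(echo *)
echo "ome0001c2.tif still contains: $kept"
echo "files present: $left"

# Go back to where we started and clean up.
cd - >/dev/null && rm -r "$tmp"
```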


Filling the gaps

After some tinkering and more help from Eliah Kagan, I worked out a more succinct way to increment the names, which has the advantage of filling the gaps. The drawback is that it only increments a counter, does some simple arithmetic on that counter, formats the result, and puts it in the filename. Bash thinks (so to speak) "ok, here's the next file, I'll give it the next name", without paying any attention to the original filename. This means the new names don't relate to the old names, so you will not be able to logically undo the renaming, and the files will be renamed in the correct order only if their names already sort in the order you want. This is the case in your example, which has fixed-width zero-padded numbers, but if you had files named, say, 2, 8, 10, 45, they would be processed in the order 10, 2, 45, 8, which is probably not what you want.

If this approach is suitable for you given all that, you can do it like this:

i=0; for f in ome????.tif; do ((i++)); printf -v new "ome%04dc%d.tif" $(((i+1)/2)) $(((i+1)%2+1)); echo mv -vn "$f" "$new"; done 

or

#!/bin/bash
i=0

for f in ome????.tif; do 
    ((i++))
    printf -v new "ome%04dc%d.tif" $(((i+1)/2)) $(((i+1)%2+1))
    echo mv -vn "$f" "$new"
done 

Notes

  • i=0 initializes a counter variable
  • ((i++)) increments the counter by one (this counts iterations of the loop)
  • printf -v new puts the formatted result into the variable new
  • "ome%04dc%d.tif" is the new filename, with number formats that are replaced by the numbers that follow
  • $(((i+1)/2)) is the number of times the loop has run plus one, divided by 2

    This works on the basis that Bash only does integer division, so when we divide an odd number by 2, we get the same answer as we got when we divided the preceding even number by 2:

    $ echo $((2/2))
    1
    $ echo $((3/2))
    1
    
  • $(((i+1)%2+1)) is the remainder after dividing (the number of times the loop has run, plus one) by two, plus one. So if the iteration number is odd (e.g. the first run) the output is 1, and if it is even (e.g. the second run) the output is 2, giving c1 or c2.
  • I used i=0 because then at any point during the run, the value of i will be the number of times the loop has been run, which might be useful for debugging as it will also be the ordinal number of the file being processed (i.e. when i=69, we are processing the 69th file). However, we can simplify the arithmetic by starting with a different i, for example:

    i=2; for f in ome????.tif; do printf -v new "ome%04dc%d.tif" $((i/2)) $((i%2+1)); echo mv -vn "$f" "$new"; ((i++)); done 
    

    There are lots of ways to do this :)

  • echo is just for testing; remove it once you see the result you want.

Here's an example of what this method does:

$ ls
ome0002.tif  ome0004.tif  ome0007.tif  ome0009.tif  ome0010.tif  ome0012.tif  ome0019.tif  ome0100.tif  ome2996.tif
$ i=0; for f in ome????.tif; do ((i++)); printf -v new "ome%04dc%d.tif" $(((i+1)/2)) $(((i+1)%2+1)); echo mv -vn "$f" "$new"; done 
mv -vn ome0002.tif ome0001c1.tif
mv -vn ome0004.tif ome0001c2.tif
mv -vn ome0007.tif ome0002c1.tif
mv -vn ome0009.tif ome0002c2.tif
mv -vn ome0010.tif ome0003c1.tif
mv -vn ome0012.tif ome0003c2.tif
mv -vn ome0019.tif ome0004c1.tif
mv -vn ome0100.tif ome0004c2.tif
mv -vn ome2996.tif ome0005c1.tif
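
The ordering caveat mentioned for this method can be checked in a scratch directory. This is a sketch using the unpadded names 2, 8, 10, 45 from the explanation above; piping through sort -n is one possible way to get numeric order, and it assumes the filenames contain no newlines.

```shell
# Work in a throwaway directory so no real images are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch 2 8 10 45

# The glob expands in lexical order, so 10 comes before 2.
glob_order=$(echo *)

# sort -n reorders the same names by numeric value.
numeric_order=$(printf '%s\n' * | sort -n | tr '\n' ' ')
numeric_order=${numeric_order% }   # drop the trailing space left by tr

echo "glob order:    $glob_order"
echo "numeric order: $numeric_order"

# Go back to where we started and clean up.
cd - >/dev/null && rm -r "$tmp"
```

With fixed-width zero-padded names like ome0001.tif the two orders are the same, which is why the loop above is safe for the files in the question.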