Find all files with given extension in subfolders, and add substring corresponding to subfolder

I have numerous txt files, scattered across different folders.

- case1
   |
    - 0.25
       |
        - case1.txt
    - 0.35
       |
        _ case1.txt
    - 0.30
       |
        _ case1.txt
    - 0.45
       |
        _ case1.txt

- case2
   |
    - 0.25
       |
        - case2.txt
    - 0.35
       |
        _ case2.txt
    - 0.30
       |
        _ case2.txt
    - 0.45
       |
        _ case2.txt

.
.
.

I would like to copy them all to a single folder, but as you can see, some of them have the same name, so a naive find solution ends up overwriting them. I would like to copy all the txt files to a directory foo, inserting the name of the subfolder they're in before the .txt extension. Also, since the subfolder has a dot in its name and I need to copy these files to Windows, I'd like to change 0.25 to 0_25. In other words, the file

- case2
   |
    - 0.25
       |
        - case2.txt

must be copied to foo as case2_0_25.txt. If a bash solution is too complex/unreadable, a Python solution would be fine too, but not a zsh one.


Solution 1:

Split the copy into two transformations: the first replaces every dot with an underscore, and the second rearranges the matched subexpressions into the desired order. Both --transform and --show-transformed-names are GNU tar options.

shopt -s globstar; \
tar -cvf - --show-transformed-names --transform='s![.]!_!g' --transform='s!.*/\([^/]\+\)/\([^/]\+\)_txt$!foo/\2_\1.txt!' case*/**/*.txt | tar -xf -

Note that the second --transform expression must match every path: a file whose path it does not match passes through with only its dots replaced, and is not placed in foo.
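Since GNU tar describes --transform arguments as sed-style replace expressions, feeding the same two expressions to sed gives a rough preview of the resulting names without copying anything. This is only a sketch: the function name preview_transform is made up for illustration, and it assumes GNU sed, because \+ is a GNU extension in basic regular expressions.

```bash
# Preview of the renaming only; nothing is copied.
# Assumes GNU sed (\+ is a GNU extension in basic regular expressions).
preview_transform() {
    printf '%s\n' "$1" |
        sed -e 's![.]!_!g' \
            -e 's!.*/\([^/]\+\)/\([^/]\+\)_txt$!foo/\2_\1.txt!'
}

preview_transform case2/0.25/case2.txt   # prints foo/case2_0_25.txt
preview_transform notes.txt              # prints notes_txt (second expression did not match)
```

The second call shows what an unmatched path looks like: the dots are gone, but the name was never rewritten into foo.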

A Bash solution will work much the same way.

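#!/bin/bash

shopt -s nullglob globstar

mkdir -p foo
# Read NUL-delimited paths so that unusual file names survive intact.
while read -rd ''; do
    # Replace dots with underscores, then capture the parent directory (1)
    # and the file stem (2), and swap them in the target name.
    [[ ${REPLY//./_} =~ ([^/]+)/([^/]+)_txt$ ]] &&
        cp -va "$REPLY" "foo/${BASH_REMATCH[2]}_${BASH_REMATCH[1]}.txt"
done < <(printf %s\\0 case*/**/*.txt)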
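To check the names before copying anything, the matching step of the loop above can be pulled out into a small function. This is a sketch, not part of the original script: target_name is a hypothetical helper that prints the destination for one path and fails when the path has no parent directory to take a name from.

```bash
# Hypothetical helper: print the destination for one source path,
# or return non-zero when the regex does not match.
target_name() {
    local underscored=${1//./_}          # 0.25 -> 0_25, case2.txt -> case2_txt
    [[ $underscored =~ ([^/]+)/([^/]+)_txt$ ]] || return 1
    # BASH_REMATCH[1] is the parent directory, BASH_REMATCH[2] the file stem.
    printf 'foo/%s_%s.txt\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}"
}

target_name case2/0.25/case2.txt   # prints foo/case2_0_25.txt
target_name case2.txt || echo "skipped: no parent directory"
```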

Solution 2:

You can do this easily enough using the globstar option of bash (from man bash):

globstar

If set, the pattern ** used in a pathname expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a /, only directories and subdirectories match.
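As a quick way to see what ** matches before relying on it, here is a minimal sketch. The function name list_txt is made up for illustration; it lists every .txt file below a given directory, at any depth.

```bash
# Hypothetical helper: list every .txt file below a directory.
# The parentheses run the body in a subshell, so the cd and the shopt
# settings do not leak into the calling shell.
list_txt() (
    shopt -s globstar nullglob
    cd "$1" || exit 1
    files=(**/*.txt)                     # **/ matches zero or more directories
    if (( ${#files[@]} )); then
        printf '%s\n' "${files[@]}"
    fi
)

list_txt .
```

Run in the tree from the question, this prints paths such as case1/0.25/case1.txt, relative to the directory passed in.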

Since we can use ** to find the files, we just need to define the new name as including the original directory names and changing . to _:

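shopt -s globstar   # enable ** for recursive matching
mkdir -p foo        # cp needs the target directory to exist
for file in **/*.txt; do
    newName=$(sed 's|[/.]|_|g' <<<"$file" | sed 's/_txt$/.txt/')
    cp -- "$file" foo/"$newName"
done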

Explanation

  • for file in **/*.txt; do: find all files (and directories, if that's relevant) in the current directory and its subdirectories whose names end in .txt.
  • newName=$(sed 's|[/.]|_|g' <<<"$file" | sed 's/_txt$/.txt/'): use sed to convert every / and . in the file name to _. Note that $file here also includes the path, so it will be something like case1/0.25/case1.txt, which becomes case1_0_25_case1_txt. We then pass the output of the first sed to a second one, which converts _txt (if found at the end of the line) to .txt, giving us case1_0_25_case1.txt. The final output is saved in the variable $newName.
  • cp -- "$file" foo/"$newName": we now copy the file to the directory foo/ under the new name. The -- is not really needed here, but it ensures the approach will work with any file name, including those whose name starts with a -.

I recreated the folder structure you show in your question, ran the command above and got:


$ tree
.
├── case1
│   ├── 0.25
│   │   └── case1.txt
│   ├── 0.30
│   │   └── case1.txt
│   ├── 0.35
│   │   └── case1.txt
│   └── 0.45
│       └── case1.txt
├── case2
│   ├── 0.25
│   │   └── case2.txt
│   ├── 0.30
│   │   └── case2.txt
│   ├── 0.35
│   │   └── case2.txt
│   └── 0.45
│       └── case2.txt
└── foo
    ├── case1_0_25_case1.txt
    ├── case1_0_30_case1.txt
    ├── case1_0_35_case1.txt
    ├── case1_0_45_case1.txt
    ├── case2_0_25_case2.txt
    ├── case2_0_30_case2.txt
    ├── case2_0_35_case2.txt
    └── case2_0_45_case2.txt

11 directories, 16 files
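
The renaming step in the loop from Solution 2 can also be done with bash parameter expansion alone, which avoids starting two sed processes per file. This is a sketch that should yield the same names; flatten_name is a hypothetical helper, not part of the answer.

```bash
# Hypothetical helper: flatten a relative path into a single file name.
flatten_name() {
    local stem=${1%.txt}      # drop the extension so its dot is left alone
    stem=${stem//\//_}        # every / becomes _
    stem=${stem//./_}         # every . becomes _
    printf '%s.txt\n' "$stem"
}

flatten_name case1/0.25/case1.txt   # prints case1_0_25_case1.txt
```

Inside the loop, the assignment would then read newName=$(flatten_name "$file").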