Find all files with given extension in subfolders, and add substring corresponding to subfolder
I have numerous `txt` files, scattered across different folders.
- case1
  |
  - 0.25
    |
    - case1.txt
  - 0.35
    |
    - case1.txt
  - 0.30
    |
    - case1.txt
  - 0.45
    |
    - case1.txt
- case2
  |
  - 0.25
    |
    - case2.txt
  - 0.35
    |
    - case2.txt
  - 0.30
    |
    - case2.txt
  - 0.45
    |
    - case2.txt
.
.
.
I would like to copy them all to one folder but, as you can see, some of them have the same name, so a naive `find` solution ends up overwriting them. I would like to copy all the `txt` files to a directory `foo`, inserting the name of the subfolder they are in before the `.txt` extension. Also, since the subfolder name contains a dot and I need to copy these files to Windows, I would like to change `0.25` to `0_25`. In other words, the file
- case2
  |
  - 0.25
    |
    - case2.txt
must be copied to `foo` as `case2_0_25.txt`. If a bash solution is too complex/unreadable, a Python solution would be fine too, but not a zsh one.
Solution 1:
Split the copy into two transformations: the first translates dots to underscores, and the second rearranges the matched path components into the desired order (this is also where the slash between them is replaced by an underscore).
shopt -s globstar; \
tar -cvf - --show-transformed-names --transform='s![.]!_!g' --transform='s!.*/\([^/]\+\)/\([^/]\+\)_txt$!foo/\2_\1.txt!' case*/**/*.txt | tar -xf -
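Note that it is crucial that the second `--transform` expression matches every file. Any path it does not match goes through without being renamed into `foo`: it is extracted under its original directory structure, with only the dots replaced.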
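Because an unmatched path slips through silently, it can help to preview the renaming before running `tar`. The following is a minimal sketch, not part of the original answer: `preview_name` is a hypothetical helper that feeds a path through `sed` with the same two substitutions, written with `[^/][^/]*` instead of the GNU-only `\+` so that it also works with other `sed` implementations. It assumes `sed` rewrites these names the same way tar's `--transform` does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the name a path would get after both substitutions.
preview_name() {
    printf '%s\n' "$1" |
        sed -e 's![.]!_!g' \
            -e 's!.*/\([^/][^/]*\)/\([^/][^/]*\)_txt$!foo/\2_\1.txt!'
}

preview_name case2/0.25/case2.txt   # two directory levels: second expression matches
preview_name case2/notes.txt        # only one directory level: second expression does not match
```

Any line of output that does not start with `foo/` marks a file the second expression would miss.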
A Bash solution will work much the same way.
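#!/bin/bash
shopt -s nullglob globstar
mkdir -p foo
# Read NUL-delimited paths so unusual file names survive intact.
while read -rd ''; do
    # With dots replaced, capture the last directory (1) and the file stem (2).
    [[ ${REPLY//./_} =~ ([^/]+)/([^/]+)_txt$ ]] &&
        cp -va "$REPLY" "foo/${BASH_REMATCH[2]}_${BASH_REMATCH[1]}.txt"
done < <(printf %s\\0 case*/**/*.txt)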
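To check what the regular expression captures before anything is copied, the renaming step can be isolated. This is a sketch under assumptions: `target_for` is a hypothetical function name, and it only prints the destination instead of calling `cp`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the destination for one source path, or return
# a non-zero status if the path does not have the dir/name.txt shape.
target_for() {
    local path=${1//./_}                     # 0.25 -> 0_25, .txt -> _txt
    [[ $path =~ ([^/]+)/([^/]+)_txt$ ]] || return 1
    printf 'foo/%s_%s.txt\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}"
}

target_for case1/0.35/case1.txt
target_for case1.txt || echo "no match: case1.txt" >&2
```

A path without a parent directory fails the match, which mirrors how the loop simply skips such a file.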
Solution 2:
You can do this easily enough using the `globstar` option of bash (from `man bash`):

globstar
    If set, the pattern `**` used in a pathname expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a `/`, only directories and subdirectories match.

Since we can use `**` to find the files, we just need to define the new name as including the original directory names and changing `.` to `_`:
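shopt -s globstar   # ** only recurses when this option is enabled
mkdir -p foo        # make sure the target directory exists
for file in **/*.txt; do
    newName=$(sed 's|[/.]|_|g' <<<"$file" | sed 's/_txt$/.txt/')
    cp -- "$file" foo/"$newName"
done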
Explanation
- `for file in **/*.txt; do`: find all files (and directories, if that's relevant) in the current directory and its subdirectories whose name ends in `.txt`.
- `newName=$(sed 's|[/.]|_|g' <<<"$file" | sed 's/_txt$/.txt/')`: use `sed` to convert all `/` and `.` to `_` in the file name. Note that `$file` here also includes the path, so it will be something like `case1/0.25/case1.txt`, and that becomes `case1_0_25_case1_txt`. We then pass the output of the first `sed` to a second one which converts `_txt` (if found at the end of the line) to `.txt`, giving us `case1_0_25_case1.txt`. The final output is saved in the variable `$newName`.
- `cp -- "$file" foo/"$newName"`: we now copy the file to the directory `foo/` under the new name. The `--` is not really needed here, but ensures the approach will work with any file name, including those whose name starts with a `-`.
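The two `sed` calls can also be replaced by Bash parameter expansion, which avoids running `sed` twice for every file. This is a minimal sketch rather than the answer's own code: `flatten_name` is a hypothetical helper name, and it assumes every path ends in `.txt`, as the glob guarantees.

```shell
#!/usr/bin/env bash
# Hypothetical helper: case1/0.25/case1.txt -> case1_0_25_case1.txt
flatten_name() {
    local name=${1%.txt}      # drop the extension so its dot is left alone
    name=${name//\//_}        # every / becomes _
    name=${name//./_}         # every . becomes _
    printf '%s.txt\n' "$name"
}

flatten_name case1/0.25/case1.txt
```

Inside the loop it would be used as `cp -- "$file" foo/"$(flatten_name "$file")"`.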
I recreated the folder structure you show in your question, ran the command above and got:
$ tree
.
├── case1
│   ├── 0.25
│   │   └── case1.txt
│   ├── 0.30
│   │   └── case1.txt
│   ├── 0.35
│   │   └── case1.txt
│   └── 0.45
│       └── case1.txt
├── case2
│   ├── 0.25
│   │   └── case2.txt
│   ├── 0.30
│   │   └── case2.txt
│   ├── 0.35
│   │   └── case2.txt
│   └── 0.45
│       └── case2.txt
└── foo
    ├── case1_0_25_case1.txt
    ├── case1_0_30_case1.txt
    ├── case1_0_35_case1.txt
    ├── case1_0_45_case1.txt
    ├── case2_0_25_case2.txt
    ├── case2_0_30_case2.txt
    ├── case2_0_35_case2.txt
    └── case2_0_45_case2.txt

11 directories, 16 files
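These names keep the whole path, whereas the question asks for `case2_0_25.txt`. If the shorter form is wanted, one possible adjustment is to use only the immediate parent directory. The sketch below assumes every file sits at least one directory deep; `short_name` is a hypothetical helper, not part of the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: case2/0.25/case2.txt -> case2_0_25.txt
short_name() {
    local dir=${1%/*}         # path without the file name: case2/0.25
    local sub=${dir##*/}      # immediate parent directory: 0.25
    local base=${1##*/}       # file name: case2.txt
    base=${base%.txt}         # file stem: case2
    printf '%s_%s.txt\n' "$base" "${sub//./_}"
}

short_name case2/0.25/case2.txt
```

In the loop, `cp -- "$file" foo/"$(short_name "$file")"` would then take the place of the `sed`-based name. Two files with the same stem and the same parent directory name would still collide, so this only fits layouts like the one in the question.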