How do I perform a loop on files that have the same string before the first underscore?
I am trying to perform a loop on Ubuntu in which I concatenate two files into one single file. The directory has thousands of files, which all come in pairs that have the same string of characters before the first underscore. As an example, the directory contains the following files:
uce-1348_.nexus.phy.fasta
uce-1348_Sample1.fasta
uce-1611_.nexus.phy.fasta
uce-1611_Sample1.fasta
I have tried performing something along the lines of
for i in *_*.fasta
do
cat $i > $i.combined.fasta
done
but this of course does not work, as it does not make the combined files specific to the string before the first underscore. I need one concatenated file for uce-1348 and another for uce-1611 (there are thousands more files, but this is a small example of what I am looking for).
Any help would be appreciated. I have heard you can set strings and patterns, but I still don't know how to do this. Thank you!
Solution 1:
You can loop over the files as you did, but then you need to extract the start of your file name to define your output file.
Then you have to use >> to append to the output file. If you use a single >, it will overwrite the content every time.
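The difference is easy to check on a scratch file. This is only a small illustration; the variable names tmp, overwritten and appended are made up for the demo:

```shell
tmp=$(mktemp)              # scratch file, removed again below
echo "first"  > "$tmp"     # ">" creates the file (or empties an existing one)
echo "second" > "$tmp"     # ">" again: the line "first" is lost
overwritten=$(cat "$tmp")
echo "third" >> "$tmp"     # ">>" keeps the existing content and adds to it
appended=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "after >:" "$overwritten" "after >>:" "$appended"
```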
So, using your file names:
for file in *_*.fasta; do
output="${file/_*}.combined.fasta"
cat "$file" >> "$output"
done
The expression ${file/_*} uses Shell Parameter Expansion to delete the first "_" and everything after it from the file name, which leaves the prefix used to build the output file name.
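One caveat with >> is that running the loop a second time appends everything again, so each combined file ends up with duplicated content. Below is a minimal sketch of a variant that can be re-run safely, assuming Bash (or another shell that supports local). The function name combine_pairs and its directory argument are made up for illustration, and ${name%%_*} is the POSIX counterpart of ${file/_*}. Instead of appending file by file, it writes each group once with a single >.

```shell
# combine_pairs DIR (hypothetical helper): for every prefix found before the
# first "_" in DIR/*_*.fasta, write DIR/<prefix>.combined.fasta from all
# files that start with "<prefix>_".
combine_pairs() {
    local dir="${1:-.}" file name prefix last=""
    for file in "$dir"/*_*.fasta; do
        [ -e "$file" ] || continue            # nothing matched: skip the literal pattern
        name="${file##*/}"                    # file name without the directory part
        prefix="${name%%_*}"                  # text before the first "_"
        [ "$prefix" = "$last" ] && continue   # group was just written, avoid redoing it
        # ">" replaces any output left over from an earlier run.
        cat "$dir/$prefix"_*.fasta > "$dir/$prefix.combined.fasta"
        last="$prefix"
    done
}
```

Calling combine_pairs . in the directory from the question should produce uce-1348.combined.fasta and uce-1611.combined.fasta. The order of the two inputs inside each output follows the shell's glob sorting.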