Creating paired array based on file name prefix match in bash
I'm trying to write a for or while loop in bash that processes pairs of files sharing the same prefix, one file from each of two directories. For example:
/home/samples - Contains files A-anything.fq B-anything.fq etc
/home/annotation - Contains files A-anything2.tab B-anything2.tab etc
I call their names in two separate arrays:
filepathsfq=( /home/samples/*fq )
filenamesfq=( "${filepathsfq[@]##*/}" ) # store the names in an array so no meta-characters in file names mess with anything
filepathstab=( /home/annotation/*tab )
filenamestab=("${filepathstab[@]##*/}")
I'm trying to create a two-column array in which the entries of filenamesfq and filenamestab are paired when their first 10 characters match. That is enough to pair the files unambiguously in my case, because the first 10 characters are the file identifier.
For example:
A12345689-anything.fq A12345689-anything2.tab
B12345689-anything.fq B12345689-anything2.tab
I tried with
declare -a a0=("${filepathsfq[@]##*/}")
declare -a a1=("${filepathstab[@]##*/}")
which works, but I can't iterate over both arrays in a single for loop using one variable.
I want this "paired array" because the for loop I'm trying to run can only take one variable, so that variable must hold both paired names.
I don't even know how to start pairing the names based on the first 10 characters. So far I've been exporting the values to a CSV file and matching the first 10 characters with a formula in Excel, which is not great.
I also used:
paste -d, <(printf '%s\n' "${filepathsfq[@]##*/}") <(printf '%s\n' "${filepathstab[@]##*/}") >> samples.csv
to create a CSV file, manually verify that everything is paired correctly, and then:
while IFS="," read -r fq tab
do
echo "$fq, $tab"
done < samples.csv
The code above works for the intended purpose but needs external validation of the name matching. I can't figure out how to match the file names, turn the result into an array, and use it in a for loop or a while loop.
Given the two directories:
/home/samples
|-- A12345689-anything.fq
|-- B12345689-anything.fq
|-- C12345689-anything0.fq
|-- C12345689-anything1.fq
`-- D12345689-anything.fq
/home/annotation
|-- A12345689-anything2.tab
|-- B12345689-anything2.tab
|-- C12345689-anything2.tab
`-- E12345689-anything2.tab
The following bash code:
#!/bin/bash
shopt -s nullglob    # unmatched globs expand to nothing instead of themselves

fq_dirpath=/home/samples
tab_dirpath=/home/annotation

for fq_filepath in "$fq_dirpath"/*.fq
do
    # the first 10 characters of the file name are the identifier
    prefix=${fq_filepath##*/}
    prefix=${prefix:0:10}

    fq_filepaths=( "$fq_dirpath"/"$prefix"*.fq )
    tab_filepaths=( "$tab_dirpath"/"$prefix"*.tab )

    # sanity checks: require exactly one file of each kind per prefix
    [ "${#fq_filepaths[@]}" -eq 1 ] || continue
    [ "${#tab_filepaths[@]}" -eq 1 ] || continue

    fq_filename=${fq_filepaths[0]##*/}
    tab_filename=${tab_filepaths[0]##*/}

    # process the pair
    printf '%s %s\n' "$fq_filename" "$tab_filename"
done

shopt -u nullglob
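outputs (C is skipped because two .fq files share its prefix, D because it has no matching .tab file, and E is never visited because it has no .fq file):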
A12345689-anything.fq A12345689-anything2.tab
B12345689-anything.fq B12345689-anything2.tab
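If the pairs have to live in a single array, so that a loop with one variable can walk over them, the same checks can fill an array of comma-joined names. What follows is only a sketch under a few assumptions: pair_files and pairs are names made up for this illustration, the function writes to a global array, and no file name is expected to contain a comma.

```shell
#!/bin/bash
# Sketch: collect validated pairs into one array ("fq_name,tab_name" per entry).
# pair_files and pairs are hypothetical names, not part of the original script.
# Assumption: file names never contain a comma.

pair_files() {
    local fq_dir=$1 tab_dir=$2
    local fq prefix restore
    local -a fqs tabs

    restore=$(shopt -p nullglob)   # remember the caller's nullglob setting
    shopt -s nullglob

    pairs=()                       # global result array
    for fq in "$fq_dir"/*.fq; do
        prefix=${fq##*/}           # strip the directory part
        prefix=${prefix:0:10}      # first 10 characters identify the sample

        fqs=( "$fq_dir"/"$prefix"*.fq )
        tabs=( "$tab_dir"/"$prefix"*.tab )

        # keep only prefixes with exactly one file on each side
        if [ "${#fqs[@]}" -eq 1 ] && [ "${#tabs[@]}" -eq 1 ]; then
            pairs+=( "${fqs[0]##*/},${tabs[0]##*/}" )
        fi
    done

    eval "$restore"                # put nullglob back the way it was
}

# Usage: one loop variable carries both names
pair_files /home/samples /home/annotation
for pair in "${pairs[@]}"; do
    IFS=, read -r fq tab <<< "$pair"
    printf '%s %s\n' "$fq" "$tab"
done
```

Each element of pairs holds both names, so the loop body only needs to split it on the comma. If commas can occur in the file names, a different separator (or two parallel arrays indexed by the same number) would be a safer choice.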