Duplicate every two lines a variable number of times
I have multiple .fasta files (that are named barcode*_consensus.fasta) that look like this:
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
I would like to duplicate/repeat every two lines n number of times, as specified after 'total supporting reads'. So for example, I would like to duplicate the first two lines 12 times, the second two lines 6 times, etc.
With awk, I did manage to select every line that starts with '>' and the next line:
awk '/>/{nr[NR]; nr[NR+1]} NR in nr' barcode01_consensus.fasta
But I can't find out how to print this n number of times with a variable.
Any help is much appreciated.
Updated: So I would like the final file to look something like:
|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000 TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTT
|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000 TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTT
|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000 TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTT
....x 12 times....
Solution 1:
I'd use space or underscore as the field separator. Then, the count is the 8th field:
awk -F'[ _]' '
$1 ~ /[>|]+consensus$/ {n = $8; print; next}
{while (--n >= 0) print}
' file
output
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
To print each pair of lines n times requires just minor changes:
awk -F'[ _]' '
$1 ~ /[>|]+consensus$/ {firstline = $0; n = $8; next}
{while (--n >= 0) print firstline ORS $0}
' file
output
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT