Duplicate every two lines a variable number of times

I have multiple .fasta files (named barcode*_consensus.fasta) that look like this:

>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT

I would like to repeat every pair of lines n times, where n is the number after 'total_supporting_reads' in the header line. For example, the first two lines should be repeated 12 times, the second two lines 6 times, and so on.

With awk, I managed to select every line that contains '>' together with the line after it:

awk '/>/{nr[NR]; nr[NR+1]} NR in nr' barcode01_consensus.fasta

But I can't work out how to print each pair a variable number of times.

Any help is much appreciated.

Updated: So I would like the final file to look something like:

|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000 TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTT

|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000 TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTT

|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000 TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTT

....x 12 times....


Solution 1:

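I'd use a space or an underscore as the field separator, so that the count is the 8th field of each header line. This first version prints the header once and repeats only the sequence line: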

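awk -F'[ _]' '
    # header line: remember the count (field 8), print the header once
    $1 ~ /[>|]+consensus$/ {n = $8; print; next}
    # sequence line: print it n times
    {while (--n >= 0) print}
' file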

output

>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
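
To check that the count really lands in field 8, it can help to list the fields of a single header line. The following is only a small sketch: it uses one header from the question as sample input, and `show_fields` is a helper name made up for this illustration.

```shell
# Sample header copied from the question.
header='|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000'

# show_fields is a hypothetical helper: split a line on spaces and
# underscores, then print each field with its index.
show_fields() {
    printf '%s\n' "$1" | awk -F'[ _]' '{for (i = 1; i <= NF; i++) print i ": " $i}'
}

show_fields "$header"
```

For this header, field 1 is `|>consensus` and field 8 is `6`. If your headers have a different number of underscore-separated words before the count, the field number will shift accordingly.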

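Printing each pair of lines n times requires only a minor change: save the header instead of printing it, then print it together with each copy of the sequence line: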

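awk -F'[ _]' '
    # header line: save it and the count (field 8) without printing
    $1 ~ /[>|]+consensus$/ {firstline = $0; n = $8; next}
    # sequence line: print header + sequence n times
    {while (--n >= 0) print firstline ORS $0}
' file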

output

>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
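
Since the question mentions several files named barcode*_consensus.fasta, the same awk program could be wrapped in a shell loop. This is a sketch under assumptions: the function name `expand_fasta` and the `_expanded.fasta` output suffix are made up here, and each record is assumed to be exactly one header line followed by one sequence line, as in the sample.

```shell
# expand_fasta (hypothetical name): repeat each header/sequence pair n times,
# where n is field 8 of the header line. Writes to standard output.
expand_fasta() {
    awk -F'[ _]' '
        $1 ~ /[>|]+consensus$/ {firstline = $0; n = $8; next}
        {while (--n >= 0) print firstline ORS $0}
    ' "$1"
}

# Assumed output naming:
#   barcode01_consensus.fasta -> barcode01_consensus_expanded.fasta
for f in barcode*_consensus.fasta; do
    [ -e "$f" ] || continue    # skip if the glob matched nothing
    expand_fasta "$f" > "${f%.fasta}_expanded.fasta"
done
```

The output names do not match the input glob, so running the loop a second time does not process the expanded files again.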