How to insert a different header for every line with sed?

Since you specifically asked for a sed solution (I wouldn't suggest actually doing it this way, but you could):

$ sed = file | sed '1~2 s/^/>seq/'
>seq1
CWGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAG
>seq2
ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGGC
>seq3
ACACTCTTTCCCTACACGACGCTCTTCCGATCTGATT
>seq4
ACACTCTTTCCCTACACGACGCTCTTCCGATCTACCGT

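The first invocation, sed = file, prints each line's number on a line of its own just before that line. The second then decorates those numbers (the odd-numbered lines, selected by GNU sed's 1~2 first~step address) by prepending the >seq string.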
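To make the two stages visible, here is a minimal sketch run on two made-up lines fed through stdin. The sample data and the helper name number_then_tag are hypothetical, and it assumes GNU sed, since the first~step address form is a GNU extension.

```shell
# Hypothetical helper: number every line, then turn each number into a header.
# Reads the files named as arguments, or stdin when none are given.
# Assumes GNU sed (the '1~2' first~step address is a GNU extension).
number_then_tag() {
    # Stage 1: '=' prints the line number on its own line before each line.
    # Stage 2: '1~2' selects lines 1, 3, 5, ... (the numbers) and prepends '>seq'.
    sed = "$@" | sed '1~2 s/^/>seq/'
}

# Stage 1 on its own, to show the intermediate stream:
printf 'AAAA\nCCCC\n' | sed =
# 1
# AAAA
# 2
# CCCC

# Both stages together:
printf 'AAAA\nCCCC\n' | number_then_tag
# >seq1
# AAAA
# >seq2
# CCCC
```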


OTOH if you know ahead of time that there are 770 lines, then you could do

printf ">seq%d\n" {1..770} | sed 'R file'

although this relies on the GNU sed R extension:

R filename
Queue a line of filename to be read and inserted into the output stream at the end of the current cycle, or when the next input line is read. Note that if filename cannot be read, or if its end is reached, no line is appended, without any error indication.
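The "no line is appended, without any error indication" part matters when the header count and the line count disagree. A small sketch, assuming GNU sed, with a hypothetical helper interleave_headers and made-up sample data: surplus headers come out bare, and surplus file lines are simply never printed.

```shell
# Hypothetical helper: print headers >seq1 .. >seqN, each followed by the next
# line of FILE, using GNU sed's 'R' command.
# Usage: interleave_headers N FILE
interleave_headers() {
    n=$1
    file=$2
    # seq emits 1..N one per line; the first -e turns each into '>seqN'.
    # 'R' takes the rest of its script line as the file name, so it gets its
    # own -e; it queues one line of FILE to be output after every header.
    seq 1 "$n" | sed -e 's/^/>seq/' -e "R $file"
}

tmp=$(mktemp)
printf 'AAAA\nCCCC\n' > "$tmp"

interleave_headers 2 "$tmp"   # >seq1, AAAA, >seq2, CCCC (one per line)
interleave_headers 3 "$tmp"   # as above plus a bare >seq3: FILE ran out, silently
interleave_headers 1 "$tmp"   # >seq1, AAAA only: CCCC is never printed

rm -f "$tmp"
```

Piping seq into sed, rather than calling printf with a list of numbers, also behaves sensibly for an empty file: printf with no arguments would still print a single >seq0 header.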

Of course if you don't know the number of lines ahead of time, you could do

printf ">seq%d\n" $(seq 1 "$(wc -l < file)") | sed 'R file'

but that would lose the advantage of only needing to read the file once.


In practice I'd probably use @John1024's awk solution, or its perl equivalent:

perl -lpe 'print ">seq" . $.' file

Your task can be done with sed, but sed has no native arithmetic, which makes it the wrong tool here. Awk works well:

$ awk '{print ">seq" NR} 1' file
>seq1
CWGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAG
>seq2
ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGGC
>seq3
ACACTCTTTCCCTACACGACGCTCTTCCGATCTGATT
>seq4
ACACTCTTTCCCTACACGACGCTCTTCCGATCTACCGT

How it works:

  • print ">seq" NR

    For each new line read, we first print the header that you want.

    NR is awk's record counter, which here is simply the current line number.

  • 1

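    This is awk's cryptic shorthand for print-the-line: 1 is a pattern that is always true, and a pattern without an action gets the default action, which prints the current line.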
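The same program with the 1 spelled out as an explicit print, and the prefix passed in with -v instead of being hard-coded. The helper add_headers_awk and its PREFIX argument are hypothetical. Note that NR keeps counting across multiple input files; FNR is the per-file counter if you want the numbering to restart.

```shell
# Hypothetical helper: print header PREFIX<n> before every line.
# Usage: add_headers_awk PREFIX [FILE...]   (reads stdin when no FILE is given)
add_headers_awk() {
    prefix=$1
    shift
    # -v sets the awk variable p before any input is read (backslash escapes
    # in the value are interpreted). For each record: print the header, then
    # the record itself, which is what the bare '1' pattern did.
    awk -v p="$prefix" '{ print p NR; print }' "$@"
}

printf 'AAAA\nCCCC\n' | add_headers_awk '>seq'
# >seq1
# AAAA
# >seq2
# CCCC
```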


Using a simple bash loop:

count=1; while IFS= read -r line; do printf '>seq%d\n%s\n' $((count++)) "$line"; done < file

The output:

>seq1
CWGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAG
>seq2
ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGGC
>seq3
ACACTCTTTCCCTACACGACGCTCTTCCGATCTGATT
>seq4
ACACTCTTTCCCTACACGACGCTCTTCCGATCTACCGT
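If the file might end without a trailing newline, read returns non-zero on that final line and a plain while read loop silently drops it. A sketch of a variant that keeps it, written as a hypothetical function add_headers_loop that reads stdin; it also avoids the bash-only count++ so it runs under any POSIX sh:

```shell
# Hypothetical helper: the loop above, made a little more defensive.
# Usage: add_headers_loop < file
add_headers_loop() {
    count=1
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    # '|| [ -n "$line" ]' processes a last line that lacks a trailing newline:
    # read fails on it but still fills $line.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '>seq%d\n%s\n' "$count" "$line"
        count=$((count + 1))
    done
}

printf 'AAAA\nCCCC' | add_headers_loop   # note: no newline after CCCC
# >seq1
# AAAA
# >seq2
# CCCC
```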