How do you append the first pattern of a regular expression to the end of a line using sed?
Solution 1:
Unlike Perl, sed doesn't support the PCRE lookahead syntax `(?=_)`, but you could fake it as follows:
- match `>` anchored to the start of the line: `^>`
- then match and capture zero or more non-`_` characters: `([^_]*)`
- then match everything else: `.*`
Then replace with:
- the entire matched pattern: `&`
- followed by a literal `|` and then the first captured group: `\1`
So:
$ sed -E 's/^>([^_]*).*/&|\1/' sample_file.fasta
>uce-8374_Genus_species|uce-8374
ACGTACGTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTACGATCGCGGTATATCGGCGATTCGATCG
>uce-239_Genus_species|uce-239
ATCGTAGCATGCGCTAGCTAGCTAGCTCGCGGTACGCATGTCTGACTGCGTCTGGTCGTACGATTACTACGACTGCG
>uce-83_Genus_species|uce-83
ATCGATCTAGCGTAGCATGCGATCGATATCTGCGATCGACTCGATGCATGCATGCATCGATGCTAGCTAGCTAGCTA
>uce-902_Genus_species|uce-902
AGCTGACTAGCTGGCGATACTGGCGATATCGGATTACGCGGCATATCGAGCGAGTCGATCGATGCATCTGATGCAGC
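
If your sed lacks the -E option, the same substitution can probably be written in POSIX basic regular expression syntax, where the capture group is spelled `\(` ... `\)`. The following is a minimal sketch rather than a tested recipe for every sed implementation; the two shortened records fed in through printf are made-up sample data, and the variable name `result` is only there for illustration.

```shell
# Hypothetical shortened sample: two FASTA records piped in via printf.
# The substitution is the same as above, but in basic regex syntax (no -E),
# so the capture group is written \( ... \) instead of ( ... ).
result=$(printf '%s\n' \
  '>uce-8374_Genus_species' \
  'ACGTACGT' \
  '>uce-239_Genus_species' \
  'ATCGTAGC' |
  sed 's/^>\([^_]*\).*/&|\1/')

printf '%s\n' "$result"
```

Sequence lines do not start with `>`, so the pattern does not match them and they pass through unchanged; only header lines get the `|` and the captured prefix appended.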