How do you append the first pattern of a regular expression to the end of a line using sed?

Solution 1:

Unlike Perl, sed doesn't support PCRE lookahead syntax such as (?=_), but you can get the same result with an ordinary match and a capture group:

  • match > anchored to the start of the line ^>
  • then match and capture zero or more non-_ characters ([^_]*)
  • then match everything else .*

then replace with

  • the entire matched pattern &
  • followed by literal | and then the first captured group \1

Putting it together:

$ sed -E 's/^>([^_]*).*/&|\1/' sample_file.fasta
>uce-8374_Genus_species|uce-8374
ACGTACGTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTACGATCGCGGTATATCGGCGATTCGATCG

>uce-239_Genus_species|uce-239
ATCGTAGCATGCGCTAGCTAGCTAGCTCGCGGTACGCATGTCTGACTGCGTCTGGTCGTACGATTACTACGACTGCG

>uce-83_Genus_species|uce-83
ATCGATCTAGCGTAGCATGCGATCGATATCTGCGATCGACTCGATGCATGCATGCATCGATGCTAGCTAGCTAGCTA

>uce-902_Genus_species|uce-902
AGCTGACTAGCTGGCGATACTGGCGATATCGGATTACGCGGCATATCGAGCGAGTCGATCGATGCATCTGATGCAGC
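
If you want to try the substitution without a FASTA file at hand, a minimal sketch along these lines should work. The function name append_id and the inline sample record are made up for illustration. It uses the basic-regex form with escaped parentheses, which should behave the same as the -E version above and is also an option on sed implementations that lack -E:

```shell
# Hypothetical helper: append "|<id>" to each FASTA header line read from stdin.
# Written in POSIX basic regex syntax (escaped parentheses), so no -E is needed.
append_id() {
  sed 's/^>\([^_]*\).*/&|\1/'
}

# Only lines starting with ">" match; sequence lines pass through unchanged.
printf '>uce-8374_Genus_species\nACGT\n' | append_id
```

Because the pattern is anchored with ^>, sequence lines never match and are printed as they are.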