awk, sed, or other text processing suggestions, please

Solution 1:

This bash script

#!/bin/bash

PART1=$(echo "$1" | sed 's/\(.*\)\s(.*/\1/')
PART3=$(echo "$1" | sed 's/.*)\(.*\)/\1/')
PART2=$(echo "$1" | sed 's/.*(\s*\(.*\)).*/\1/')

START=$(echo "$PART2" | sed 's/\s*-.*//')
END=$(echo "$PART2" | sed 's/.*-\s*//')

STARTNUM=$(echo "$START" | sed 's/^\(.\).*/\1/')
ENDNUM=$(echo "$END" | sed 's/^\(.\).*/\1/')
if test "$STARTNUM" '!=' "$ENDNUM"; then
    echo "Error: Numeral is different"
    exit 1
fi

STARTLETTER=$(echo "$START" | sed 's/^.\(.\).*/\1/')
ENDLETTER=$(echo "$END" | sed 's/^.\(.\).*/\1/')

# Walk the alphabet; OUTPUT is non-empty only between STARTLETTER and ENDLETTER (inclusive)
OUTPUT=''
for LETTER in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ; do
    test "$LETTER" = "$STARTLETTER" && OUTPUT='yes'
    test -n "$OUTPUT" && echo "$PART1, $STARTNUM$LETTER,$PART3"
    test "$LETTER" = "$ENDLETTER" && OUTPUT=''
done

will do what you need when called with the original text as $1, albeit not very efficiently, since it spawns a separate sed process for every extraction.

EDIT

As requested, a few words about the sed expressions:

I isolate PART1 by taking everything before whitespace and an opening (
I isolate PART3 by taking everything after the closing ) (the leading space is kept)
I isolate PART2 by taking what is between ( and ), ignoring whitespace
START and END are isolated by the dash, again ignoring whitespace
Number and Letter are isolated by being first and second character
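
The steps above can be traced on a sample line. This is only a sketch: the sample text is borrowed from Solution 2, the lowercase variable names are illustrative, and the POSIX class [[:space:]] is swapped in for \s so the expressions do not depend on GNU sed.

```shell
#!/bin/sh
# Sample line borrowed from Solution 2; 'input' is an illustrative name.
input='Gene Code (1A - 1F) D2 fragment, D74F'

# PART1: everything before the whitespace and the opening parenthesis
part1=$(echo "$input" | sed 's/\(.*\)[[:space:]](.*/\1/')
# PART3: everything after the closing parenthesis (leading space kept)
part3=$(echo "$input" | sed 's/.*)\(.*\)/\1/')
# PART2: the text between the parentheses
part2=$(echo "$input" | sed 's/.*([[:space:]]*\(.*\)).*/\1/')
# START and END: the two sides of the dash, surrounding whitespace dropped
start=$(echo "$part2" | sed 's/[[:space:]]*-.*//')
end=$(echo "$part2" | sed 's/.*-[[:space:]]*//')

# Print each piece in brackets so leading spaces stay visible
printf '[%s]\n' "$part1" "$part3" "$part2" "$start" "$end"
```

For this input the pieces come out as [Gene Code], [ D2 fragment, D74F], [1A - 1F], [1A] and [1F].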

Solution 2:

If GNU sed is available, the e flag of the s command runs the generated text as a shell command:

sed -r 's/([^(]+) \((.)(.) - .(.)\)(.*)/printf \x27\1, \2%s,\5\\n\x27 {\3..\4}/e' <<<'Gene Code (1A - 1F) D2 fragment, D74F'
Gene Code, 1A, D2 fragment, D74F
Gene Code, 1B, D2 fragment, D74F
Gene Code, 1C, D2 fragment, D74F
Gene Code, 1D, D2 fragment, D74F
Gene Code, 1E, D2 fragment, D74F
Gene Code, 1F, D2 fragment, D74F

If not, drop the e flag and pipe the generated command to a shell instead:

sed -r 's/([^(]+) \((.)(.) - .(.)\)(.*)/printf \x27\1, \2%s,\5\\n\x27 {\3..\4}/' <<<'Gene Code (1A - 1F) D2 fragment, D74F'|bash
Gene Code, 1A, D2 fragment, D74F
Gene Code, 1B, D2 fragment, D74F
Gene Code, 1C, D2 fragment, D74F
Gene Code, 1D, D2 fragment, D74F
Gene Code, 1E, D2 fragment, D74F
Gene Code, 1F, D2 fragment, D74F

(the output is the same with ksh, and with sh where sh supports brace expansion; dash, for example, does not)
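
To see why this works, it may help to look at what sed actually hands to the shell. For the sample line the substitution produces the text printf 'Gene Code, 1%s, D2 fragment, D74F\n' {A..F}. The shell then expands {A..F} into the six letters, and printf reuses its format string once per argument. The sketch below spells the expansion out by hand so it also runs in shells without brace expansion; the variable name out is illustrative.

```shell
#!/bin/sh
# Equivalent of the generated command after the shell has expanded {A..F}.
# printf repeats the format string for each remaining argument,
# so six arguments give six output lines.
out=$(printf 'Gene Code, 1%s, D2 fragment, D74F\n' A B C D E F)
printf '%s\n' "$out"
```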

Solution 3:

A Perl way:

#!/usr/bin/perl
use feature 'say';

my $str = '"Gene Code (3D - 3H) D2 fragment, D74F"';
# get begin number, begin letter, end number, end letter
my ($bn,$bl,$en,$el) = $str =~ /\((.)(.) - (.)(.)\)/;
# loop from begin letter to end letter
for my $i ($bl .. $el) {
    # do the substitution and print
    ($_ = $str) =~ s/ \(.. - ..\)/, $bn$i,/ && say;
}

Output:

"Gene Code, 3D, D2 fragment, D74F"
"Gene Code, 3E, D2 fragment, D74F"
"Gene Code, 3F, D2 fragment, D74F"
"Gene Code, 3G, D2 fragment, D74F"
"Gene Code, 3H, D2 fragment, D74F"
