AppleScript: How to extract a section from a string (based on defined characters)?

Running the following AppleScript code in AppleScript Editor:

set theText to "I ate an apple at 11:54 pm without the skin."
set theTime to do shell script "awk -F ' at | am | pm ' '{print $2}'<<<" & quoted form of theText
log "The time was: " & theTime

set theText to "I ate two navel oranges at 6:30 am with a glass of water."
set theTime to do shell script "awk -F ' at | am | pm ' '{print $2}'<<<" & quoted form of theText
log "The time was: " & theTime

Produces the following output in the AppleScript Editor's Event Log:

tell current application
    do shell script "awk -F ' at | am | pm ' '{print $2}'<<<'I ate an apple at 11:54 pm without the skin.'"
        --> "11:54"
    (*The time was: 11:54*)
    do shell script "awk -F ' at | am | pm ' '{print $2}'<<<'I ate two navel oranges at 6:30 am with a glass of water.'"
        --> "6:30"
    (*The time was: 6:30*)
end tell

In the examples above, I've defined the field separators (delimiters) for awk with the -F option as ' at | am | pm ', a regular expression that matches " at ", " am ", or " pm ". The action '{print $2}' then prints the second field, which is the text between the first separator and the next one.
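To try the field-separator idea outside AppleScript, the same awk program can be run directly in a shell, for example in Terminal. This is a minimal sketch: `time_by_separators` is a hypothetical helper name, and it pipes the text in with `printf` because the `<<<` here-string used above is supported by bash and zsh but not by every POSIX sh.

```shell
#!/bin/sh
# Hypothetical helper: print the text between the first " at " and the
# next separator, using the same -F regex as the answer above.
time_by_separators() {
    # -F ' at | am | pm ' makes each of the three phrases a field separator,
    # so $2 is the second field: the text after the first separator.
    printf '%s\n' "$1" | awk -F ' at | am | pm ' '{print $2}'
}

time_by_separators "I ate an apple at 11:54 pm without the skin."
time_by_separators "I ate two navel oranges at 6:30 am with a glass of water."
```

Because $2 is simply whatever follows the first separator, a sentence such as "I looked at the clock at 7:05 pm today." would print "the clock", so this approach relies on " at " appearing only right before the time.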

Note: The log command isn't needed for the solution itself; it's used only to show the value of theTime in the Event Log, in addition to the result shown after -->, which is how the Event Log normally displays a command's result.


Update: I wrote my original answer based on a literal reading of your request, "So, I would like the opening text item delimiter to be at_ and the closing text item delimiter to be either _pm or _am", i.e., that you wanted to use exactly those strings as the delimiters. However, since a pure AppleScript solution was presented in a separate answer, let me present a one-line AppleScript solution that does essentially the same thing as those 8 lines of pure AppleScript code: it also focuses on the colon, but as part of a RegEx that represents the time in hours and minutes. (One difference: this returns only the hours and minutes, whereas the pure AppleScript version also keeps the " am" or " pm" suffix.)

set theTime to do shell script "awk 'match($0,/[0-9]{1,2}:[0-5][0-9]/) {print substr($0,RSTART,RLENGTH)}'<<<" & quoted form of theText

Running the following AppleScript code in AppleScript Editor:

set theText to "I ate an apple at 11:54 pm without the skin."
set theTime to do shell script "awk 'match($0,/[0-9]{1,2}:[0-5][0-9]/) {print substr($0,RSTART,RLENGTH)}'<<<" & quoted form of theText

set theText to "I ate two navel oranges at 6:30 am with a glass of water."
set theTime to do shell script "awk 'match($0,/[0-9]{1,2}:[0-5][0-9]/) {print substr($0,RSTART,RLENGTH)}'<<<" & quoted form of theText

Produces the following output in the AppleScript Editor's Event Log:

tell current application
    do shell script "awk 'match($0,/[0-9]{1,2}:[0-5][0-9]/) {print substr($0,RSTART,RLENGTH)}'<<<'I ate an apple at 11:54 pm without the skin.'"
            --> "11:54"
    do shell script "awk 'match($0,/[0-9]{1,2}:[0-5][0-9]/) {print substr($0,RSTART,RLENGTH)}'<<<'I ate two navel oranges at 6:30 am with a glass of water.'"
            --> "6:30"
end tell

As you can see, the RegEx matches whether one or two digits precede the colon, and the awk program prints the match, which is the time.
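The match() approach can likewise be checked in a shell. This sketch assumes portability matters and swaps the interval expression {1,2} for the equivalent [0-9][0-9]?, since some older awk implementations (gawk 3.x without --re-interval, for instance) don't support {m,n} intervals. The helper name `time_by_regex` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first H:MM or HH:MM time found in the text.
time_by_regex() {
    # match() sets RSTART (1-based start of the match) and RLENGTH (its length);
    # substr() then cuts exactly that span out of the line.
    # [0-9][0-9]? means "one or two digits", like [0-9]{1,2} in the answer.
    printf '%s\n' "$1" |
        awk 'match($0, /[0-9][0-9]?:[0-5][0-9]/) { print substr($0, RSTART, RLENGTH) }'
}

time_by_regex "I ate an apple at 11:54 pm without the skin."
time_by_regex "I ate two navel oranges at 6:30 am with a glass of water."
time_by_regex "No time in this sentence."
```

Unlike the field-separator version, this doesn't depend on the surrounding words (" at ", " am ", " pm "), and a line with no time prints nothing at all.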

Personally, I'd choose this method over my original answer, as it's better suited to the circumstances, and also over the pure AppleScript code, as I can't justify writing 8 lines of pure AppleScript code when a single line of AppleScript (using do shell script) produces essentially the same result!


You can do it with pure AppleScript, even though the shell script method presented above is very nice:

set theText to "I ate an apple at 11:54 pm without the skin."
set the_colon_location to offset of ":" in theText
-- now we know where the colon is.
-- The time is going to be on either side of it.
set the_starting_point to the_colon_location - 2
-- colon + 5 also takes in the space and the "am"/"pm" suffix
set the_ending_point to the_colon_location + 5
-- "text i thru j of" returns a string directly, with no list-to-string coercion
set the_time_string to text the_starting_point thru the_ending_point of theText
-- in case the hour is not two digits, the first character will be a space
if character 1 of the_time_string is " " then
    set the_time_string to text 2 thru -1 of the_time_string
end if
return the_time_string
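To make the fixed-window logic above concrete, here is a rough POSIX sh rendering of the same steps: locate the first colon, take the characters from 2 before it through 5 after it, and drop a leading space. It's an illustration only; `extract_time_window` is a hypothetical name, and `cut -c` may count bytes rather than characters, so it assumes ASCII text. Because the window ends at colon + 5, it includes the am/pm suffix, so this approach yields "11:54 pm" and "6:30 am" rather than "11:54" and "6:30".

```shell
#!/bin/sh
# Hypothetical helper mirroring the pure AppleScript steps:
# find the first colon, then take characters (colon - 2) through (colon + 5).
extract_time_window() {
    text=$1
    before=${text%%:*}                   # everything before the first colon
    if [ "$before" = "$text" ]; then
        return 1                         # no colon at all
    fi
    colon=$(( ${#before} + 1 ))          # 1-based position of the colon
    start=$(( colon - 2 ))
    end=$(( colon + 5 ))
    if [ "$start" -lt 1 ]; then
        start=1                          # colon near the start of the text
    fi
    # cut -c selects by position (assumes ASCII, where bytes == characters)
    window=$(printf '%s\n' "$text" | cut -c "$start-$end")
    # a one-digit hour leaves a leading space in the window; drop it
    case $window in
        " "*) window=${window#?} ;;
    esac
    printf '%s\n' "$window"
}

extract_time_window "I ate an apple at 11:54 pm without the skin."
extract_time_window "I ate two navel oranges at 6:30 am with a glass of water."
```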