Apply sed operations only to lines beginning with a particular string
I have the following file format
Received from +11231231234 at 2021-10-10T19:56:50-07:00:
This is a message that contains words like from, at, etc.
Sent to +11231231234 at 2021-10-11T06:50:57+00:00:
This is another message that contains words like to, at, etc.
I want to clean up the "Received" and "Sent" lines; the following sed pipeline achieves this
cat file | sed 's/from//g' | sed 's/to/ /g' | sed 's/+\w\+//' | \
sed 's/at//g' | sed 's/T/ /g' | sed 's/[[:digit:].]*\:$//' | \
sed 's/[[:digit:].]*\:$//' | sed 's/-$//' | sed 's/-$//' | sed 's/+$//'
and results in the following
Received 2021-10-10 19:56:50
This is a message that contains words like , , etc.
Sent 2021-10-11 06:50:57
This is another message that contains words like , , etc.
As you can see, it cleans up the "Received" and "Sent" lines nicely, but it also alters the message lines! How can I apply these operations only to lines starting with "Received" or "Sent"?
Solution 1:
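You can put an address pattern in front of a command so that the command runs only on lines matching that pattern (the \| alternation used here is a GNU sed extension to basic regular expressions):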
sed '/^Sent\|^Received/ s/pattern/replacement/' your_file
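If several substitutions need to share one address, they can be grouped in braces after it. The following is a minimal sketch, not the original pipeline: it assumes a sed that supports -E (extended regular expressions, available in GNU and BSD sed), and the three substitutions are a simplified stand-in for the question's chain of commands.

```shell
# Two sample lines in the question's format, piped straight into sed.
result=$(printf '%s\n' \
  'Received from +11231231234 at 2021-10-10T19:56:50-07:00:' \
  'This is a message that contains words like from, at, etc.' |
  sed -E '/^(Received|Sent) /{
    # Drop " from <number> at " or " to <number> at ".
    s/ (from|to) [^ ]+ at / /
    # Split date and time at the first capital T.
    s/T/ /
    # Remove the trailing UTC offset and colon, e.g. "-07:00:".
    s/[-+][0-9]{2}:[0-9]{2}:$//
  }')

printf '%s\n' "$result"
```

Every command inside the braces is skipped for lines that do not match the address, so the message line passes through unchanged.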
Bonus
You can actually do all of your edits in one glorious sed command:
sed '/^Received\|^Sent/ s/\(^[^ ]*\).*at \(.*\)T\(.*\)[-+].*/\1 \2 \3/' your_file
Essentially, the pattern matches every piece of text on the line and we just 'remember' all the bits we want to keep, and then replace the entire line with them.
Output:
Received 2021-10-10 19:56:50
This is a message that contains words like from, at, etc.
Sent 2021-10-11 06:50:57
This is another message that contains words like to, at, etc.
The way it works is as follows:
- \( and \) are 'capture groups' that remember whatever was matched between them.
- ^[^ ]* matches the beginning of the line followed by any number of consecutive non-space characters (i.e. the first word on the line).
- .*at followed by a space matches everything up to and including the word 'at' and the space following it. This is not in a capture group and so is not 'remembered'.
- \(.*\)T remembers (in the second capture group) everything up to, but not including, the capital 'T'.
- \(.*\)[-+].* remembers (in the third capture group) everything up to, but not including, the last '-' or '+' on the line; the trailing .* consumes whatever follows that sign.
- /\1 \2 \3/ means replace the match (i.e. the entire line) with the contents of the 1st, 2nd and 3rd capture groups, separated by spaces.
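For comparison, the same capture-group approach can be written with extended regular expressions, which removes most of the backslashes. This is a sketch that assumes a sed supporting -E (GNU and BSD sed both do); the sample lines are the ones from the question.

```shell
# Same substitution as the one-liner, in ERE syntax: plain ( ) are capture
# groups and | is alternation, so no backslash-escaping is needed for them.
result=$(printf '%s\n' \
  'Received from +11231231234 at 2021-10-10T19:56:50-07:00:' \
  'This is a message that contains words like from, at, etc.' \
  'Sent to +11231231234 at 2021-10-11T06:50:57+00:00:' \
  'This is another message that contains words like to, at, etc.' |
  sed -E '/^(Received|Sent) /s/^([^ ]*).*at (.*)T(.*)[-+].*/\1 \2 \3/')

printf '%s\n' "$result"
```

The address here also requires a space after the keyword, so a message line that merely begins with a word such as "Sentence" would not be selected.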
This page explains sed very well - also it has a fantastic set of other unix tutorials.