How do I delete everything after the second occurrence of a quote using the command line?
Using awk:
awk -v RS='"' -v ORS='"' 'NR==1{print} NR==2{print; printf"\n";exit}' file
This sets the record separator to `"`, so the text between consecutive quotes becomes one record. We only need to print the first two records and then we are done. In more detail:
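- `-v RS='"'`: sets the input record separator to a double quote.
- `-v ORS='"'`: sets the output record separator to a double quote, so every `print` ends with `"`.
- `NR==1{print}`: prints the first record (the text before the first quote), followed by `"`.
- `NR==2{print; printf"\n";exit}`: prints the second record (the text between the first and second quotes), followed by `"`, then prints a newline character and exits, so nothing after the second quote is printed.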
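To see the record splitting in action, here is a small self-contained sketch that writes a hypothetical sample file (its contents are made up for illustration), counts the records awk sees with `RS='"'`, and then runs the command from the answer on it.

```shell
# Hypothetical sample: two quoted words, so four quotes in total.
sample=$(mktemp)
printf 'say "hello" and "bye"\nmore text\n' > "$sample"

# With RS='"' each quote ends a record: [say ], [hello], [ and ], [bye],
# and finally [\nmore text\n], so awk sees five records.
count=$(awk -v RS='"' 'END { print NR }' "$sample")
printf '%s\n' "$count"    # prints 5

# The command from the answer: print records 1 and 2 (each followed by ORS),
# then a newline, then stop reading.
result=$(awk -v RS='"' -v ORS='"' 'NR==1{print} NR==2{print; printf"\n";exit}' "$sample")
printf '%s\n' "$result"   # prints: say "hello"
rm -f "$sample"
```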
Using sed:
sed -r 'H;1h;$!d;x; s/(([^"]*"){2}).*/\1/' file
This reads the whole file into memory at once, so if the file is huge, don't use this approach. It works as follows:
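- `H;1h;$!d;x`: a useful sed idiom that reads the whole file into the pattern space. `H` appends each line to the hold space (`1h` overwrites it with the first line, so it does not start with an empty line), `$!d` deletes the pattern space and moves on to the next line on every line except the last, and on the last line `x` swaps the accumulated file into the pattern space.
- `s/(([^"]*"){2}).*/\1/`: looks for the second `"` and deletes all text that follows it. The regex `(([^"]*"){2})` captures all text up to and including the second double quote and saves it in group 1. The regex `.*` matches everything that follows, to the end of the file. The replacement text is group 1, `\1`.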
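The sketch below, which assumes GNU sed (where `-r` selects extended regular expressions and `.` and `[^"]` also match the newlines embedded in the slurped pattern space), first runs the slurping idiom on its own and then the full command, on a hypothetical two-line sample where the second quote sits on the second line.

```shell
# Hypothetical sample: the second quote is on the second line.
sample=$(mktemp)
printf 'first "one\ntwo" rest\nlast line\n' > "$sample"

# The idiom on its own: after x, the pattern space holds every line,
# so replacing the embedded newlines joins the whole file into one line.
joined=$(sed 'H;1h;$!d;x;s/\n/|/g' "$sample")
printf '%s\n' "$joined"   # prints: first "one|two" rest|last line

# The full command from the answer: the match spans the line break.
result=$(sed -r 'H;1h;$!d;x; s/(([^"]*"){2}).*/\1/' "$sample")
printf '%s\n' "$result"   # prints two lines: first "one / two"
rm -f "$sample"
```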
Using Perl:
< infile perl -0777 -pe 's/((.*?"){2}).*/$1/s' > outfile
- `-0777`: slurps the whole file at once instead of reading it one line at a time
- `-p`: places a `while (<>) {[...]}` loop around the script and prints the processed file
- `-e`: reads the script from the arguments
Perl command breakdown:
- `s`: performs a substitution
- `/`: starts the pattern
- `((.*?"){2})`: matches any characters zero or more times lazily (i.e. as few times as possible, stopping as soon as the following pattern starts to match) followed by a `"` character, twice, and captures the result in group 1
- `.*`: matches any characters zero or more times greedily (i.e. as many times as possible), which here means the rest of the file
- `/`: ends the pattern and starts the replacement string
- `$1`: replaces the match with the first captured group
- `/`: ends the replacement string and starts the modifiers
- `s`: treats the whole file as a single line, allowing `.` to match newlines as well