What is the sed command to fix this file so that the last value on each line is double-quoted?

I have a file containing two million lines of the form:

"00005cea-668e-4475-9e19-92a25c8b74fb",129.24728",D#

Each line should actually look like this:

"00005cea-668e-4475-9e19-92a25c8b74fb",129.24728,"D#"

How do I use sed to fix this file so that the spurious " is removed and the last value is double-quoted?


Solution 1:

You could try something like:

sed -r 's/",([^,]*)$/,"\1"/' input-file

That's a ", followed by anything that's not a comma ([^,]*) up to the end of the line ($). \1 is the part matched inside the parentheses, ([^,]*). Without -i the result is printed to standard output and input-file is left unchanged.
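To see what the substitution does, here is a minimal sketch that runs it on the sample line from the question. The variable names line and fixed are only for illustration, and -E is used as the portable spelling of -r (both GNU and BSD sed accept it).

```shell
# Sample line copied from the question; single quotes keep the shell
# from interpreting the " and # characters.
line='"00005cea-668e-4475-9e19-92a25c8b74fb",129.24728",D#'

# Only the last ", on the line can match, because [^,]* must reach the
# end of the line without crossing another comma.
fixed=$(printf '%s\n' "$line" | sed -E 's/",([^,]*)$/,"\1"/')

printf '%s\n' "$fixed"
# prints: "00005cea-668e-4475-9e19-92a25c8b74fb",129.24728,"D#"
```

The first ", (after the UUID) is left alone because another comma follows it before the end of the line.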

Solution 2:

Not sed, but perl:

perl -F, -ane '($f1)=$F[1]=~/(.*)"/; $F[2]=~s/\n//g; print "$F[0],$f1,\"$F[2]\"\n";'

Explanation:

  • perl -F, -ane reads the input line-wise and splits the line on a ,
  • ($f1)=$F[1]=~/(.*)"/; captures everything before the " in the second column into $f1, which drops the spurious "
  • $F[2]=~s/\n//g; removes the newline at the end
  • print "$F[0],$f1,\"$F[2]\"\n"; writes the output and adds the " to the last value

Edit - shortened version (thanks to @kos):

perl -F, -lane '$F[1]=~s/"$//; print "$F[0],$F[1],\"$F[2]\"";'
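As a quick check, this sketch feeds the sample line from the question to both Perl versions on standard input and compares the results. The variable names line, long and short are only for illustration.

```shell
line='"00005cea-668e-4475-9e19-92a25c8b74fb",129.24728",D#'

# Original version: capture the text before the stray quote and strip
# the trailing newline from the last field by hand.
long=$(printf '%s\n' "$line" |
  perl -F, -ane '($f1)=$F[1]=~/(.*)"/; $F[2]=~s/\n//g; print "$F[0],$f1,\"$F[2]\"\n";')

# Shortened version: -l removes the newline on input and adds it back
# on print, so only the trailing quote has to be deleted.
short=$(printf '%s\n' "$line" |
  perl -F, -lane '$F[1]=~s/"$//; print "$F[0],$F[1],\"$F[2]\"";')

printf '%s\n%s\n' "$long" "$short"
```

Both commands hard-code three fields ($F[0] to $F[2]), so they assume every line has exactly two commas, as in the sample.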

Solution 3:

It looks like your fields are defined by commas. If so, you can do this in sed:

sed -i -r 's/",([^,]*)$/,"\1"/' file

Or, in Perl:

perl -i -lpe 's/",([^,]*)$/,"$1"/' file

In both cases, the regex looks for a ", (a double quote followed by a comma), then 0 or more non-commas until the end of the line. The parentheses capture the last field, which we can then refer to as \1 (or $1 in Perl). The match is replaced by a comma followed by the captured field inside double quotes, which also drops the spurious ". The -i is for editing the file in place, so changes are made to the original file.
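Since -i overwrites the original, it may be worth keeping a backup and counting the affected lines first. The following is a minimal sketch, assuming a scratch file created with mktemp stands in for the real file; the variable names are hypothetical. The -i.bak form, with the suffix attached to the option, is accepted by both GNU and BSD sed.

```shell
# Hypothetical scratch file standing in for the real two-million-line file.
tmp=$(mktemp)
printf '%s\n' '"00005cea-668e-4475-9e19-92a25c8b74fb",129.24728",D#' > "$tmp"

# Count the lines the regex would change before touching anything.
to_fix=$(grep -c -E '",[^,]*$' "$tmp")

# -i.bak edits in place and keeps the untouched original as "$tmp.bak".
sed -i.bak -E 's/",([^,]*)$/,"\1"/' "$tmp"

# After the edit no line should match any more (grep exits non-zero
# when the count is 0, hence the "|| true").
left=$(grep -c -E '",[^,]*$' "$tmp" || true)

result=$(cat "$tmp")
backup=$(cat "$tmp.bak")
rm -f "$tmp" "$tmp.bak"

printf 'to fix: %s, left: %s\n' "$to_fix" "$left"
```

Because a fixed line no longer matches the pattern, running the same substitution a second time leaves lines of this form unchanged.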

You could also use awk:

awk -F, -vOFS=, '{sub(/"/,"",$(NF-1)); $NF="\""$NF"\""}1;' file

Or, if your version supports it (GNU awk 4.1.0 or later), to edit the file in place:

awk -iinplace -F, -vOFS=, '{sub(/"/,"",$(NF-1)); $NF="\""$NF"\""}1;' file
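For reference, here is a small sketch of what the awk approach does to the sample line from the question. It first prints the fields that -F, produces, then applies the same two steps as the commands above. Two details are my own variations rather than part of the answer: the quote is anchored with $ so only a trailing " is removed, and -v OFS=, is written with a space. The variable names are illustrative.

```shell
line='"00005cea-668e-4475-9e19-92a25c8b74fb",129.24728",D#'

# Show how -F, splits the line: one numbered field per output line.
fields=$(printf '%s\n' "$line" |
  awk -F, '{ for (i = 1; i <= NF; i++) print i ": " $i }')

# Strip the trailing quote from the second-to-last field, wrap the last
# field in double quotes, and let the pattern "1" print the rebuilt line.
fixed_awk=$(printf '%s\n' "$line" |
  awk -F, -v OFS=, '{ sub(/"$/, "", $(NF-1)); $NF = "\"" $NF "\"" } 1')

printf '%s\n%s\n' "$fields" "$fixed_awk"
```

Assigning to a field makes awk rebuild the line using OFS, which is why OFS has to be set to a comma.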