What is the sed command to fix this file so the last value on each line is double-quoted?
I have a file containing two million lines of the form:
"00005cea-668e-4475-9e19-92a25c8b74fb",129.24728",D#
Each line should actually read:
"00005cea-668e-4475-9e19-92a25c8b74fb",129.24728,"D#"
How do I use sed to fix this file so that the spurious " is removed and the last value is double-quoted?
Solution 1:
You could try something like:
sed -r 's/",([^,]*)$/,"\1"/' input-file
That's a ", followed by anything that's not a comma ([^,]*) until the end of the line ($). \1 is the part matched inside the parentheses, ([^,]*).
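Before running this over two million lines, it may be worth checking the expression against the sample line from the question. The following is a minimal sketch, not part of the original answer: it uses -E, which GNU and BSD sed both accept as an equivalent of -r, and the variable names line and fixed are only for illustration.

```shell
# Sample line from the question, with the misplaced quote.
line='"00005cea-668e-4475-9e19-92a25c8b74fb",129.24728",D#'

# Apply the substitution to that single line.
fixed=$(printf '%s\n' "$line" | sed -E 's/",([^,]*)$/,"\1"/')

# Print the result so it can be compared with the desired format.
printf '%s\n' "$fixed"
```

The first ", in the line (after the UUID) is not changed, because the text after it still contains a comma, so the match can only succeed on the last ", of the line.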
Solution 2:
Not sed, but perl:
perl -F, -ane '($f1)=$F[1]=~/(.*)"/; $F[2]=~s/\n//g; print "$F[0],$f1,\"$F[2]\"\n";'
Explanation:
- perl -F, -ane: reads the input line by line and splits each line on ,
- ($f1)=$F[1]=~/(.*)"/; : captures the second column without its trailing " into $f1
- $F[2]=~s/\n//g; : removes the newline at the end of the last column
- print "$F[0],$f1,\"$F[2]\"\n"; : writes the output and adds the " around the last value
Edit - shortened version (thanks to @kos):
perl -F, -lane '$F[1]=~s/"$//; print "$F[0],$F[1],\"$F[2]\"";'
Solution 3:
It looks like your fields are separated by commas. If so, you can do this in sed:
sed -i -r 's/",([^,]*)$/,"\1"/' file
Or, in Perl:
perl -i -lpe 's/",([^,]*)$/,"\1"/' file
In both cases, the regex looks for a double quote followed by a comma, then zero or more non-commas up to the end of the line. The parentheses capture the last field, which we can then refer to as \1 (or $1 in Perl). The match is replaced by a comma followed by the captured field inside double quotes, which also drops the stray quote. The -i option edits the file in place, so the changes are made to the original file.
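Because -i overwrites the original, one cautious option is to give it a backup suffix. The sketch below is an illustration rather than part of the original answer: it works on a throwaway file created with mktemp, uses -E (accepted by both GNU and BSD sed in place of -r), and attaches the suffix directly to -i so the original is kept as a .bak copy.

```shell
# Throwaway file standing in for the real data (hypothetical test file).
tmp=$(mktemp)
printf '%s\n' '"00005cea-668e-4475-9e19-92a25c8b74fb",129.24728",D#' > "$tmp"

# -i.bak edits the file in place and keeps the original as "$tmp.bak".
sed -i.bak -E 's/",([^,]*)$/,"\1"/' "$tmp"

edited=$(cat "$tmp")
backup=$(cat "$tmp.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

# Clean up the throwaway files.
rm -f "$tmp" "$tmp.bak"
```

Once the edited file looks right, the .bak copy can be deleted.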
You could also use awk:
awk -F, -vOFS=, '{sub(/"/,"",$(NF-1)); $NF="\""$NF"\""}1;' file
Or, to edit the file in place, if your awk is GNU awk 4.1 or later:
awk -iinplace -F, -vOFS=, '{sub(/"/,"",$(NF-1)); $NF="\""$NF"\""}1;' file
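To see what the awk program does without touching the file, it can be run on the sample line from the question. This is a small sketch under the assumption that the stray quote always sits in the second-to-last field; it writes -v OFS=, with a space, which any POSIX awk should accept, and the variable names are only for illustration.

```shell
# Sample line from the question, with the misplaced quote.
line='"00005cea-668e-4475-9e19-92a25c8b74fb",129.24728",D#'

# sub() deletes the first " in the second-to-last field,
# then the last field is wrapped in double quotes; the trailing 1 prints the line.
fixed=$(printf '%s\n' "$line" |
  awk -F, -v OFS=, '{sub(/"/,"",$(NF-1)); $NF="\""$NF"\""}1')

printf '%s\n' "$fixed"
```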