How to remove particular words from lines of a text file?
My text file looks like this:
Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341
Now I want to delete Liquid penetration 95% mass (m) from my lines to obtain the values only. How should I do it?
If there's only one = sign, you could delete everything before and including the = like this:
$ sed -r 's/.* = (.*)/\1/' file
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341
If you want to change the original file, use the -i option after testing:
sed -ri 's/.* = (.*)/\1/' file
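Because -i overwrites the file, it can help to keep a copy of the original. A minimal sketch, assuming GNU sed, which accepts a backup suffix directly after -i; demo.txt and the .bak suffix are placeholder names chosen for illustration:

```shell
# Work in a scratch directory so no real file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two sample lines in the format from the question.
printf 'Liquid penetration 95%% mass (m) = %s\n' 0.000205348 0.000265725 > demo.txt

# -i.bak edits demo.txt in place and saves the original as demo.txt.bak (GNU sed).
sed -r -i.bak 's/.* = (.*)/\1/' demo.txt

cat demo.txt        # edited file: values only
cat demo.txt.bak    # untouched copy of the original lines
```

If the result is wrong, the original can be restored with mv demo.txt.bak demo.txt.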
Notes
- -r: use ERE so we don't have to escape ( and )
- s/old/new: replace old with new
- .*: any number of any characters
- (things): save things to backreference later with \1, \2, etc.
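To make the backreference notes concrete, here is a small sketch with two groups instead of one. The " | " separator and the variable names are only for illustration, and -r assumes GNU sed:

```shell
line='Liquid penetration 95% mass (m) = 0.000205348'

# ERE (-r): group 1 captures the label, group 2 the value; print them swapped.
swapped=$(printf '%s\n' "$line" | sed -r 's/(.*) = (.*)/\2 | \1/')

# The same substitution in basic regex syntax needs escaped parentheses.
swapped_bre=$(printf '%s\n' "$line" | sed 's/\(.*\) = \(.*\)/\2 | \1/')

printf '%s\n' "$swapped"
printf '%s\n' "$swapped_bre"
```

Both commands print the value first and the label second, which shows that \1 and \2 refer to the groups in the order their opening parentheses appear.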
This is a job for awk, assuming the values occur in the last field only (as per your example):
awk '{print $NF}' file.txt
- NF is an awk variable that expands to the number of fields in a record (line), hence $NF (note the $ in front) contains the value of the last field.
Example:
% cat temp.txt
Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341
% awk '{print $NF}' temp.txt
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341
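The difference between NF and $NF can be seen on a single line from the question. This is only an illustrative sketch; the variable names are made up for the example:

```shell
sample='Liquid penetration 95% mass (m) = 0.000205348'

# NF is the field count, $NF the last field, $(NF-1) the field before it.
fields=$(printf '%s\n' "$sample" | awk '{print NF}')
last=$(printf '%s\n' "$sample" | awk '{print $NF}')
prev=$(printf '%s\n' "$sample" | awk '{print $(NF-1)}')

printf 'NF=%s last=%s previous=%s\n' "$fields" "$last" "$prev"
```

With the default whitespace splitting the line has 7 fields, the last one is the value and the one before it is the = sign.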
I decided to compare the different solutions listed here. For this purpose I created a large file, based on the content provided by the OP:
- I created a simple file, named input.file:
  $ cat input.file
  Liquid penetration 95% mass (m) = 0.000205348
  Liquid penetration 95% mass (m) = 0.000265725
  Liquid penetration 95% mass (m) = 0.000322823
  Liquid penetration 95% mass (m) = 0.000376445
  Liquid penetration 95% mass (m) = 0.000425341
- Then I executed this loop:
  for i in {1..100}; do cat input.file | tee -a input.file; done
  The terminal window was blocked, so I executed killall tee from another terminal. Then I examined the content of the file with the commands less input.file and cat input.file. It looked good, except for the last line. So I removed the last line and created a backup copy: cp input.file{,.copy} (because of the commands that use the in-place option).
- The final number of lines in input.file is 2 192 473. I got that number with the command wc:
  $ cat input.file | wc -l
  2192473
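The loop above blocks because tee appends to the same file that cat is still reading. A more predictable way to grow a test file is sketched below; this is an alternative, not what was actually used for the measurements, and input.tmp and the pass count of 10 are arbitrary choices:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Start from the five lines given in the question.
printf 'Liquid penetration 95%% mass (m) = %s\n' \
    0.000205348 0.000265725 0.000322823 0.000376445 0.000425341 > input.file

# Each pass writes two copies of the file to a temporary file and swaps it in,
# so the file is never read and appended to at the same time.
i=0
while [ "$i" -lt 10 ]; do
    cat input.file input.file > input.tmp && mv input.tmp input.file
    i=$((i + 1))
done

lines=$(wc -l < input.file)
echo "$lines"
```

Ten passes turn 5 lines into 5 * 2^10 = 5120 lines; 19 passes would give 2 621 440 lines, which is the same order of magnitude as the file used here.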
Here is the result of the comparison:
- grep -o '[^[:space:]]\+$'
  $ time grep -o '[^[:space:]]\+$' input.file > output.file
  real 0m58.539s
  user 0m58.416s
  sys 0m0.108s
- sed -ri 's/.* = (.*)/\1/'
  $ time sed -ri 's/.* = (.*)/\1/' input.file
  real 0m26.936s
  user 0m22.836s
  sys 0m4.092s
  Alternatively, if we redirect the output to a new file, the command is faster:
  $ time sed -r 's/.* = (.*)/\1/' input.file > output.file
  real 0m19.734s
  user 0m19.672s
  sys 0m0.056s
- gawk '{gsub(".*= ", "");print}'
  $ time gawk '{gsub(".*= ", "");print}' input.file > output.file
  real 0m5.644s
  user 0m5.568s
  sys 0m0.072s
- rev | cut -d' ' -f1 | rev
  $ time rev input.file | cut -d' ' -f1 | rev > output.file
  real 0m3.703s
  user 0m2.108s
  sys 0m4.916s
- grep -oP '.*= \K.*'
  $ time grep -oP '.*= \K.*' input.file > output.file
  real 0m3.328s
  user 0m3.252s
  sys 0m0.072s
- sed 's/.*= //' (the -i option makes the command a few times slower)
  $ time sed 's/.*= //' input.file > output.file
  real 0m3.310s
  user 0m3.212s
  sys 0m0.092s
- perl -pe 's/.*= //' (the -i option doesn't make a big difference in performance here)
  $ time perl -i.bak -pe 's/.*= //' input.file
  real 0m3.187s
  user 0m3.128s
  sys 0m0.056s
  $ time perl -pe 's/.*= //' input.file > output.file
  real 0m3.138s
  user 0m3.036s
  sys 0m0.100s
- awk '{print $NF}'
  $ time awk '{print $NF}' input.file > output.file
  real 0m1.251s
  user 0m1.164s
  sys 0m0.084s
- cut -c 35-
  $ time cut -c 35- input.file > output.file
  real 0m0.352s
  user 0m0.284s
  sys 0m0.064s
- cut -d= -f2
  $ time cut -d= -f2 input.file > output.file
  real 0m0.328s
  user 0m0.260s
  sys 0m0.064s
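Timings are only comparable if the commands produce the same output, so a quick sanity check on a small sample is worthwhile before measuring. A rough sketch, using awk as the reference and checking two of the faster candidates; the sample file and variable names are placeholders:

```shell
sample=$(mktemp)
printf 'Liquid penetration 95%% mass (m) = %s\n' 0.000205348 0.000265725 > "$sample"

# Reference output: the last whitespace-separated field of every line.
reference=$(awk '{print $NF}' "$sample")

# Candidates from the comparison; each should reproduce the reference.
from_sed=$(sed 's/.*= //' "$sample")
from_cut=$(cut -c 35- "$sample")

[ "$from_sed" = "$reference" ] && echo "sed matches awk"
[ "$from_cut" = "$reference" ] && echo "cut -c matches awk"

rm -f "$sample"
```

Note that cut -c 35- only works because the text before the value is exactly 34 characters long on every line.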
The source of the idea.
With grep, using -P for PCRE (interpret the pattern as a Perl-compatible regular expression) and -o to print only the matched part. The \K tells grep to drop everything matched before it from the reported match.
$ grep -oP '.*= \K.*' infile
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341
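The effect of \K together with the greedy .* is easier to see on a line with more than one = sign. A small sketch, assuming a GNU grep built with PCRE support; the sample line is invented for the illustration:

```shell
# The line contains two "= " sequences on purpose.
line='ratio a = b = 0.5'

# Greedy .* runs up to the last "= "; \K then drops everything matched so far,
# so -o prints only what follows.
value=$(printf '%s\n' "$line" | grep -oP '.*= \K.*')

printf '%s\n' "$value"
```

Only the text after the last "= " is printed, which is why the command still works if the label itself happens to contain an = sign.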
Or you could use the cut command instead:
cut -d= -f2 infile
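One detail to be aware of: splitting on = keeps the space that follows it, so each value starts with a blank. A minimal sketch of the difference and one possible way to strip it (piping through tr is just one option, not part of the original answer):

```shell
line='Liquid penetration 95% mass (m) = 0.000205348'

# Splitting on "=" keeps the space that follows the delimiter.
raw=$(printf '%s\n' "$line" | cut -d= -f2)

# One way to drop it: delete all spaces from the extracted field.
clean=$(printf '%s\n' "$line" | cut -d= -f2 | tr -d ' ')

# Brackets make the leading space visible.
printf '[%s]\n[%s]\n' "$raw" "$clean"
```

For purely numeric values the leading space is usually harmless, but it matters if the output is compared byte for byte with the result of the other commands.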