How to remove particular words from lines of a text file?

My text file looks like this:

Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341

Now I want to delete Liquid penetration 95% mass (m) from each line so that only the values remain. How can I do that?

If there's only one = sign, you could delete everything before and including = like this:

$ sed -r 's/.* = (.*)/\1/' file
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341

If you want to change the original file, use the -i option after testing:

sed -ri 's/.* = (.*)/\1/' file

Notes

-r use ERE so we don't have to escape ( and )
s/old/new/ replace old with new
.* any number of any characters
(things) save things to backreference later with \1, \2, etc.
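
Because .* is greedy, the pattern actually removes everything up to the last " = " on the line, so it should still behave sensibly if a label happens to contain an = sign. A quick check on a made-up line (the sample string is hypothetical, and -E is used here as the portable spelling of -r):

```shell
# Hypothetical line with two " = " separators, to show greedy matching.
line='a = b = 0.5'

# The leading .* swallows as much as possible, so only the text after
# the LAST " = " is captured in \1.
result=$(printf '%s\n' "$line" | sed -E 's/.* = (.*)/\1/')
printf '%s\n' "$result"
```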

This is a job for awk, assuming the values occur in the last field only (as per your example):

awk '{print $NF}' file.txt

NF is an awk variable that holds the number of fields in a record (line), so $NF (note the $ in front) is the value of the last field.

Example:

% cat temp.txt 
Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341

% awk '{print $NF}' temp.txt
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341
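
If the value could ever contain whitespace, $NF would return only its last word. A minimal sketch of an alternative, assuming the separator is always the literal " = ", is to set the field separator with -F and print the second field (sample.txt is a hypothetical file name):

```shell
# Sample input in the same shape as the question's file.
printf '%s\n' \
  'Liquid penetration 95% mass (m) = 0.000205348' \
  'Liquid penetration 95% mass (m) = 0.000265725' > sample.txt

# Split each line on " = " and print the second field.
values=$(awk -F' = ' '{print $2}' sample.txt)
printf '%s\n' "$values"
```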

I decided to compare the different solutions listed here. For this purpose I created a large file based on the content provided by the OP.

First I created a simple file named input.file:

$ cat input.file
Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341

Then I executed this loop:

for i in {1..100}; do cat input.file | tee -a input.file; done

The terminal window was blocked, so I executed killall tee from another terminal. Then I examined the content of the file with less input.file and cat input.file. It looked good, except for the last line. So I removed the last line and created a backup copy with cp input.file{,.copy} (because some of the commands use the in-place option).

The final line count of input.file is 2 192 473. I got that number with wc:
```
$ cat input.file | wc -l
2192473
```
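
The loop above most likely hung because cat was reading the same file that tee kept appending to, so it never reached the end. A sketch of a more controlled way to grow a test file, assuming repeated doubling through a temporary copy is acceptable (the scratch directory and the .tmp name are illustrative choices, not what was used for the timings below):

```shell
# Work in a scratch directory so no existing file is touched.
workdir=$(mktemp -d)
big="$workdir/input.file"

# Recreate the five-line sample (values taken from the question).
printf 'Liquid penetration 95%% mass (m) = %s\n' \
  0.000205348 0.000265725 0.000322823 0.000376445 0.000425341 > "$big"

# Double the file 10 times: 5 * 2^10 = 5120 lines.
# Copying first means the reader and the writer never share a file.
for i in 1 2 3 4 5 6 7 8 9 10; do
  cp "$big" "$big.tmp"
  cat "$big.tmp" >> "$big"
done
rm -f "$big.tmp"

lines=$(wc -l < "$big")
echo "$lines"
```

With 19 doublings instead of 10 the file would reach 5 * 2^19 = 2 621 440 lines, which is in the same range as the file used here.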

Here is the result of the comparison:

grep -o '[^[:space:]]\+$'

$ time grep -o '[^[:space:]]\+$' input.file > output.file

real    0m58.539s
user    0m58.416s
sys     0m0.108s

sed -ri 's/.* = (.*)/\1/'

$ time sed -ri 's/.* = (.*)/\1/' input.file

real    0m26.936s
user    0m22.836s
sys     0m4.092s

Alternatively, if we redirect the output to a new file, the command is faster:

$ time sed -r 's/.* = (.*)/\1/' input.file > output.file

real    0m19.734s
user    0m19.672s
sys     0m0.056s

gawk '{gsub(".*= ", "");print}'

$ time gawk '{gsub(".*= ", "");print}' input.file > output.file

real    0m5.644s
user    0m5.568s
sys     0m0.072s

rev | cut -d' ' -f1 | rev

$ time rev input.file | cut -d' ' -f1 | rev  > output.file

real    0m3.703s
user    0m2.108s
sys     0m4.916s

grep -oP '.*= \K.*'

$ time grep -oP '.*= \K.*' input.file > output.file

real    0m3.328s
user    0m3.252s
sys     0m0.072s

sed 's/.*= //' (the -i option makes the command a few times slower)

$ time sed 's/.*= //' input.file > output.file

real    0m3.310s
user    0m3.212s
sys     0m0.092s

perl -pe 's/.*= //' (the -i option doesn't make a big difference in performance here)

$ time perl -i.bak -pe 's/.*= //' input.file

real    0m3.187s
user    0m3.128s
sys     0m0.056s

$ time perl -pe 's/.*= //' input.file > output.file

real    0m3.138s
user    0m3.036s
sys     0m0.100s

awk '{print $NF}'

$ time awk '{print $NF}' input.file  > output.file

real    0m1.251s
user    0m1.164s
sys     0m0.084s

cut -c 35-

$ time cut -c 35- input.file  > output.file

real    0m0.352s
user    0m0.284s
sys     0m0.064s

cut -d= -f2

$ time cut -d= -f2 input.file  > output.file

real    0m0.328s
user    0m0.260s
sys     0m0.064s

The source of the idea.
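
A timing comparison is only fair if the commands produce the same result, so it may be worth checking that on a small sample first. A minimal sketch, assuming three of the commands above and a temporary sample file (the variable names are illustrative):

```shell
# Build a small sample in a temporary file (values taken from the question).
sample=$(mktemp)
printf 'Liquid penetration 95%% mass (m) = %s\n' \
  0.000205348 0.000265725 0.000322823 > "$sample"

# Run three of the benchmarked commands on the same input.
out_awk=$(awk '{print $NF}' "$sample")
out_sed=$(sed 's/.*= //' "$sample")
out_cut=$(cut -c 35- "$sample")

# Report whether all three produced identical output.
if [ "$out_awk" = "$out_sed" ] && [ "$out_sed" = "$out_cut" ]; then
  verdict="outputs match"
else
  verdict="outputs differ"
fi
echo "$verdict"
rm -f "$sample"
```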

With grep, the -P option enables PCRE (the pattern is interpreted as a Perl-compatible regular expression) and -o prints only the matched part. The \K escape drops everything matched before it from the reported match.

$ grep -oP '.*= \K.*' infile
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341

Or you could use the cut command instead.

cut -d= -f2 infile
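
Note that cut -d= -f2 keeps the space that follows the = sign. If that matters, one possible fix, assuming the values themselves never contain spaces, is to strip spaces with tr:

```shell
line='Liquid penetration 95% mass (m) = 0.000205348'

# Field 2 is everything after "=", including the leading space.
raw=$(printf '%s\n' "$line" | cut -d= -f2)

# Deleting spaces leaves only the number.
trimmed=$(printf '%s\n' "$line" | cut -d= -f2 | tr -d ' ')

# Brackets make the leading space visible.
printf '[%s]\n[%s]\n' "$raw" "$trimmed"
```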
