How to remove particular words from lines of a text file?

My text file looks like this:

Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341

Now I want to delete Liquid penetration 95% mass (m) from each line so that only the values remain. How should I do it?


If there's only one = sign, you could delete everything before and including = like this:

$ sed -r 's/.* = (.*)/\1/' file
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341

If you want to change the original file, use the -i option after testing:

sed -ri 's/.* = (.*)/\1/' file

Notes

  • -r uses ERE (extended regular expressions) so we don't have to escape ( and )
  • s/old/new/ replaces old with new
  • .* matches any number of any characters
  • (things) saves things so we can backreference them later with \1, \2, etc.
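
The notes above can be tried on a single line. This is a minimal sketch, assuming a sed that accepts -E (GNU sed treats it as a synonym for -r, and BSD sed supports it as well). The second input, 'a = b = c', is a made-up line that is not in the question; it only illustrates why the "only one = sign" condition matters.

```shell
# One line from the question: everything up to and including " = " is dropped.
line='Liquid penetration 95% mass (m) = 0.000205348'
value=$(printf '%s\n' "$line" | sed -E 's/.* = (.*)/\1/')
echo "$value"

# Hypothetical line with two " = " separators: the leading .* is greedy,
# so it swallows everything up to the LAST " = " and only "c" is kept.
greedy=$(printf '%s\n' 'a = b = c' | sed -E 's/.* = (.*)/\1/')
echo "$greedy"
```

If the text after the first = is wanted instead, a different pattern is needed, for example one that anchors on the fixed prefix.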

This is a job for awk, assuming the values occur in the last field only (as per your example):

awk '{print $NF}' file.txt
  • NF is an awk variable that holds the number of fields in a record (line), hence $NF (note the $ in front) contains the value of the last field.

Example:

% cat temp.txt 
Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341

% awk '{print $NF}' temp.txt
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341
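
The "last field only" assumption is worth checking against your real data. The sketch below is illustrative only: the second sample line, with a trailing "# checked" note, is invented and does not come from the question. It compares $NF with splitting on the literal separator " = " via -F.

```shell
tmp=$(mktemp)
# First line is from the question; the second is a hypothetical line
# with extra text after the value.
printf '%s\n' \
  'Liquid penetration 95% mass (m) = 0.000205348' \
  'Liquid penetration 95% mass (m) = 0.000265725 # checked' > "$tmp"

# Split each record on " = " and print whatever follows it.
by_sep=$(awk -F' = ' '{print $2}' "$tmp")
# Print the last whitespace-separated field, as in the answer above.
by_last=$(awk '{print $NF}' "$tmp")

printf '%s\n' "$by_sep"
printf '%s\n' "$by_last"
rm -f "$tmp"
```

On the first line both commands print the value. On the second, $NF prints only the last word, while the -F variant prints everything after " = ", so neither is right without knowing the data.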

I decided to compare the different solutions listed here. For this purpose I created a large file, based on the content provided by the OP:

  1. I created a simple file, named input.file:

    $ cat input.file
    Liquid penetration 95% mass (m) = 0.000205348
    Liquid penetration 95% mass (m) = 0.000265725
    Liquid penetration 95% mass (m) = 0.000322823
    Liquid penetration 95% mass (m) = 0.000376445
    Liquid penetration 95% mass (m) = 0.000425341
    
  2. Then I executed this loop:

    for i in {1..100}; do cat input.file | tee -a input.file; done
    
  3. The terminal window was blocked, so I executed killall tee from another terminal. Then I examined the content of the file with the commands less input.file and cat input.file. It looked good, except for the last line. So I removed the last line and created a backup copy: cp input.file{,.copy} (because some of the commands use the in-place option).

  4. The final count of lines in the file input.file is 2 192 473. I got that number with the command wc:

    $ cat input.file | wc -l
    2192473
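
The loop in step 2 presumably blocked because cat was reading input.file while tee -a kept appending to the same file, so cat never reached the end. A minimal sketch of a more predictable way to build a test file is shown below. It is not the procedure used for the timings in this answer; the name big.file and the repeat count of 1000 are arbitrary choices for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The five sample lines from the question.
cat > input.file <<'EOF'
Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341
EOF

# Repeat the sample into a SEPARATE file, so nothing reads a file
# while it is being appended to.
for i in $(seq 1 1000); do
  cat input.file
done > big.file

# Arithmetic expansion strips any padding some wc implementations add.
lines=$(( $(wc -l < big.file) ))
echo "$lines"
```

This gives 5 x 1000 lines, and the size is reproducible because it depends only on the loop count.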
    

Here is the result of the comparison:

  • grep -o '[^[:space:]]\+$'

    $ time grep -o '[^[:space:]]\+$' input.file > output.file
    
    real    0m58.539s
    user    0m58.416s
    sys     0m0.108s
    
  • sed -ri 's/.* = (.*)/\1/'

    $ time sed -ri 's/.* = (.*)/\1/' input.file
    
    real    0m26.936s
    user    0m22.836s
    sys     0m4.092s
    

    Alternatively, if we redirect the output to a new file, the command is faster:

    $ time sed -r 's/.* = (.*)/\1/' input.file > output.file
    
    real    0m19.734s
    user    0m19.672s
    sys     0m0.056s
    
  • gawk '{gsub(".*= ", "");print}'

    $ time gawk '{gsub(".*= ", "");print}' input.file > output.file
    
    real    0m5.644s
    user    0m5.568s
    sys     0m0.072s
    
  • rev | cut -d' ' -f1 | rev

    $ time rev input.file | cut -d' ' -f1 | rev  > output.file
    
    real    0m3.703s
    user    0m2.108s
    sys     0m4.916s
    
  • grep -oP '.*= \K.*'

    $ time grep -oP '.*= \K.*' input.file > output.file
    
    real    0m3.328s
    user    0m3.252s
    sys     0m0.072s
    
  • sed 's/.*= //' (here too, the -i option makes the command a few times slower)

    $ time sed 's/.*= //' input.file > output.file
    
    real    0m3.310s
    user    0m3.212s
    sys     0m0.092s
    
  • perl -pe 's/.*= //' (the -i option doesn't make a big difference in performance here)

    $ time perl -i.bak -pe 's/.*= //' input.file
    
    real    0m3.187s
    user    0m3.128s
    sys     0m0.056s
    
    $ time perl -pe 's/.*= //' input.file > output.file
    
    real    0m3.138s
    user    0m3.036s
    sys     0m0.100s
    
  • awk '{print $NF}'

    $ time awk '{print $NF}' input.file  > output.file
    
    real    0m1.251s
    user    0m1.164s
    sys     0m0.084s
    
  • cut -c 35-

    $ time cut -c 35- input.file  > output.file
    
    real    0m0.352s
    user    0m0.284s
    sys     0m0.064s
    
  • cut -d= -f2

    $ time cut -d= -f2 input.file  > output.file
    
    real    0m0.328s
    user    0m0.260s
    sys     0m0.064s
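
Timings are only comparable if the commands produce the same output, so it can help to check that before benchmarking. The following sketch is an assumption about how one might do this, not part of the measurements above. It runs four of the commands on the five sample lines and compares each result byte for byte with the sed output using cmp -s.

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  'Liquid penetration 95% mass (m) = 0.000205348' \
  'Liquid penetration 95% mass (m) = 0.000265725' \
  'Liquid penetration 95% mass (m) = 0.000322823' \
  'Liquid penetration 95% mass (m) = 0.000376445' \
  'Liquid penetration 95% mass (m) = 0.000425341' > "$tmp/input.file"

sed 's/.*= //'     "$tmp/input.file" > "$tmp/sed.out"
awk '{print $NF}'  "$tmp/input.file" > "$tmp/awk.out"
cut -c 35-         "$tmp/input.file" > "$tmp/cut_c.out"
cut -d= -f2        "$tmp/input.file" > "$tmp/cut_d.out"

# Compare every output with the sed result; cmp -s is silent and
# reports the outcome through its exit status only.
report=""
for f in awk cut_c cut_d; do
  if cmp -s "$tmp/sed.out" "$tmp/$f.out"; then
    status=same
  else
    status=differs
  fi
  report="$report$f:$status "
done
echo "$report"
```

On this sample, the awk and cut -c 35- outputs match the sed output, while cut -d= -f2 differs because it keeps the space that follows the = sign.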
    

The source of the idea.


With grep, use -P for PCRE (interpret the pattern as a Perl-Compatible Regular Expression) and -o to print only the matched part. The \K tells grep to discard everything matched before it, so only the text that follows is printed.

$ grep -oP '.*= \K.*' infile
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341

Or you could use the cut command instead:

cut -d= -f2 infile
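
One detail of the cut variant: the field after = starts with the space that follows the delimiter. If that matters, the space can be removed afterwards. The sketch below uses tr -d ' ' for this, which is one possible choice and assumes the value itself contains no spaces.

```shell
line='Liquid penetration 95% mass (m) = 0.000205348'

# Field 2 is everything after "=", including the leading space.
raw=$(printf '%s\n' "$line" | cut -d= -f2)
printf '[%s]\n' "$raw"        # brackets make the leading space visible

# Delete all spaces from the extracted field.
trimmed=$(printf '%s\n' "$line" | cut -d= -f2 | tr -d ' ')
printf '[%s]\n' "$trimmed"
```

Many tools that read numbers ignore a leading space, so the extra step is only needed when the output must match exactly.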