Python equivalent to perl -pe?

I need to pick some numbers out of some text files. I can pick out the lines I need with grep, but didn't know how to extract the numbers from the lines. A colleague showed me how to do this from bash with perl:

cat results.txt | perl -pe 's/.+(\d\.\d+)\.\n/\1 /'

However, I usually code in Python, not Perl. So my question is, could I have used Python in the same way? I.e., could I have piped something from bash to Python and then gotten the result straight to stdout? ... if that makes sense. Or is Perl just more convenient in this case?


Yes, you can use Python from the command line. python -c <stuff> will run <stuff> as Python code. Example:

python -c "import sys; print sys.path"

There isn't a direct equivalent to the -p option for Perl (the automatic input/output line-by-line processing), but that's mostly because Python doesn't use the same concept of $_ and whatnot that Perl does - in Python, all input and output is done manually (via raw_input()/input(), and print/print()).


For your particular example:

cat results.txt | python -c "import re, sys; print ''.join(re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line) for line in sys.stdin)"

(Obviously somewhat more unwieldy. It's probably better to just write the script to do it in actual Python.)


You can use:

$ python -c '<your code here>'

You can in theory, but Python doesn't have anywhere near as much regex magic that Perl does, so the resulting command will be much more unwieldy, especially as you can't use regular expressions without importing re (and you'll probably need sys for sys.stdin too).

The Python equivalent of your colleague's Perl one-liner is approximately:

import sys, re
for line in sys.stdin:
    print re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line)

You have a problem which can be solved several ways.

I think you should consider using regular expression (what perl is doing in your example) directly from Python. Regular expressions are in the re module. An example would be:

import re
filecontent = open('somefile.txt').read()
print re.findall('.+(\d\.\d+)\.$', filecontent)

(I would prefer using $ instead of '\n' for line endings, because line endings are different between operational systems and file encodings)

If you want to call bash commands from inside Python, you could use:

import os
os.system(mycommand)

Where command is the bash command. I use it all the time, because some operations are better to perform in bash than in Python.

Finally, if you want to extract the numbers with grep, use the -o option, which prints only the matched part.