Grep and Python
I need a way of searching a file using grep via a regular expression from the Unix command line. For example when I type in the command line:
python pythonfile.py 'RE' 'file-to-be-searched'
I need the regular expression 'RE'
to be searched in the file and print out the matching lines.
Here's the code I have:
import re
import sys
search_term = sys.argv[1]
f = sys.argv[2]
for line in open(f, 'r'):
if re.search(search_term, line):
print line,
if line == None:
print 'no matches found'
But when I enter a word which isn't present, no matches found
doesn't print
Solution 1:
The natural question is why not just use grep?! But assuming you can't...
import re
import sys
file = open(sys.argv[2], "r")
for line in file:
if re.search(sys.argv[1], line):
print line,
Things to note:
-
search
instead ofmatch
to find anywhere in string - comma (
,
) afterprint
removes carriage return (line will have one) -
argv
includes python file name, so variables need to start at 1
This doesn't handle multiple arguments (like grep does) or expand wildcards (like the Unix shell would). If you wanted this functionality you could get it using the following:
import re
import sys
import glob
for arg in sys.argv[2:]:
for file in glob.iglob(arg):
for line in open(file, 'r'):
if re.search(sys.argv[1], line):
print line,
Solution 2:
Concise and memory efficient:
#!/usr/bin/env python
# file: grep.py
import re, sys, collections
collections.deque(map(sys.stdout.write,(l for l in sys.stdin if re.search(sys.argv[1],l))),maxlen=0)
It works like egrep (without too much error handling), e.g.:
cat input-file | grep.py "RE"
And here is the one-liner:
cat input-file | python -c "import re,sys,collections;collections.deque(map(sys.stdout.write,(l for l in sys.stdin if re.search(sys.argv[1],l))),maxlen=0)" "RE"
Note that the collections.deque
function is required in Python3 because map has become a lazy function.
Solution 3:
Adapted from a grep in python.
Accepts a list of filenames via [2:]
, does no exception handling:
#!/usr/bin/env python
import re, sys, os
for f in filter(os.path.isfile, sys.argv[2:]):
for line in open(f).readlines():
if re.match(sys.argv[1], line):
print line
sys.argv[1]
resp sys.argv[2:]
works, if you run it as an standalone executable, meaning
chmod +x
first