How to extract information between two unique words in a large text file
Solution 1:
You can use regular expressions for that.
>>> st = "alpha here is my text bravo"
>>> import re
>>> re.findall(r'alpha(.*?)bravo',st)
[' here is my text ']
My test.txt file
alpha here is my line
yipee
bravo
Now using open to read the file and then applying the regular expression:
>>> f = open('test.txt','r')
>>> data = f.read()
>>> x = re.findall(r'alpha(.*?)bravo',data,re.DOTALL)
>>> x
[' here is my line\nyipee\n']
>>> "".join(x).replace('\n',' ')
' here is my line yipee '
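The session above can be wrapped in a small helper. This is only a minimal sketch, assuming the two markers are plain strings (hence re.escape) and that the whole file fits in memory; the names find_between and find_between_in_file are made up for illustration and are not part of the answer.

```python
import re


def find_between(text, start, end):
    """Return every non-greedy match of the text between start and end."""
    # re.escape keeps markers containing regex metacharacters literal.
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    # re.DOTALL lets '.' match newlines, so matches may span several lines.
    return re.findall(pattern, text, re.DOTALL)


def find_between_in_file(path, start, end):
    """Read the whole file, closing it afterwards, then search it."""
    with open(path, 'r') as f:
        return find_between(f.read(), start, end)


sample = "alpha here is my line\nyipee\nbravo"
matches = find_between(sample, "alpha", "bravo")
print(matches)
```

The with block closes the file even if reading fails, which the interactive version leaves to the interpreter.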
Solution 2:
a = 'alpha'
b = 'bravo'
text = 'from alpha all the way to bravo and beyond.'
text.split(a)[-1].split(b)[0]
# ' all the way to '
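Note that the split one-liner takes the text after the last alpha and before the next bravo, and that it silently returns the whole string when neither marker is present. A guarded variant might look like the sketch below; between_split is a hypothetical name, and returning None for a missing marker is an assumption about what the caller wants.

```python
def between_split(text, a, b):
    """Text after the last `a` and before the next `b`, or None if a marker is missing."""
    if a not in text:
        return None
    tail = text.split(a)[-1]   # everything after the last occurrence of a
    if b not in tail:
        return None
    return tail.split(b)[0]    # everything before the first b in that tail


text = 'from alpha all the way to bravo and beyond.'
found = between_split(text, 'alpha', 'bravo')
missing = between_split(text, 'alpha', 'zulu')

# The unguarded one-liner hands back the entire input when no marker matches.
unguarded = 'no markers here'.split('alpha')[-1].split('bravo')[0]
```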
Solution 3:
str.find and its sibling rfind have start and end args.
alpha = 'qawsed'
bravo = 'azsxdc'
startpos = text.find(alpha) + len(alpha)
endpos = text.find(bravo, startpos)
do_something_with(text[startpos:endpos])
This is the fastest way if the contained text is short and near the front.
If the contained text is relatively large, use:
startpos = text.find(alpha) + len(alpha)
endpos = text.rfind(bravo)
If the contained text is short and near the end, use:
endpos = text.rfind(bravo)
# search up to endpos so that alpha may sit directly before bravo
startpos = text.rfind(alpha, 0, endpos) + len(alpha)
The first method is in any case better than the naive method of starting the second search from the start of the text; use it if your contained text has no dominant pattern.
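None of the snippets in this solution check for a missing marker: find and rfind return -1 in that case, and the slice is then silently wrong. Below is a rough sketch of the first and third variants with that check added. The function names and the choice to return None are assumptions for illustration, and the backward search passes the position of bravo itself as its upper bound so that an empty or very short contained text is still found.

```python
def between_find(text, alpha, bravo):
    """Text between the first alpha and the next bravo, or None."""
    start = text.find(alpha)
    if start == -1:
        return None
    start += len(alpha)
    end = text.find(bravo, start)   # second search resumes after alpha
    if end == -1:
        return None
    return text[start:end]


def between_rfind(text, alpha, bravo):
    """Text between the last bravo and the nearest alpha before it, or None."""
    end = text.rfind(bravo)
    if end == -1:
        return None
    start = text.rfind(alpha, 0, end)   # alpha must end at or before bravo
    if start == -1:
        return None
    return text[start + len(alpha):end]


text = 'xx qawsed payload azsxdc yy'
front = between_find(text, 'qawsed', 'azsxdc')
back = between_rfind(text, 'qawsed', 'azsxdc')
```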