Basic Python text extraction scenario
I am currently working with a text file that looks like this.
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
I would like to extract the number (just the integers) and save them all to a text file that would read:
6367283940
6367283940
6367283940
How would I go about doing this?
I am brand new.
There are a few ways you might approach this.
Regex
A simple regex pattern should work.
import re
text = """\
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
"""
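# Raw string so \d is not treated as a string escape
pattern = r'^NUMBER = (\d+)'
# re.MULTILINE makes ^ match at the start of every line, not only the first
for number in re.findall(pattern, text, flags=re.MULTILINE):
    print(number)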
6367283940
6367283940
6367283940
For an explanation of the regex, see this regex101 link.
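Since you also want the numbers saved to a file, here is a minimal sketch that combines the same regex with file reading and writing. The function names extract_numbers and save_numbers are just names I picked for illustration, and you would supply your own input and output paths.

```python
import re

# re.MULTILINE makes ^ match at the start of every line, not only the first.
NUMBER_PATTERN = re.compile(r'^NUMBER = (\d+)', flags=re.MULTILINE)


def extract_numbers(text):
    """Return the digits that follow 'NUMBER = ' at the start of each line."""
    return NUMBER_PATTERN.findall(text)


def save_numbers(input_path, output_path):
    """Read input_path, extract the numbers, and write one per line."""
    with open(input_path) as src:
        numbers = extract_numbers(src.read())
    with open(output_path, 'w') as dst:
        for number in numbers:
            dst.write(number + '\n')
    return numbers
```

Calling save_numbers('mytext.txt', 'numbers.txt') would then write each number on its own line (both file names here are placeholders).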
String splitting
A more basic approach is to use ordinary string methods, such as .split.
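with open('mytext.txt') as f:
    for line in f:
        # Fields are separated by '|'; the first one is "NUMBER = 6367283940 "
        fields = line.split('|')
        number_field = fields[0]
        # Split on ' = ' and keep only the value part
        _, number = number_field.split(' = ')
        # strip() removes the trailing space left before the '|'
        print(number.strip())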
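One caveat with the split approach: unpacking the result of .split(' = ') raises a ValueError on blank or malformed lines. A slightly more defensive sketch, assuming you simply want to skip such lines, could use str.partition instead. The helper name number_from_line and the sample lines are made up for illustration.

```python
def number_from_line(line):
    """Return the NUMBER value from one pipe-delimited line, or None."""
    first_field = line.split('|')[0]
    # partition never raises: if ' = ' is missing, sep and value are ''.
    key, sep, value = first_field.partition(' = ')
    if key.strip() != 'NUMBER' or not sep:
        return None
    value = value.strip()
    return value if value.isdigit() else None


lines = [
    'NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER\n',
    '\n',                          # blank line: skipped
    'FOOD = PASTA | NAME = X\n',   # first field is not NUMBER: skipped
]
numbers = [n for n in map(number_from_line, lines) if n is not None]
print(numbers)  # ['6367283940']
```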
Csv/pandas
Because your file is pipe-delimited, you could also use the csv module or pandas, as Nuno Carvalho answered.
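For the csv route, a rough sketch with the standard library might look like the following. This is my own illustration rather than the code from the other answer. The function name numbers_from_pipe_file is hypothetical, and io.StringIO stands in for a real file so the example runs on its own.

```python
import csv
import io


def numbers_from_pipe_file(file_obj):
    """Collect the value of the first field from each pipe-delimited row."""
    numbers = []
    for row in csv.reader(file_obj, delimiter='|'):
        if not row:
            # csv.reader yields an empty list for blank lines
            continue
        # row[0] looks like "NUMBER = 6367283940 "
        _, _, value = row[0].partition('=')
        numbers.append(value.strip())
    return numbers


sample = io.StringIO(
    'NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER\n'
    'NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER\n'
)
print(numbers_from_pipe_file(sample))  # ['6367283940', '6367283940']
```

With a real file you would pass open('mytext.txt', newline='') instead of the StringIO object, since the csv documentation recommends opening files with newline=''.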