splitting a string based on tab in the file
I have file that contains values separated by tab ("\t"). I am trying to create a list and store all values of file in the list. But I get some problem. Here is my code.
line = "abc def ghi"
values = line.split("\t")
It works fine as long as there is only one tab between each value. But if there is one than one tab then it copies the tab to values as well. In my case mostly the extra tab will be after the last value in the file.
Solution 1:
You can use regex
here:
>>> import re
>>> strs = "foo\tbar\t\tspam"
>>> re.split(r'\t+', strs)
['foo', 'bar', 'spam']
update:
You can use str.rstrip
to get rid of trailing '\t'
and then apply regex.
>>> yas = "yas\t\tbs\tcda\t\t"
>>> re.split(r'\t+', yas.rstrip('\t'))
['yas', 'bs', 'cda']
Solution 2:
You can use regexp to do this:
import re
patt = re.compile("[^\t]+")
s = "a\t\tbcde\t\tef"
patt.findall(s)
['a', 'bcde', 'ef']
Solution 3:
Split on tab, but then remove all blank matches.
text = "hi\tthere\t\t\tmy main man"
print [splits for splits in text.split("\t") if splits is not ""]
Outputs:
['hi', 'there', 'my main man']
Solution 4:
An other regex
-based solution:
>>> strs = "foo\tbar\t\tspam"
>>> r = re.compile(r'([^\t]*)\t*')
>>> r.findall(strs)[:-1]
['foo', 'bar', 'spam']