Python: split a string on at least 2 whitespace characters
I would like to split a string only where there are two or more consecutive whitespace characters.
For example:
str = '10DEUTSCH        GGS Neue Heide 25-27     Wahn-Heide   -1      -1'
print(str.split())
Results:
['10DEUTSCH', 'GGS', 'Neue', 'Heide', '25-27', 'Wahn-Heide', '-1', '-1']
I would like it to look like this:
['10DEUTSCH', 'GGS Neue Heide 25-27', 'Wahn-Heide', '-1', '-1']
In [4]: import re
In [5]: text = '10DEUTSCH        GGS Neue Heide 25-27     Wahn-Heide   -1      -1'
In [7]: re.split(r'\s{2,}', text)
Out[7]: ['10DEUTSCH', 'GGS Neue Heide 25-27', 'Wahn-Heide', '-1', '-1']
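If the input can start or end with whitespace, re.split returns empty strings at the ends of the list. A minimal sketch of one way to guard against that, assuming a hypothetical helper named split_on_gaps with an illustrative min_gap parameter (neither comes from the answer above):

```python
import re

def split_on_gaps(text, min_gap=2):
    """Split text on runs of at least min_gap whitespace characters.

    split_on_gaps and min_gap are illustrative names, not part of the
    original answer.
    """
    pattern = r'\s{%d,}' % min_gap
    # Strip first so leading/trailing whitespace does not yield empty fields.
    return re.split(pattern, text.strip())

text = '10DEUTSCH        GGS Neue Heide 25-27     Wahn-Heide   -1      -1'
print(split_on_gaps(text))
# ['10DEUTSCH', 'GGS Neue Heide 25-27', 'Wahn-Heide', '-1', '-1']
```

Because \s matches tabs and newlines as well as spaces, this also splits on mixed whitespace runs, which may or may not be what you want.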
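Update (2021+), pandas answer:
The pandas string accessor method Series.str.split accepts a regular expression as the separator (Python's built-in str.split does not).
See the pandas documentation for Series.str.split for details.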
import pandas as pd
row = '10DEUTSCH        GGS Neue Heide 25-27     Wahn-Heide   -1      -1'
df = pd.DataFrame({'string': row}, index=[0])
print(df)
string
0 10DEUTSCH GGS Neue Heide 25-27 Wahn...
df1 = df['string'].str.split(r'\s{2,}', expand=True)
print(df1)
           0                     1           2   3   4
0  10DEUTSCH  GGS Neue Heide 25-27  Wahn-Heide  -1  -1
As has been pointed out, str is not a good name for your string, so using words instead:
output = [s.strip() for s in words.split('  ') if s]
The .split('  ') call, with two spaces, gives you a list that includes empty strings and items with leading/trailing whitespace. The list comprehension iterates through that list, keeps any non-empty items (if s), and .strip() takes care of any leading/trailing whitespace.
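To make the two steps visible, here is a small sketch that prints the intermediate list before filtering. The sample string assumes the field gaps are runs of 8, 5, 3 and 6 spaces; the name raw is only illustrative.

```python
words = '10DEUTSCH        GGS Neue Heide 25-27     Wahn-Heide   -1      -1'

# Step 1: split on exactly two spaces. Longer runs leave empty strings,
# and odd-length runs leave one stray leading space on the next item.
raw = words.split('  ')
print(raw)
# ['10DEUTSCH', '', '', '', 'GGS Neue Heide 25-27', '', ' Wahn-Heide', ' -1', '', '', '-1']

# Step 2: drop the empty strings, then strip what remains.
output = [s.strip() for s in raw if s]
print(output)
# ['10DEUTSCH', 'GGS Neue Heide 25-27', 'Wahn-Heide', '-1', '-1']
```

Note that this approach only recognizes space characters; a gap made of tabs would not be split.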
In [30]: strs='10DEUTSCH        GGS Neue Heide 25-27     Wahn-Heide   -1      -1'
In [38]: filter(None, strs.split("  "))
Out[38]: ['10DEUTSCH', 'GGS Neue Heide 25-27', ' Wahn-Heide', ' -1', '-1']
In [32]: map(str.strip, filter(None, strs.split("  ")))
Out[32]: ['10DEUTSCH', 'GGS Neue Heide 25-27', 'Wahn-Heide', '-1', '-1']
For Python 3, wrap the results of filter and map with list to force iteration.
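A Python 3 version of the same idea might look like the sketch below. The variable name fields is illustrative, and the sample string assumes gaps of 8, 5, 3 and 6 spaces.

```python
strs = '10DEUTSCH        GGS Neue Heide 25-27     Wahn-Heide   -1      -1'

# filter(None, ...) drops the empty strings; map(str.strip, ...) trims the rest.
# In Python 3 both return lazy iterators, so list() materializes the result.
fields = list(map(str.strip, filter(None, strs.split('  '))))
print(fields)
# ['10DEUTSCH', 'GGS Neue Heide 25-27', 'Wahn-Heide', '-1', '-1']
```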