Split by comma and strip whitespace in Python
I have some python code that splits on comma, but doesn't strip the whitespace:
>>> string = "blah, lots , of , spaces, here "
>>> mylist = string.split(',')
>>> print mylist
['blah', ' lots ', ' of ', ' spaces', ' here ']
I would rather end up with whitespace removed like this:
['blah', 'lots', 'of', 'spaces', 'here']
I am aware that I could loop through the list and strip() each item but, as this is Python, I'm guessing there's a quicker, easier and more elegant way of doing it.
Solution 1:
Use list comprehension -- simpler, and just as easy to read as a for
loop.
my_string = "blah, lots , of , spaces, here "
result = [x.strip() for x in my_string.split(',')]
# result is ["blah", "lots", "of", "spaces", "here"]
See: Python docs on List Comprehension
A good 2 second explanation of list comprehension.
Solution 2:
I came to add:
map(str.strip, string.split(','))
but saw it had already been mentioned by Jason Orendorff in a comment.
Reading Glenn Maynard's comment on the same answer suggesting list comprehensions over map I started to wonder why. I assumed he meant for performance reasons, but of course he might have meant for stylistic reasons, or something else (Glenn?).
So a quick (possibly flawed?) test on my box (Python 2.6.5 on Ubuntu 10.04) applying the three methods in a loop revealed:
$ time ./list_comprehension.py # [word.strip() for word in string.split(',')]
real 0m22.876s
$ time ./map_with_lambda.py # map(lambda s: s.strip(), string.split(','))
real 0m25.736s
$ time ./map_with_str.strip.py # map(str.strip, string.split(','))
real 0m19.428s
making map(str.strip, string.split(','))
the winner, although it seems they are all in the same ballpark.
Certainly though map (with or without a lambda) should not necessarily be ruled out for performance reasons, and for me it is at least as clear as a list comprehension.
Solution 3:
Split using a regular expression. Note I made the case more general with leading spaces. The list comprehension is to remove the null strings at the front and back.
>>> import re
>>> string = " blah, lots , of , spaces, here "
>>> pattern = re.compile("^\s+|\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['blah', 'lots', 'of', 'spaces', 'here']
This works even if ^\s+
doesn't match:
>>> string = "foo, bar "
>>> print([x for x in pattern.split(string) if x])
['foo', 'bar']
>>>
Here's why you need ^\s+:
>>> pattern = re.compile("\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
[' blah', 'lots', 'of', 'spaces', 'here']
See the leading spaces in blah?
Clarification: above uses the Python 3 interpreter, but results are the same in Python 2.
Solution 4:
Just remove the white space from the string before you split it.
mylist = my_string.replace(' ','').split(',')