Strip all non-numeric characters (except for ".") from a string in Python
You can use a regular expression (using the re module) to accomplish this. The example below matches runs of [^\d.] (any character that's not a decimal digit or a period) and replaces them with the empty string. Note that if the pattern is compiled with the UNICODE flag (which is the default for str patterns in Python 3), the resulting string could still include non-ASCII digits. Also, the result after removing "non-numeric" characters is not necessarily a valid number.
>>> import re
>>> non_decimal = re.compile(r'[^\d.]+')
>>> non_decimal.sub('', '12.34fe4e')
'12.344'
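Both caveats can be made concrete. The sketch below assumes Python 3, where `re.ASCII` restricts `\d` to 0-9. The helper names `strip_non_numeric` and `to_float_or_none` are made up for illustration and are not part of the `re` module.

```python
import re

# re.ASCII restricts \d to 0-9. Without it, Python 3 str patterns also
# treat decimal digits from other scripts (e.g. Arabic-Indic) as \d.
NON_DECIMAL_ASCII = re.compile(r'[^\d.]+', re.ASCII)
NON_DECIMAL_UNICODE = re.compile(r'[^\d.]+')


def strip_non_numeric(text, ascii_only=True):
    """Remove everything except decimal digits and '.' (hypothetical helper)."""
    pattern = NON_DECIMAL_ASCII if ascii_only else NON_DECIMAL_UNICODE
    return pattern.sub('', text)


def to_float_or_none(text):
    """Return float(text), or None when text is not a valid number."""
    try:
        return float(text)
    except ValueError:
        return None


print(strip_non_numeric('12.34fe4e'))                        # prints 12.344
stripped = strip_non_numeric('1.2.3 apples')
print(stripped)                                              # prints 1.2.3
print(to_float_or_none(stripped))                            # prints None
# 'a\u0663b4' contains U+0663 (ARABIC-INDIC DIGIT THREE).
print(strip_non_numeric('a\u0663b4') == '4')                 # prints True
print(len(strip_non_numeric('a\u0663b4', ascii_only=False)))  # prints 2
```

If the cleaned string has to be a number, validating it afterwards (as `to_float_or_none` does) is safer than trusting the stripping step alone.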
Another 'pythonic' approach is
filter(lambda x: x in '0123456789.', s)
but the regex is faster.
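In Python 3, `filter` returns a lazy iterator rather than a string, so the kept characters have to be joined back together. A minimal sketch, assuming Python 3 and a sample value for `s` chosen here for illustration:

```python
s = '12.34fe4e'  # sample input, chosen for illustration

# filter() yields the kept characters one by one; ''.join() rebuilds the string.
result = ''.join(filter(lambda x: x in '0123456789.', s))
print(result)  # prints 12.344
```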
Here's some sample code comparing four ways of doing it:
$ cat a.py
a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw'
for i in xrange(1000000):
    ''.join([c for c in a if c in '1234567890.'])
$ cat b.py
import re
non_decimal = re.compile(r'[^\d.]+')
a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw'
for i in xrange(1000000):
    non_decimal.sub('', a)
$ cat c.py
a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw'
for i in xrange(1000000):
    ''.join([c for c in a if c.isdigit() or c == '.'])
$ cat d.py
a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw'
for i in xrange(1000000):
    b = []
    for c in a:
        # skip anything that is not a digit or '.'
        if not (c.isdigit() or c == '.'):
            continue
        b.append(c)
    ''.join(b)
And the timing results:
$ time python a.py
real 0m24.735s
user 0m21.049s
sys 0m0.456s
$ time python b.py
real 0m10.775s
user 0m9.817s
sys 0m0.236s
$ time python c.py
real 0m38.255s
user 0m32.718s
sys 0m0.724s
$ time python d.py
real 0m46.040s
user 0m41.515s
sys 0m0.832s
Looks like the regex is the winner so far.
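The numbers above come from timing whole scripts, which also counts interpreter start-up. One way to re-run the comparison in-process is the standard `timeit` module. The sketch below assumes Python 3 (so no `xrange`). The function names and the reduced repeat count are choices made here for illustration, and no timings are shown because they depend on the machine and interpreter version.

```python
import re
import timeit

a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw'
non_decimal = re.compile(r'[^\d.]+')


def membership():      # a.py: membership test against a string of digits
    return ''.join([c for c in a if c in '1234567890.'])


def regex():           # b.py: precompiled regular expression
    return non_decimal.sub('', a)


def isdigit():         # c.py: str.isdigit() in a list comprehension
    return ''.join([c for c in a if c.isdigit() or c == '.'])


def explicit_loop():   # d.py: explicit loop, written to keep digits and '.'
    b = []
    for c in a:
        if c.isdigit() or c == '.':
            b.append(c)
    return ''.join(b)


candidates = [membership, regex, isdigit, explicit_loop]

# All four must agree before their speed is worth comparing.
results = {f.__name__: f() for f in candidates}
assert len(set(results.values())) == 1

for f in candidates:
    seconds = timeit.timeit(f, number=10000)
    print('%-14s %.3f s' % (f.__name__, seconds))
```

The ranking may differ from the figures above on a newer interpreter, so it is worth measuring on the version you actually deploy.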
Personally, I find the regex just as readable as the list comprehension. If you're only doing it a few times, the one-time cost of compiling the regex will probably outweigh what you save per call. Do what jibes with your code and coding style.