Strip all non-numeric characters (except for ".") from a string in Python

You can use a regular expression (using the re module) to accomplish the same thing. The example below matches runs of [^\d.] (any character that's not a decimal digit or a period) and replaces them with the empty string. Note that if the pattern is compiled with the UNICODE flag the resulting string could still include non-ASCII numbers. Also, the result after removing "non-numeric" characters is not necessarily a valid number.

>>> import re
>>> non_decimal = re.compile(r'[^\d.]+')
>>> non_decimal.sub('', '12.34fe4e')
'12.344'

Another 'pythonic' approach

filter( lambda x: x in '0123456789.', s )

but regex is faster.


Here's some sample code:

$ cat a.py
a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw'
for i in xrange(1000000):
    ''.join([c for c in a if c in '1234567890.'])

$ cat b.py
import re

non_decimal = re.compile(r'[^\d.]+')

a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw'
for i in xrange(1000000):
    non_decimal.sub('', a)

$ cat c.py
a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw'
for i in xrange(1000000):
    ''.join([c for c in a if c.isdigit() or c == '.'])

$ cat d.py
a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw'
for i in xrange(1000000):
    b = []
    for c in a:
        if c.isdigit() or c == '.': continue
        b.append(c)

    ''.join(b)

And the timing results:


$ time python a.py
real    0m24.735s
user    0m21.049s
sys     0m0.456s

$ time python b.py
real    0m10.775s
user    0m9.817s
sys     0m0.236s

$ time python c.py
real    0m38.255s
user    0m32.718s
sys     0m0.724s

$ time python d.py
real    0m46.040s
user    0m41.515s
sys     0m0.832s

Looks like the regex is the winner so far.

Personally, I find the regex just as readable as the list comprehension. If you're doing it just a few times then you'll probably take a bigger hit on compiling the regex. Do what jives with your code and coding style.