How to eliminate the extra minus sign when rounding negative numbers towards zero in numpy?
I have a simple question about the fix and floor functions in numpy.
When rounding negative numbers larger than -1 towards zero, numpy rounds them correctly to zero but leaves a negative sign. This negative sign interferes with my custom unique_rows function, which uses ascontiguousarray to compare elements of the array byte by byte, so the sign disturbs the uniqueness check. Both round and fix behave the same in this regard.
>>> np.fix(-1e-6)
array(-0.0)
>>> np.round(-1e-6)
-0.0
Any insights on how to get rid of the sign? I thought about using the np.sign function, but it comes with extra computational cost.
The issue you're having between -0. and +0. is part of the specification of how floats are supposed to behave (IEEE 754). In some circumstances, one needs this distinction. See, for example, the docs linked from the numpy docs for around.
It's also worth noting that the two zeros compare as equal, so
np.array(-0.)==np.array(+0.)
# True
That is, I think the problem is more likely with your uniqueness comparison. For example:
a = np.array([-1., -0., 0., 1.])
np.unique(a)
# array([-1., -0., 1.])
If you want to keep the numbers as floating point but have all the zeros the same, you could use:
x = np.linspace(-2, 2, 6)
# array([-2. , -1.2, -0.4, 0.4, 1.2, 2. ])
y = x.round()
# array([-2., -1., -0., 0., 1., 2.])
y[y==0.] = 0.
# array([-2., -1., 0., 0., 1., 2.])
# or
y += 0.
# array([-2., -1., 0., 0., 1., 2.])
Note, though, that you do have to do this bit of extra work, since you are deliberately sidestepping the floating-point specification. (The in-place add works because IEEE 754 defines -0. + 0. to be +0. under the default round-to-nearest mode.)
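If you want to check which zero you have, np.signbit reads the sign bit directly:
np.signbit(np.fix(-1e-6))
# True
np.signbit(np.fix(-1e-6) + 0.)
# False: IEEE 754 addition turns -0. + 0. into +0.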
Note also that this isn't due to a rounding error. For example,
np.fix(np.array(-.4)).tobytes().hex()
# '0000000000000080'
np.fix(np.array(-0.)).tobytes().hex()
# '0000000000000080'
That is, the resulting numbers are exactly the same, but
np.fix(np.array(0.)).tobytes().hex()
# '0000000000000000'
is different. This is why your method is not working, since it's comparing the binary representation of the numbers, which is different for the two zeros. Therefore, I think the problem is more the method of comparison than the general idea of comparing floating point numbers for uniqueness.
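Equivalently, you can inspect the underlying bits by viewing the array as integers (this assumes 64-bit floats):
np.fix(np.array(-0.)).view(np.int64)
# -9223372036854775808, i.e. only the sign bit is set
np.fix(np.array(0.)).view(np.int64)
# 0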
A quick timeit test for the various approaches:
import timeit

data0 = np.fix(4*np.random.rand(1000000,)-2)
# [ 1. -0. 1. -0. -0. 1. 1. 0. -0. -0. .... ]
N = 100

data = np.array(data0)
print(timeit.timeit("data += 0.", setup="from __main__ import np, data", number=N))
# 0.171831846237

data = np.array(data0)
print(timeit.timeit("data[data==0.] = 0.", setup="from __main__ import np, data", number=N))
# 0.83500289917

data = np.array(data0)
print(timeit.timeit("data.astype(int).astype(float)", setup="from __main__ import np, data", number=N))
# 0.843791007996
I agree with @senderle's point that if you want simple and exact comparisons and can get by with ints, ints will generally be easier. But if you want unique floats, you should be able to do this too, though you need to do it a bit more carefully. The main issue with floats is that small differences can be introduced by calculations and don't show up in a normal print, but this isn't a huge barrier, especially not after a round, fix, or rint for a reasonable range of floats.
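For example, a sketch of that round-then-unique workflow (the choice of 12 decimals as the tolerance is arbitrary):
vals = np.array([0.1 + 0.2, 0.3, -1e-16])
np.unique(vals)
# three "unique" values: the tiny numerical differences survive
np.unique(np.round(vals, 12) + 0.)
# array([0. , 0.3])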
I think the fundamental problem is that you're using set-like operations on floating-point numbers -- which is something to avoid as a general rule, unless you have a very good reason and a deep understanding of floating-point numbers.
The obvious reason to follow this rule is that even a very small difference between two floats registers as an absolute difference, so numerical error can cause set-like operations to produce unexpected results. Now, in your use case, it might initially seem that you've avoided that problem by rounding first, thereby limiting the range of possible values. But it turns out that unexpected results are still possible, as this corner case shows. Floating-point numbers are hard to reason about.
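A classic illustration:
>>> 0.1 + 0.2 == 0.3
False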
I think the correct fix is to round and then convert to int using astype.
>>> a
array([-0.5, 2. , 0.2, -3. , -0.2])
>>> numpy.fix(a)
array([-0., 2., 0., -3., -0.])
>>> numpy.fix(a).astype(int) # could also use 'i8', etc...
array([ 0, 2, 0, -3, 0])
Since you're already rounding, this shouldn't throw away any information, and it will be more stable and predictable for set-like operations later. This is one of those cases where it's best to use the correct abstraction!
If you need floats, you can always convert back. The only problem with this is that it creates another copy; but most of the time that's not really a problem. numpy is fast enough that the overhead of copying is pretty tiny!
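For instance, continuing the example above:
>>> b = numpy.fix(a).astype(int)
>>> b.astype(float)  # the copy back to float; all zeros are now +0.
array([ 0., 2., 0., -3., 0.])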
I'll add that if your case really demands the use of floats, then tom10's answer is a good one. But I feel that the number of cases in which both floats and set-like operations are genuinely necessary is very small.