NumPy: function for simultaneous max() and min()
Solution 1:
Is there a function in the numpy API that finds both max and min with only a single pass through the data?
No. At the time of this writing, there is no such function. (And yes, if there were such a function, its performance would be significantly better than calling numpy.amin()
and numpy.amax()
successively on a large array.)
Solution 2:
I don't think that passing over the array twice is a problem. Consider the following pseudo-code:
minval = array[0]
maxval = array[0]
for i in array:
if i < minval:
minval = i
if i > maxval:
maxval = i
While there is only 1 loop here, there are still 2 checks. (Instead of having 2 loops with 1 check each). Really the only thing you save is the overhead of 1 loop. If the arrays really are big as you say, that overhead is small compared to the actual loop's work load. (Note that this is all implemented in C, so the loops are more or less free anyway).
EDIT Sorry to the 4 of you who upvoted and had faith in me. You definitely can optimize this.
Here's some fortran code which can be compiled into a python module via f2py
(maybe a Cython
guru can come along and compare this with an optimized C version ...):
subroutine minmax1(a,n,amin,amax)
implicit none
!f2py intent(hidden) :: n
!f2py intent(out) :: amin,amax
!f2py intent(in) :: a
integer n
real a(n),amin,amax
integer i
amin = a(1)
amax = a(1)
do i=2, n
if(a(i) > amax)then
amax = a(i)
elseif(a(i) < amin) then
amin = a(i)
endif
enddo
end subroutine minmax1
subroutine minmax2(a,n,amin,amax)
implicit none
!f2py intent(hidden) :: n
!f2py intent(out) :: amin,amax
!f2py intent(in) :: a
integer n
real a(n),amin,amax
amin = minval(a)
amax = maxval(a)
end subroutine minmax2
Compile it via:
f2py -m untitled -c fortran_code.f90
And now we're in a place where we can test it:
import timeit
size = 100000
repeat = 10000
print timeit.timeit(
'np.min(a); np.max(a)',
setup='import numpy as np; a = np.arange(%d, dtype=np.float32)' % size,
number=repeat), " # numpy min/max"
print timeit.timeit(
'untitled.minmax1(a)',
setup='import numpy as np; import untitled; a = np.arange(%d, dtype=np.float32)' % size,
number=repeat), '# minmax1'
print timeit.timeit(
'untitled.minmax2(a)',
setup='import numpy as np; import untitled; a = np.arange(%d, dtype=np.float32)' % size,
number=repeat), '# minmax2'
The results are a bit staggering for me:
8.61869883537 # numpy min/max
1.60417699814 # minmax1
2.30169081688 # minmax2
I have to say, I don't completely understand it. Comparing just np.min
versus minmax1
and minmax2
is still a losing battle, so it's not just a memory issue ...
notes -- Increasing size by a factor of 10**a
and decreasing repeat by a factor of 10**a
(keeping the problem size constant) does change the performance, but not in a seemingly consistent way which shows that there is some interplay between memory performance and function call overhead in python. Even comparing a simple min
implementation in fortran beats numpy's by a factor of approximately 2 ...
Solution 3:
You could use Numba, which is a NumPy-aware dynamic Python compiler using LLVM. The resulting implementation is pretty simple and clear:
import numpy
import numba
@numba.jit
def minmax(x):
maximum = x[0]
minimum = x[0]
for i in x[1:]:
if i > maximum:
maximum = i
elif i < minimum:
minimum = i
return (minimum, maximum)
numpy.random.seed(1)
x = numpy.random.rand(1000000)
print(minmax(x) == (x.min(), x.max()))
It should also be faster than a Numpy's min() & max()
implementation. And all without having to write a single C/Fortran line of code.
Do your own performance tests, as it is always dependent on your architecture, your data, your package versions...
Solution 4:
There is a function for finding (max-min) called numpy.ptp if that's useful for you:
>>> import numpy
>>> x = numpy.array([1,2,3,4,5,6])
>>> x.ptp()
5
but I don't think there's a way to find both min and max with one traversal.
EDIT: ptp just calls min and max under the hood