Why does numpy have a corresponding function for many ndarray methods?

Solution 1:

As others have noted, the identically-named NumPy functions and array methods are often equivalent (they end up calling the same underlying code). One might be preferred over the other if it makes for easier reading.

However, in some instances the two behave slightly differently. In particular, using the ndarray method sometimes emphasises the fact that the method is modifying the array in-place.

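For example, np.resize returns a new array with the specified shape. On the other hand, ndarray.resize changes the shape of the array in-place. The fill values used in each case are also different: np.resize fills any extra entries with repeated copies of the original data, while ndarray.resize fills them with zeros.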
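
A minimal sketch of that difference, assuming a small integer array (the names a, b and c are only illustrative, and refcheck=False is passed so the in-place call does not fail when something else still references the array, as can happen in an interactive session):

```python
import numpy as np

a = np.array([1, 2, 3])

# Function form: returns a new array and leaves `a` untouched.
# Extra entries are filled with repeated copies of the data.
b = np.resize(a, (2, 3))
print(b)
print(a.shape)

# Method form: resizes `c` itself and returns None.
# Extra entries are filled with zeros.
c = np.array([1, 2, 3])
returned = c.resize((2, 3), refcheck=False)
print(c)
print(returned)
```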

Similarly, a.sort() sorts the array a in-place, while np.sort(a) returns a sorted copy.
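
The same contrast for sorting, as a short sketch (variable names are illustrative only):

```python
import numpy as np

a = np.array([3, 1, 2])

# np.sort returns a sorted copy; `a` keeps its original order.
s = np.sort(a)
order_after_function = a.tolist()
print(s, order_after_function)

# a.sort() reorders `a` itself and returns None.
returned = a.sort()
print(returned, a)
```

Because the method returns None, writing a = a.sort() throws the array away, which is a common slip when switching between the two forms.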

Solution 2:

In most cases the method is the basic compiled version. The function uses that method when available, but also has some sort of fallback for when an argument is not an array. It helps to look at the code and/or docs of the function or method.

For example, if I ask IPython to show the code for the sum method, I see that it is compiled code:

In [711]: x.sum??
Type:        builtin_function_or_method
String form: <built-in method sum of numpy.ndarray object at 0xac1bce0>
...
Refer to `numpy.sum` for full documentation.

Doing the same for np.sum, I get many lines of documentation plus some Python code:

    if isinstance(a, _gentype):
        res = _sum_(a)
        if out is not None:
            out[...] = res
            return out
        return res
    elif type(a) is not mu.ndarray:
        try:
            sum = a.sum
        except AttributeError:
            return _methods._sum(a, axis=axis, dtype=dtype,
                                out=out, keepdims=keepdims)
        # NOTE: Dropping the keepdims parameters here...
        return sum(axis=axis, dtype=dtype, out=out)
    else:
        return _methods._sum(a, axis=axis, dtype=dtype,
                            out=out, keepdims=keepdims)

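Reading the branches above: a plain ndarray goes straight to _methods._sum, while for any other object with its own sum method (an ndarray subclass, for example) np.sum(x) ends up calling x.sum():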

    sum = a.sum
    return sum(axis=axis, dtype=dtype, out=out)

np.amax is similar (but simpler). Note that the np. form can handle an object that isn't an array (and so doesn't have the method), e.g. a list: np.amax([1,2,3]).
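
A quick way to see this for yourself (a sketch; data is just an example list):

```python
import numpy as np

data = [1, 2, 3]

# A plain list has no .max method, so the method form is not available...
list_has_max = hasattr(data, "max")
print(list_has_max)

# ...but the function form accepts it and converts it internally.
largest = np.amax(data)
print(largest)

# The explicit equivalent: convert to an array, then call the method.
largest_via_method = np.asarray(data).max()
print(largest_via_method)
```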

np.dot and x.dot both show up as 'built-in' functions, so we can't say which one takes priority. They probably both end up calling some underlying C function.

np.reshape is another that delegates if possible:

try:
    reshape = a.reshape
except AttributeError:
    return _wrapit(a, 'reshape', newshape, order=order)
return reshape(newshape, order=order)

So np.reshape(x,(2,3)) is identical in functionality to x.reshape((2,3)). But the _wrapit expression enables np.reshape([1,2,3,4],(2,2)).
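
The delegate-or-wrap pattern can be imitated in a few lines. This is only a sketch: reshape_like is a hypothetical helper written for illustration, not NumPy's _wrapit, and it assumes that converting with np.asarray is an acceptable fallback.

```python
import numpy as np

def reshape_like(a, shape):
    """Hypothetical stand-in for the delegate-or-wrap pattern."""
    try:
        method = a.reshape  # arrays and their subclasses have this
    except AttributeError:
        # Lists, tuples, etc.: convert to an array first, then reshape.
        return np.asarray(a).reshape(shape)
    return method(shape)

x = np.arange(6)
via_function = np.reshape(x, (2, 3))
via_method = x.reshape((2, 3))
print(np.array_equal(via_function, via_method))

# The function form also accepts a list; so does the sketch.
from_list = np.reshape([1, 2, 3, 4], (2, 2))
sketch_from_list = reshape_like([1, 2, 3, 4], (2, 2))
print(from_list)
print(sketch_from_list)
```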

np.sort returns a copy by doing an in-place sort on a copy:

a = asanyarray(a).copy()
a.sort(axis, kind, order)
return a

x.resize is built-in, while np.resize ends up doing an np.concatenate and a reshape.
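
To make the concatenate-and-reshape idea concrete, here is a rough sketch. resize_sketch is a hypothetical helper, not the actual np.resize source; it assumes the goal is simply to repeat the flattened data until there is enough of it, then trim and reshape.

```python
import numpy as np

def resize_sketch(a, new_shape):
    """Hypothetical approximation of np.resize via concatenate + reshape."""
    flat = np.asarray(a).ravel()
    total = int(np.prod(new_shape))
    if flat.size == 0 or total == 0:
        # Nothing to repeat (or nothing requested): return zeros of that shape.
        return np.zeros(new_shape, dtype=flat.dtype)
    repeats = -(-total // flat.size)  # ceiling division
    # Repeat the data enough times, trim to length, then reshape.
    return np.concatenate([flat] * repeats)[:total].reshape(new_shape)

sketch = resize_sketch([1, 2, 3], (2, 4))
reference = np.resize([1, 2, 3], (2, 4))
print(sketch)
print(np.array_equal(sketch, reference))
```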

If your array is a subclass, like matrix or a masked array, it may have its own variant of the method. The action of the matrix .sum is:

return N.ndarray.sum(self, axis, dtype, out, keepdims=True)._collapse(axis)
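
A masked array shows the same effect without involving the matrix class. In this sketch (the values and mask are arbitrary), both the method and the function respect the subclass's own sum, so the masked entry is skipped either way:

```python
import numpy as np

# Mask out the middle value; the subclass's sum should ignore it.
m = np.ma.array([1, 2, 3], mask=[False, True, False])

method_total = m.sum()
function_total = np.sum(m)  # picks up the subclass's own sum
raw_total = np.sum(m.data)  # plain ndarray underneath: mask is ignored

print(method_total, function_total, raw_total)
```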

Solution 3:

Elaborating on Peter's comment for visibility:

We could make it more consistent by removing methods from ndarray and sticking to just functions. But this is impossible because it would break everyone's existing code that uses methods.

Or, we could move all functions to also be methods. But this is impossible because new users and packages are constantly defining new functions. Plus continuing to multiply these duplicate methods violates "there should be one obvious way to do it".

If we could go back in time then I'd probably argue for not having these methods on ndarray at all, and using functions exclusively. ... So this all argues for using functions exclusively.

numpy issue: More consistency with array-methods #7452