How to square or raise to a power (elementwise) a 2D numpy array?

The fastest way is to do a*a or a**2 or np.square(a) whereas np.power(a, 2) showed to be considerably slower.

np.power() allows you to use different exponents for each element if instead of 2 you pass another array of exponents. From the comments of @GarethRees I just learned that this function will give you different results than a**2 or a*a, which become important in cases where you have small tolerances.

I've timed some examples using NumPy 1.9.0 MKL 64 bit, and the results are shown below:

In [29]: a = np.random.random((1000, 1000))

In [30]: timeit a*a
100 loops, best of 3: 2.78 ms per loop

In [31]: timeit a**2
100 loops, best of 3: 2.77 ms per loop

In [32]: timeit np.power(a, 2)
10 loops, best of 3: 71.3 ms per loop

>>> import numpy
>>> print numpy.power.__doc__

power(x1, x2[, out])

First array elements raised to powers from second array, element-wise.

Raise each base in `x1` to the positionally-corresponding power in
`x2`.  `x1` and `x2` must be broadcastable to the same shape.

Parameters
----------
x1 : array_like
    The bases.
x2 : array_like
    The exponents.

Returns
-------
y : ndarray
    The bases in `x1` raised to the exponents in `x2`.

Examples
--------
Cube each element in a list.

>>> x1 = range(6)
>>> x1
[0, 1, 2, 3, 4, 5]
>>> np.power(x1, 3)
array([  0,   1,   8,  27,  64, 125])

Raise the bases to different exponents.

>>> x2 = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
>>> np.power(x1, x2)
array([  0.,   1.,   8.,  27.,  16.,   5.])

The effect of broadcasting.

>>> x2 = np.array([[1, 2, 3, 3, 2, 1], [1, 2, 3, 3, 2, 1]])
>>> x2
array([[1, 2, 3, 3, 2, 1],
       [1, 2, 3, 3, 2, 1]])
>>> np.power(x1, x2)
array([[ 0,  1,  8, 27, 16,  5],
       [ 0,  1,  8, 27, 16,  5]])
>>>

Precision

As per the discussed observation on numerical precision as per @GarethRees objection in comments:

>>> a = numpy.ones( (3,3), dtype = numpy.float96 ) # yields exact output
>>> a[0,0] = 0.46002700024131926
>>> a
array([[ 0.460027,  1.0,  1.0],
       [ 1.0,  1.0,  1.0],
       [ 1.0,  1.0,  1.0]], dtype=float96)
>>> b = numpy.power( a, 2 )
>>> b
array([[ 0.21162484,  1.0,  1.0],
       [ 1.0,  1.0,  1.0],
       [ 1.0,  1.0,  1.0]], dtype=float96)

>>> a.dtype
dtype('float96')
>>> a[0,0]
0.46002700024131926
>>> b[0,0]
0.21162484095102677

>>> print b[0,0]
0.211624840951
>>> print a[0,0]
0.460027000241

Performance

>>> c    = numpy.random.random( ( 1000, 1000 ) ).astype( numpy.float96 )

>>> import zmq
>>> aClk = zmq.Stopwatch()

>>> aClk.start(), c**2, aClk.stop()
(None, array([[ ...]], dtype=float96), 5663L)                #   5 663 [usec]

>>> aClk.start(), c*c, aClk.stop()
(None, array([[ ...]], dtype=float96), 6395L)                #   6 395 [usec]

>>> aClk.start(), c[:,:]*c[:,:], aClk.stop()
(None, array([[ ...]], dtype=float96), 6930L)                #   6 930 [usec]

>>> aClk.start(), c[:,:]**2, aClk.stop()
(None, array([[ ...]], dtype=float96), 6285L)                #   6 285 [usec]

>>> aClk.start(), numpy.power( c, 2 ), aClk.stop()
(None, array([[ ... ]], dtype=float96), 384515L)             # 384 515 [usec]