Is there a numpy/scipy dot product, calculating only the diagonal entries of the result?

Imagine having 2 numpy arrays:

> A, A.shape = (n,p)
> B, B.shape = (p,p)

Typically p is a smaller number (p <= 200), while n can be arbitrarily large.

I am doing the following:

result = np.diag(A.dot(B).dot(A.T))

As you can see, I am keeping only the n diagonal entries, however there is an intermediate (n x n) array calculated from which only the diagonal entries are kept.

I wish for a function like diag_dot(), which only calculates the diagonal entries of the result and does not allocate the complete memory.

A result would be:

> result = diag_dot(A.dot(B), A.T)

Is there a premade functionality like this and can this be done efficiently without the need for allocating the intermediate (n x n) array?


Solution 1:

I think i got it on my own, but nevertheless will share the solution:

since getting only the diagonals of a matrix multiplication

> Z = N.diag(X.dot(Y))

is equivalent to the individual sum of the scalar product of rows of X and columns of Y, the previous statement is equivalent to:

> Z = (X * Y.T).sum(-1)

For the original variables this means:

> result = (A.dot(B) * A).sum(-1)

Please correct me if I am wrong but this should be it ...

Solution 2:

You can get almost anything you ever dreamed of with numpy.einsum. Until you start getting the hang of it, it basically seems like black voodoo...

>>> a = np.arange(15).reshape(5, 3)
>>> b = np.arange(9).reshape(3, 3)

>>> np.diag(np.dot(np.dot(a, b), a.T))
array([  60,  672, 1932, 3840, 6396])
>>> np.einsum('ij,ji->i', np.dot(a, b), a.T)
array([  60,  672, 1932, 3840, 6396])
>>> np.einsum('ij,ij->i', np.dot(a, b), a)
array([  60,  672, 1932, 3840, 6396])

EDIT You can actually get the whole thing in a single shot, it's ridiculous...

>>> np.einsum('ij,jk,ki->i', a, b, a.T)
array([  60,  672, 1932, 3840, 6396])
>>> np.einsum('ij,jk,ik->i', a, b, a)
array([  60,  672, 1932, 3840, 6396])

EDIT You don't want to let it figure too much on its own though... Added the OP's answer to its own question for comparison also.

n, p = 10000, 200
a = np.random.rand(n, p)
b = np.random.rand(p, p)

In [2]: %timeit np.einsum('ij,jk,ki->i', a, b, a.T)
1 loops, best of 3: 1.3 s per loop

In [3]: %timeit np.einsum('ij,ij->i', np.dot(a, b), a)
10 loops, best of 3: 105 ms per loop

In [4]: %timeit np.diag(np.dot(np.dot(a, b), a.T))
1 loops, best of 3: 5.73 s per loop

In [5]: %timeit (a.dot(b) * a).sum(-1)
10 loops, best of 3: 115 ms per loop

Solution 3:

A pedestrian answer, which avoids the construction of large intermediate arrays is:

result=np.empty([n,], dtype=A.dtype )
for i in xrange(n):
    result[i]=A[i,:].dot(B).dot(A[i,:])