Approximate a convolution as a sum of separable convolutions

I want to compute the discrete convolution of two 3D arrays: $A(i, j, k) \ast B(i, j, k)$

Is there a general way to decompose the array $A$ into a sum of a small number of separable arrays? That is:

$A(i, j, k) = \sum_m X_m(i) \times Y_m(j) \times Z_m(k)$

If not, is there a standard way to approximate $A$ as a sum of a relatively small number of separable arrays? $A$ is typically smooth, with a central maximum that dies off towards the edges.

Motivation: I want to compute many iterations of $A \ast B$, with $A$ constant but $B$ changing. $A$ is quite large (16 x 16 x 16, for example), but $B$ is larger still (64 x 1024 x 1024, for example). Direct convolution is very slow to compute. FFT-based convolution is much faster, but still takes tens of seconds for one FFT of $B$ on my machine, and uses a lot of memory. If $A$ is separable, however, three 1D convolutions are much faster and use less memory:

    import time
    import numpy
    from scipy.fftpack import fftn, ifftn
    from scipy.ndimage import gaussian_filter

    a = numpy.zeros((16, 16, 16), dtype=numpy.float64)
    b = numpy.zeros((64, 1024, 1024), dtype=numpy.float64)

    # FFT convolution: zero-pad a's transform out to b's shape,
    # multiply pointwise, and invert
    fftn_a = fftn(a, shape=b.shape)
    start = time.perf_counter()
    ifftn(fftn_a * fftn(b)).real
    end = time.perf_counter()
    print("fft convolution:", end - start)

    # separable convolution: gaussian_filter runs one 1D pass per axis
    start = time.perf_counter()
    gaussian_filter(b, sigma=3)
    end = time.perf_counter()
    print("separable convolution:", end - start)

    fft convolution: 49 seconds
    separable convolution: 9.2 seconds

Alternatively, if someone's got a better way to compute this type of convolution, I'm all ears.


A separable 2D kernel is a rank-one kernel (in the sense of matrix rank). So one natural approach is to replace your 2D kernel with its best rank-one approximation.

Suppose $F$ is your 2D kernel and $F^*$ is an approximation. The problem $\min \|F - F^*\|$ subject to $\operatorname{rank}(F^*) = 1$ has an analytical solution (the norm being minimized here is the Frobenius norm): by the Eckart–Young theorem, the solution is the SVD of $F$ truncated to its largest singular value. You can then easily extract your 1D kernels from that matrix.
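
A minimal NumPy sketch of that extraction (my own illustration; splitting the singular value as $\sqrt{s_0}$ between the two factors is an arbitrary choice):

    import numpy as np

    def rank1_kernel(F):
        # best rank-1 approximation of a 2D kernel, via truncated SVD
        u, s, vt = np.linalg.svd(F, full_matrices=False)
        kx = u[:, 0] * np.sqrt(s[0])   # 1D kernel along the rows
        ky = vt[0] * np.sqrt(s[0])     # 1D kernel along the columns
        return kx, ky                  # np.outer(kx, ky) is the rank-1 approximation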

However, the generalization of rank to higher dimensions is not as clear-cut. You might want to look at the higher-order SVD (HOSVD) algorithm to see whether you can adapt it into something usable for your problem.
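
For intuition, here is a rough sketch (mine, not part of the original answer) of the rank-(1,1,1) HOSVD truncation, which takes the leading left singular vector of each mode unfolding:

    import numpy as np

    def hosvd_rank1(A):
        # leading left singular vector of each mode-n unfolding
        factors = []
        for mode in range(A.ndim):
            unfolding = np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)
            u = np.linalg.svd(unfolding, full_matrices=False)[0]
            factors.append(u[:, 0])
        # scalar core: project A onto the outer product of the factors
        core = np.einsum('ijk,i,j,k->', A, *factors)
        return core, factors   # A is approximated by core * (x outer y outer z)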

However, if your 3D kernel is symmetric along the three Cartesian axes, you could try the following heuristic: take the middle 2D slice, find its best rank-one approximation with the SVD method, extract the 1D kernel from it, and build a 3D kernel from that. It might give you something acceptable.
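
In code, the heuristic could look roughly like this (a sketch under the symmetry assumption; rescaling the result to match $A$'s total weight is my own addition):

    import numpy as np

    def symmetric_heuristic(A):
        # rank-1 SVD of the middle slice gives the 1D profile
        mid = A[A.shape[0] // 2]
        u, s, vt = np.linalg.svd(mid, full_matrices=False)
        k = u[:, 0]
        if k.sum() < 0:                   # fix the SVD's sign ambiguity
            k = -k
        A_sep = np.einsum('i,j,k->ijk', k, k, k)
        A_sep *= A.sum() / A_sep.sum()    # rescale to match A's total weight
        return k, A_sep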

Michael


Two ways to find a separable approximation $x \otimes y \otimes z$ to a 3D filter $A$,
i.e. to minimize $\|A - x \otimes y \otimes z\|$:

1) scipy.optimize.leastsq can easily handle a least-squares problem in 48 unknowns (the 16 + 16 + 16 entries of $x$, $y$, $z$), with residual

    A.ravel() - np.outer(np.outer(x, y), z).ravel()

2) for fixed $y$ and $z$, the optimal $x$ is just a projection; so optimize $x, y, z, x, y, z, \dots$ in turn (alternating least squares). I'd expect this to converge pretty fast, but I haven't tried it. (Sketches of both methods follow this list.)
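
Hedged sketches of both methods (my code, not from the answer; the flat initial guesses are arbitrary):

    import numpy as np
    from scipy.optimize import leastsq

    def residual(p, A, nx, ny, nz):
        # unpack the 48 parameters into the three 1D kernels
        x, y, z = p[:nx], p[nx:nx + ny], p[nx + ny:]
        return A.ravel() - np.outer(np.outer(x, y), z).ravel()

    def fit_leastsq(A):
        # method 1: generic least squares over all 48 unknowns at once
        nx, ny, nz = A.shape
        p0 = np.ones(nx + ny + nz)   # crude initial guess
        p, ier = leastsq(residual, p0, args=(A, nx, ny, nz))
        return p[:nx], p[nx:nx + ny], p[nx + ny:]

    def fit_als(A, iters=50):
        # method 2: with two factors fixed, the optimal third is a projection
        nx, ny, nz = A.shape
        x, y, z = np.ones(nx), np.ones(ny), np.ones(nz)
        for _ in range(iters):
            x = np.einsum('ijk,j,k->i', A, y, z) / ((y @ y) * (z @ z))
            y = np.einsum('ijk,i,k->j', A, x, z) / ((x @ x) * (z @ z))
            z = np.einsum('ijk,i,j->k', A, x, y) / ((x @ x) * (y @ y))
        return x, y, z

With either fit, np.einsum('i,j,k->ijk', x, y, z) reconstructs the separable approximation for comparison against $A$.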

With either method, you can of course add more terms, minimizing $\|A - x_1 \otimes y_1 \otimes z_1 - x_2 \otimes y_2 \otimes z_2 - \dots\|$.
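
One simple way to do that is greedy deflation, reusing the fit_als sketch above (for tensors, unlike matrices, this greedy procedure is not guaranteed optimal, but it is easy to try):

    import numpy as np

    def separable_sum(A, n_terms=4, iters=50):
        # fit a rank-1 term, subtract it, and repeat on the residual
        terms, R = [], A.astype(float).copy()
        for _ in range(n_terms):
            x, y, z = fit_als(R, iters)
            terms.append((x, y, z))
            R -= np.einsum('i,j,k->ijk', x, y, z)
        return terms   # the sum of the outer products approximates A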

Have you timed scipy.ndimage.filters.convolve vs. three convolve1d calls?
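
Once you have the 1D kernels, the three-pass convolution might look like this (mode='constant' is my assumption; choose whatever boundary handling fits your data):

    from scipy.ndimage import convolve1d

    def separable_convolve(B, x, y, z):
        # one 1D convolution per axis, applied in sequence
        out = convolve1d(B, x, axis=0, mode='constant')
        out = convolve1d(out, y, axis=1, mode='constant')
        out = convolve1d(out, z, axis=2, mode='constant')
        return out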

See also 2d SVD and google "separable filter design algorithm". Also, try asking on dsp.stackexchange.com.