Implement MATLAB's im2col 'sliding' in Python
Q: How to speed this up?
Below is my implementation of Matlab's im2col 'sliding' with the additional feature of returning every n'th column. The function takes an image (or any 2 dim array) and slides from left to right, top to bottom, picking off every overlapping sub-image of a given size, and returning an array whose columns are the sub-images.
import numpy as np
def im2col_sliding(image, block_size, skip=1):
rows, cols = image.shape
horz_blocks = cols - block_size[1] + 1
vert_blocks = rows - block_size[0] + 1
output_vectors = np.zeros((block_size[0] * block_size[1], horz_blocks * vert_blocks))
itr = 0
for v_b in xrange(vert_blocks):
for h_b in xrange(horz_blocks):
output_vectors[:, itr] = image[v_b: v_b + block_size[0], h_b: h_b + block_size[1]].ravel()
itr += 1
return output_vectors[:, ::skip]
example:
a = np.arange(16).reshape(4, 4)
print a
print im2col_sliding(a, (2, 2)) # return every overlapping 2x2 patch
print im2col_sliding(a, (2, 2), 4) # return every 4th vector
returns:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[ 0. 1. 2. 4. 5. 6. 8. 9. 10.]
[ 1. 2. 3. 5. 6. 7. 9. 10. 11.]
[ 4. 5. 6. 8. 9. 10. 12. 13. 14.]
[ 5. 6. 7. 9. 10. 11. 13. 14. 15.]]
[[ 0. 5. 10.]
[ 1. 6. 11.]
[ 4. 9. 14.]
[ 5. 10. 15.]]
The performance isn't great, especially considering whether I call im2col_sliding(big_matrix, (8, 8))
(62001 columns) or im2col_sliding(big_matrix, (8, 8), 10)
(6201 columns; keeping only every 10th vector) it will take the same amount of time [where big_matrix is of size 256 x 256].
I'm looking for any ideas to speed this up.
Approach #1
We could use some broadcasting
here to get all the indices of all those sliding windows in one go and thus with indexing achieve a vectorized solution
. This is inspired by Efficient Implementation of im2col and col2im
.
Here's the implementation -
def im2col_sliding_broadcasting(A, BSZ, stepsize=1):
# Parameters
M,N = A.shape
col_extent = N - BSZ[1] + 1
row_extent = M - BSZ[0] + 1
# Get Starting block indices
start_idx = np.arange(BSZ[0])[:,None]*N + np.arange(BSZ[1])
# Get offsetted indices across the height and width of input array
offset_idx = np.arange(row_extent)[:,None]*N + np.arange(col_extent)
# Get all actual indices & index into input array for final output
return np.take (A,start_idx.ravel()[:,None] + offset_idx.ravel()[::stepsize])
Approach #2
Using newly gained knowledge of NumPy array strides
that lets us create such sliding windows, we would have another efficient solution -
def im2col_sliding_strided(A, BSZ, stepsize=1):
# Parameters
m,n = A.shape
s0, s1 = A.strides
nrows = m-BSZ[0]+1
ncols = n-BSZ[1]+1
shp = BSZ[0],BSZ[1],nrows,ncols
strd = s0,s1,s0,s1
out_view = np.lib.stride_tricks.as_strided(A, shape=shp, strides=strd)
return out_view.reshape(BSZ[0]*BSZ[1],-1)[:,::stepsize]
Approach #3
The strided method listed in the previous approach has been incorporated into scikit-image
module for a less messier, like so -
from skimage.util import view_as_windows as viewW
def im2col_sliding_strided_v2(A, BSZ, stepsize=1):
return viewW(A, (BSZ[0],BSZ[1])).reshape(-1,BSZ[0]*BSZ[1]).T[:,::stepsize]
Sample runs -
In [106]: a # Input array
Out[106]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
In [107]: im2col_sliding_broadcasting(a, (2,3))
Out[107]:
array([[ 0, 1, 2, 5, 6, 7, 10, 11, 12],
[ 1, 2, 3, 6, 7, 8, 11, 12, 13],
[ 2, 3, 4, 7, 8, 9, 12, 13, 14],
[ 5, 6, 7, 10, 11, 12, 15, 16, 17],
[ 6, 7, 8, 11, 12, 13, 16, 17, 18],
[ 7, 8, 9, 12, 13, 14, 17, 18, 19]])
In [108]: im2col_sliding_broadcasting(a, (2,3), stepsize=2)
Out[108]:
array([[ 0, 2, 6, 10, 12],
[ 1, 3, 7, 11, 13],
[ 2, 4, 8, 12, 14],
[ 5, 7, 11, 15, 17],
[ 6, 8, 12, 16, 18],
[ 7, 9, 13, 17, 19]])
Runtime test
In [183]: a = np.random.randint(0,255,(1024,1024))
In [184]: %timeit im2col_sliding(img, (8,8), skip=1)
...: %timeit im2col_sliding_broadcasting(img, (8,8), stepsize=1)
...: %timeit im2col_sliding_strided(img, (8,8), stepsize=1)
...: %timeit im2col_sliding_strided_v2(img, (8,8), stepsize=1)
...:
1 loops, best of 3: 1.29 s per loop
1 loops, best of 3: 226 ms per loop
10 loops, best of 3: 84.5 ms per loop
10 loops, best of 3: 111 ms per loop
In [185]: %timeit im2col_sliding(img, (8,8), skip=4)
...: %timeit im2col_sliding_broadcasting(img, (8,8), stepsize=4)
...: %timeit im2col_sliding_strided(img, (8,8), stepsize=4)
...: %timeit im2col_sliding_strided_v2(img, (8,8), stepsize=4)
...:
1 loops, best of 3: 1.31 s per loop
10 loops, best of 3: 104 ms per loop
10 loops, best of 3: 84.4 ms per loop
10 loops, best of 3: 109 ms per loop
Around 16x
speedup there with the strided method over the original loopy version!
For sliding window over different image channels, we can use an updated version of the code provided by Divakar@Implement MATLAB's im2col 'sliding' in Python, i.e.
import numpy as np
A = np.random.randint(0,9,(2,4,4)) # Sample input array
# Sample blocksize (rows x columns)
B = [2,2]
skip=[2,2]
# Parameters
D,M,N = A.shape
col_extent = N - B[1] + 1
row_extent = M - B[0] + 1
# Get Starting block indices
start_idx = np.arange(B[0])[:,None]*N + np.arange(B[1])
# Generate Depth indeces
didx=M*N*np.arange(D)
start_idx=(didx[:,None]+start_idx.ravel()).reshape((-1,B[0],B[1]))
# Get offsetted indices across the height and width of input array
offset_idx = np.arange(row_extent)[:,None]*N + np.arange(col_extent)
# Get all actual indices & index into input array for final output
out = np.take (A,start_idx.ravel()[:,None] + offset_idx[::skip[0],::skip[1]].ravel())
Testing Sample Run
A=
[[[6 2 8 5]
[6 4 7 6]
[8 6 5 2]
[3 1 3 7]]
[[6 0 4 3]
[7 6 4 6]
[2 6 7 1]
[7 6 7 7]]]
out=
[6 8 8 5]
[2 5 6 2]
[6 7 3 3]
[4 6 1 7]
[6 4 2 7]
[0 3 6 1]
[7 4 7 7]
[6 6 6 7]
For further improving the performance (e.g. on convolution) we can also use batch implementation based on the extended code, provided by M Elyia@Implement Matlab's im2col 'sliding' in python, i.e.
import numpy as np
A = np.arange(3*1*4*4).reshape(3,1,4,4)+1 # 3 Sample input array with 1 channel
B = [2,2] # Sample blocksize (rows x columns)
skip = [2,2]
# Parameters
batch, D,M,N = A.shape
col_extent = N - B[1] + 1
row_extent = M - B[0] + 1
# Get batch block indices
batch_idx = np.arange(batch)[:, None, None] * D * M * N
# Get Starting block indices
start_idx = np.arange(B[0])[None, :,None]*N + np.arange(B[1])
# Generate Depth indeces
didx=M*N*np.arange(D)
start_idx=(didx[None, :, None]+start_idx.ravel()).reshape((-1,B[0],B[1]))
# Get offsetted indices across the height and width of input array
offset_idx = np.arange(row_extent)[None, :, None]*N + np.arange(col_extent)
# Get all actual indices & index into input array for final output
act_idx = (batch_idx +
start_idx.ravel()[None, :, None] +
offset_idx[:,::skip[0],::skip[1]].ravel())
out = np.take (A, act_idx)
Testing sample run:
A =
[[[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]]
[[[17 18 19 20]
[21 22 23 24]
[25 26 27 28]
[29 30 31 32]]]
[[[33 34 35 36]
[37 38 39 40]
[41 42 43 44]
[45 46 47 48]]]]
out =
[[[ 1 2 3 9 10 11]
[ 2 3 4 10 11 12]
[ 5 6 7 13 14 15]
[ 6 7 8 14 15 16]]
[[17 18 19 25 26 27]
[18 19 20 26 27 28]
[21 22 23 29 30 31]
[22 23 24 30 31 32]]
[[33 34 35 41 42 43]
[34 35 36 42 43 44]
[37 38 39 45 46 47]
[38 39 40 46 47 48]]]