numpy vstack vs. column_stack
What exactly is the difference between numpy vstack and column_stack? Reading through the documentation, it looks as if column_stack is an implementation of vstack for 1D arrays. Is it a more efficient implementation? Otherwise, I cannot see a reason to have it rather than just vstack.
I think the following code illustrates the difference nicely:
>>> np.vstack(([1,2,3],[4,5,6]))
array([[1, 2, 3],
       [4, 5, 6]])
>>> np.column_stack(([1,2,3],[4,5,6]))
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> np.hstack(([1,2,3],[4,5,6]))
array([1, 2, 3, 4, 5, 6])
I've included hstack for comparison as well. Notice how column_stack stacks along the second dimension, whereas vstack stacks along the first dimension. The equivalent of column_stack is the following hstack command:
>>> np.hstack(([[1],[2],[3]],[[4],[5],[6]]))
array([[1, 4],
       [2, 5],
       [3, 6]])
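A minimal sketch of the same comparison as a script, checking that column_stack on two 1D arrays matches hstack once each vector has been reshaped into a (3, 1) column. The variable names are illustrative only.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# column_stack turns each 1D input into a column before joining them side by side
via_column_stack = np.column_stack((x, y))

# hstack needs each vector reshaped into a (3, 1) column first
via_hstack = np.hstack((x.reshape(-1, 1), y.reshape(-1, 1)))

print(via_column_stack.shape)                        # (3, 2)
print(np.array_equal(via_column_stack, via_hstack))  # True
```

The reshape(-1, 1) calls are exactly the extra work that column_stack saves you.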
I hope we can agree that column_stack is more convenient.
In the Notes section of column_stack, it points out this:
This function is equivalent to np.vstack(tup).T.
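A quick check of that note, as a sketch: the equivalence holds when the inputs are 1D, but for 2D inputs column_stack joins along axis 1 (like hstack), which is not the same as transposing the vstack result. The arrays a and b are arbitrary examples.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# For 1D inputs, the two expressions agree
same_1d = np.array_equal(np.column_stack((x, y)), np.vstack((x, y)).T)

# For 2D inputs, column_stack appends the columns of b to the right of a,
# while vstack(...).T transposes both blocks
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
same_2d = np.array_equal(np.column_stack((a, b)), np.vstack((a, b)).T)

print(same_1d)  # True
print(same_2d)  # False
```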
There are many functions in numpy that are convenient wrappers of other functions. For example, the Notes section of vstack says:
Equivalent to np.concatenate(tup, axis=0) if tup contains arrays that are at least 2-dimensional.
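The "at least 2-dimensional" condition matters. In this sketch, vstack and concatenate(axis=0) agree for 2D inputs, while for 1D inputs vstack first promotes each array to shape (1, n) and concatenate simply joins them end to end.

```python
import numpy as np

# At least 2D: vstack and concatenate(axis=0) agree
a = np.ones((2, 3))
b = np.zeros((1, 3))
agree_2d = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))

# 1D: vstack treats each input as a row, concatenate does not
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
vstack_shape = np.vstack((x, y)).shape                  # (2, 3)
concat_shape = np.concatenate((x, y), axis=0).shape     # (6,)

print(agree_2d, vstack_shape, concat_shape)  # True (2, 3) (6,)
```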
It looks like column_stack is just a convenience function for vstack.
hstack stacks horizontally, vstack stacks vertically.
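For two 2D arrays of the same shape, the difference shows up directly in the result shapes. A minimal sketch with arbitrary (2, 3) inputs:

```python
import numpy as np

a = np.ones((2, 3))
b = np.zeros((2, 3))

h = np.hstack((a, b))  # joins along axis 1 (side by side): shape (2, 6)
v = np.vstack((a, b))  # joins along axis 0 (one on top of the other): shape (4, 3)

print(h.shape, v.shape)  # (2, 6) (4, 3)
```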
The problem with hstack is that when you append a column, you need to convert it from a 1d array to a 2d column first, because numpy normally interprets a 1d array as a row vector in a 2d context:
import numpy as np
a = np.ones((2, 2))   # 2d, shape = (2, 2)
b = np.array([0, 0])  # 1d, shape = (2,)
np.hstack((a, b))     # -> ValueError: dimensions mismatch
So use either hstack((a, b[:, None])) or column_stack((a, b)), where None serves as a shortcut for np.newaxis.
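Both fixes can be checked side by side. This sketch reuses the a and b from the failing example and confirms that the explicit b[:, None] reshape and column_stack produce the same (2, 3) result.

```python
import numpy as np

a = np.ones((2, 2))   # 2D, shape (2, 2)
b = np.array([0, 0])  # 1D, shape (2,)

# Option 1: turn b into a (2, 1) column explicitly; None is an alias for np.newaxis
with_hstack = np.hstack((a, b[:, None]))

# Option 2: column_stack does the 1D -> column conversion for you
with_column_stack = np.column_stack((a, b))

print(with_hstack)
# [[1. 1. 0.]
#  [1. 1. 0.]]
print(np.array_equal(with_hstack, with_column_stack))  # True
```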
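If you're stacking two vectors, you've got three options: join them end to end into one longer vector (hstack), stack them as rows (vstack), or stack them as columns (column_stack).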
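A minimal sketch of those options, assuming they are the three operations from the examples in the question (hstack, vstack and column_stack) applied to two 1D vectors of equal length:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

end_to_end = np.hstack((x, y))        # shape (6,): one longer vector
as_rows = np.vstack((x, y))           # shape (2, 3): x and y become rows
as_columns = np.column_stack((x, y))  # shape (3, 2): x and y become columns

print(end_to_end.shape, as_rows.shape, as_columns.shape)  # (6,) (2, 3) (3, 2)
```

Note that as_columns is just the transpose of as_rows in this 1D case.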
As for the (undocumented) row_stack, it is just a synonym of vstack, as a 1d array is ready to serve as a matrix row without extra work.
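To see why no extra work is needed, here is a sketch that appends the same 1D array b as a row of a. It calls vstack directly, since NumPy 2.0 and later deprecate row_stack in favor of vstack.

```python
import numpy as np

a = np.ones((2, 2))   # 2D, shape (2, 2)
b = np.array([0, 0])  # 1D, shape (2,)

# vstack treats a 1D array as a single row, so no reshaping is needed
appended_row = np.vstack((a, b))

print(appended_row.shape)  # (3, 2)
```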
The case of 3D and above proved too large to fit in this answer, so I've covered it in an article called Numpy Illustrated.