Numpy: views vs copy by slicing
Solution 1:
The accepted answer by John Zwinck is actually false (I just figured this out the hard way!). The problem in the question is a combination of doing "l-value indexing" with numpy's fancy indexing. The following doc explains exactly this case
https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
in the section "But fancy indexing does seem to return views sometimes, doesn't it?"
Edit:
To summarize the above link:
Whether a view or a copy is created is determined by whether the indexing can be represented as a slice.
Exception: If one does "fancy indexing" then always a copy is created. Fancy indexing is something like a[[1,2]].
Exception to the exception: If one does l-value indexing (i.e. the indexing happens left of the =
sign), then the rule for when a view or a copy are created doesn't apply anymore (though see below for a further exception). The python interpreter will directly assign the values to the left hand side without creating a copy or a view.
To prove that a copy is created in both cases, you can do the operation in two steps:
>>> a = np.arange(12).reshape(3, 4)
>>> b = a[0:3:2, :][:, [0, 2]]
>>> b[:] = 100
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
and
>>> b = a[:, [0, 2]][0:3:2, :]
>>> b[:] = 0
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Just as an aside, the question by the original poster is the exact problem stated at the end of the scipy-cookbook link above. There is no solution given in the book. The tricky thing about the question is that there are two indexing operations done in a row.
Exception to the exception to the exception: If there are two indexing operations done in a row on the left hand side (as is the case in this question), the direct assignment in l-value indexing only works if the first indexing operation can be represented as a slice. Otherwise a copy has to be created even though it is l-value indexing.
Solution 2:
All that matters is whether you slice by rows or by columns. Slicing by rows can return a view because it is a contiguous segment of the original array. Slicing by column must return a copy because it is not a contiguous segment. For example:
A1 A2 A3
B1 B2 B3
C1 C2 C3
By default, it is stored in memory this way:
A1 A2 A3 B1 B2 B3 C1 C2 C3
So if you want to choose every second row, it is:
[A1 A2 A3] B1 B2 B3 [C1 C2 C3]
That can be described as {start: 0, size: 3, stride: 6}
.
But if you want to choose every second column:
[A1] A2 [A3 B1] B2 [B3 C1] C2 [C3]
And there is no way to describe that using a single start, size, and stride. So there is no way to construct such a view.
If you want to be able to view every second column instead of every second row, you can construct your array in column-major aka Fortran order instead:
np.array(a, order='F')
Then it will be stored as such:
A1 B1 C1 A2 B2 C2 A3 B3 C3
Solution 3:
This is my understanding, for your reference
a[0:3:2, :] # basic indexing, a view
... = a[0:3:2, :][:, [0, 2]] # getitme version, a copy,
# because you use advanced
# indexing [:,[0,2]]
a[0:3:2, :][:, [0, 2]] = ... # howver setitem version is
# like a view, setitem version
# is different from getitem version,
# this is not c++
a[:, [0, 2]] # getitem version, a copy,
# because you use advanced indexing
a[:, [0, 2]][0:3:2, :] = 0 # the copy is modied,
# but a keeps unchanged.
If I have any misunderstanding, please point it out.