How does __contains__ work for ndarrays?
>>> x = numpy.array([[1, 2],
...                  [3, 4],
...                  [5, 6]])
>>> [1, 7] in x
True
>>> [1, 2] in x
True
>>> [1, 6] in x
True
>>> [2, 6] in x
True
>>> [3, 6] in x
True
>>> [2, 3] in x
False
>>> [2, 1] in x
False
>>> [1, 2, 3] in x
False
>>> [1, 3, 5] in x
False
I have no idea how __contains__ works for ndarrays. I couldn't find the relevant documentation when I looked for it. How does it work? And is it documented anywhere?
I found the source for ndarray.__contains__, in numpy/core/src/multiarray/sequence.c. As a comment in the source states, thing in x is equivalent to (x == thing).any() for an ndarray x, regardless of the dimensions of x and thing. This only makes sense when thing is a scalar; the results of broadcasting when thing isn't a scalar cause the weird results I observed, as well as oddities like array([1, 2, 3]) in array(1) that I didn't think to try. The exact source is:
static int
array_contains(PyArrayObject *self, PyObject *el)
{
    /* equivalent to (self == el).any() */
    int ret;
    PyObject *res, *any;
    res = PyArray_EnsureAnyArray(PyObject_RichCompare((PyObject *)self,
                                                       el, Py_EQ));
    if (res == NULL) {
        return -1;
    }
    any = PyArray_Any((PyArrayObject *)res, NPY_MAXDIMS, NULL);
    Py_DECREF(res);
    ret = PyObject_IsTrue(any);
    Py_DECREF(any);
    return ret;
}
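A quick check from the Python side, using the array from the question: the loop asserts the equivalence for the examples whose shapes broadcast, and the printed mask shows where the True for [1, 7] comes from. [1, 2, 3] and [1, 3, 5] are left out on purpose, because how == treats arrays whose shapes can't broadcast has changed between NumPy versions.

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# `thing in x` agrees with (x == thing).any() whenever the shapes broadcast.
for thing in ([1, 7], [1, 2], [1, 6], [2, 6], [3, 6], [2, 3], [2, 1]):
    assert (thing in x) == bool((x == thing).any())

# The broadcast comparison behind `[1, 7] in x`: [1, 7] is compared with every
# row position by position, so the 1 in column 0 of the first row is enough.
print(x == [1, 7])
# [[ True False]
#  [False False]
#  [False False]]

# The same rule makes a 1-d array "contained" in a 0-d array holding one of its values.
print(np.array([1, 2, 3]) in np.array(1))  # True
```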
It seems like numpy's __contains__ is doing something like this for the 2-d case:
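def __contains__(self, item):
    # An item whose length differs from the row length can't broadcast against
    # the rows, so it never matches (False, as in the question's output).
    if len(item) != self.shape[1]:
        return False
    for row in self:
        # Match if any value of item equals the row value at the same column.
        if any(item_value == row_value for item_value, row_value in zip(item, row)):
            return True
    return False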
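The loop model can be checked against NumPy directly. This is a minimal sketch, assuming items are exactly as long as a row; rowwise_any_match is a hypothetical name for the same loop written as a plain function. The last two assertions show the scalar case, where broadcasting makes in mean "some element equals it".

```python
import numpy as np


def rowwise_any_match(x, item):
    # Hypothetical helper mirroring the loop model: True if any row shares a
    # value with `item` at the same column position.
    return any(
        any(item_value == row_value for item_value, row_value in zip(item, row))
        for row in x
    )


x = np.array([[1, 2], [3, 4], [5, 6]])

# For items as long as a row, the loop model and NumPy's `in` agree.
for item in ([1, 7], [1, 2], [1, 6], [2, 6], [3, 6], [2, 3], [2, 1]):
    assert rowwise_any_match(x, item) == (item in x)

# A scalar is broadcast against every element.
assert 4 in x
assert 7 not in x
```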
[1,7] works because the 0th element of the first row matches the 0th element of [1,7]. Same with [1,2], etc. With [2,6], the 6 matches the 6 in the last row. With [2,3], none of the elements match a row at the same index. [1, 2, 3] is trivially False since its length doesn't match the row length, so the shapes don't match.
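If what you actually want is the check the question seems to expect, whether item is one of the rows, one way to express it is to require every column to match before asking about any row. A minimal sketch; contains_row is a hypothetical helper, not part of NumPy.

```python
import numpy as np


def contains_row(x, row):
    """Hypothetical helper: True only if `row` equals one full row of 2-d `x`."""
    row = np.asarray(row)
    if row.shape != x.shape[1:]:
        return False  # wrong length: cannot be a row of x
    # Compare elementwise, require all columns of a row to match,
    # then ask whether any row matched completely.
    return bool((x == row).all(axis=1).any())


x = np.array([[1, 2], [3, 4], [5, 6]])
print(contains_row(x, [1, 2]))     # True
print(contains_row(x, [1, 7]))     # False
print(contains_row(x, [2, 6]))     # False
print(contains_row(x, [1, 2, 3]))  # False
```

Compared with in, the only change is all(axis=1) before any(), which turns "some position matches" into "every position of some row matches".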
See this for more, and also this ticket.