Converting subarray index into original array index with Numpy

In this code snippet:

import numpy as np

def f(b, i):
  # calculate original index
  return j

a = np.random.rand(N, M)
b = a[:, m]
j = f(b, i)
assert b[i] == a[j]

I would like the function f to find an index j that satisfies the assertion on the last line. Indexing with j doesn't have to use the a[j] syntax.


Solution 1:

Consider a sample array:

In [213]: a = np.arange(12).reshape(3,4)
In [214]: a
Out[214]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

And b from that:

In [215]: b = a[:,2]
In [216]: b
Out[216]: array([ 2,  6, 10])
In [217]: b[1]
Out[217]: 6

Everything numpy knows about a is in:

In [218]: a.__array_interface__
Out[218]: 
{'data': (51087664, False),
 'strides': None,         # a.strides is (32,8)
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (3, 4),
 'version': 3}

b is a view of a, with the corresponding:

In [219]: b.__array_interface__
Out[219]: 
{'data': (51087680, False),
 'strides': (32,),
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (3,),
 'version': 3}

The base for b is the original arange:

In [221]: b.base
Out[221]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
In [222]: b.base.__array_interface__
Out[222]: 
{'data': (51087664, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (12,),
 'version': 3}

Your f could compare the data address of b with that of its base to get the byte offset at which b "starts". Dividing by the item size (8 bytes for i8) gives the offset in elements.

So b starts at

In [223]: (51087680-51087664)/8
Out[223]: 2.0
In [224]: b[0]
Out[224]: 2
In [225]: b.base[2]
Out[225]: 2
In [226]: a.ravel()[2]
Out[226]: 2
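
As a rough sketch of that offset calculation, the helper below reads both data addresses from __array_interface__ and divides by the item size. The name start_offset is made up for illustration and is not a numpy function; it assumes the array really is a view (its base is not None).

```python
import numpy as np

def start_offset(view):
    """Element offset of view[0] within view.base (hypothetical helper)."""
    base = view.base
    if base is None:
        raise ValueError("array owns its data, so there is no base to compare with")
    # 'data' is a (address, read_only) tuple; only the address is needed
    view_addr = view.__array_interface__['data'][0]
    base_addr = base.__array_interface__['data'][0]
    # the byte difference divided by the item size gives an element count
    return (view_addr - base_addr) // view.itemsize

a = np.arange(12).reshape(3, 4)
b = a[:, 2]
print(start_offset(b))
```

Using view.itemsize instead of a hard-coded 8 keeps the sketch independent of the platform's default integer size.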

Since the strides of b are (32,) and the dtype is i8 (8 bytes per item), we can guess/deduce that the other stride of a is 8.

b[1] will be 32/8 = 4 elements beyond its start.

In [227]: b[1]
Out[227]: 6
In [228]: b.base[2+4]
Out[228]: 6
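
The start offset and the stride step can be combined into one flat index. This is only a sketch: flat_index is a hypothetical name, and it assumes b is a 1-D view whose base is a 1-D contiguous array (as in this example), so the result can be used directly as base[k].

```python
import numpy as np

def flat_index(view, i):
    """Flat position of view[i] in view.base (hypothetical helper, 1-D views only)."""
    base = view.base
    # byte offset of view[0] relative to the start of the base buffer
    start_bytes = (view.__array_interface__['data'][0]
                   - base.__array_interface__['data'][0])
    # each step along the view advances view.strides[0] bytes
    return (start_bytes + i * view.strides[0]) // view.itemsize

a = np.arange(12).reshape(3, 4)
b = a[:, 2]
k = flat_index(b, 1)
print(k, b.base[k], b[1])
```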

If we also deduce that the shape of a is (3,4) (deducing the 4 from the base length of 12 and the b shape of (3,)):

In [229]: np.unravel_index(6,(3,4))
Out[229]: (1, 2)
In [230]: a[1,2]
Out[230]: 6
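
Putting the deductions together, one possible f looks like the sketch below. It bakes in the same guesses as the text: b is assumed to be a full column of a 2-D, C-ordered a, so the number of columns is taken to be base.size divided by the length of b. If b was produced some other way, the result will be wrong.

```python
import numpy as np

def f(b, i):
    """Sketch: 2-D index into a for b[i], assuming b is a full column of a."""
    base = b.base
    start_bytes = (b.__array_interface__['data'][0]
                   - base.__array_interface__['data'][0])
    flat = (start_bytes + i * b.strides[0]) // b.itemsize
    n_rows = b.shape[0]
    n_cols = base.size // n_rows   # deduction, not something numpy records for us
    return np.unravel_index(flat, (n_rows, n_cols))

a = np.arange(12).reshape(3, 4)
b = a[:, 2]
j = f(b, 1)
print(j, a[j], b[1])
```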

I'll let you clean things up and decide for yourself whether these deductions and calculations are robust enough for your purposes.

Alternative a

If a is its own base (not a view of something else), b.base will be a itself, and we don't have to make deductions about its strides and shape:

In [231]: a = np.arange(12).reshape(3,4).copy()
In [232]: a.__array_interface__
Out[232]: 
{'data': (51778608, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (3, 4),
 'version': 3}
In [233]: b = a[:,2]
In [234]: b.__array_interface__
Out[234]: 
{'data': (51778624, False),
 'strides': (32,),
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (3,),
 'version': 3}
In [235]: b.base
Out[235]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
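
In this case f can take the shape straight from b.base instead of guessing it. The version below is a minimal sketch under two assumptions: b is a 1-D view, and its base is C-contiguous (checked explicitly), so that a flat element offset maps onto the base shape via np.unravel_index.

```python
import numpy as np

def f(b, i):
    """Sketch: index into b.base for b[i], assuming a C-contiguous base."""
    a = b.base
    if a is None or not a.flags['C_CONTIGUOUS']:
        raise ValueError("this sketch needs a view of a C-contiguous base")
    byte_offset = (b.__array_interface__['data'][0]
                   - a.__array_interface__['data'][0]) + i * b.strides[0]
    return np.unravel_index(byte_offset // a.itemsize, a.shape)

a = np.arange(12).reshape(3, 4).copy()   # copy() makes a its own base
b = a[:, 2]
for i in range(len(b)):
    j = f(b, i)
    assert b[i] == a[j]
```

The assertion in the loop is the one from the question, checked for every element of b.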

There was a similar question recently, and someone went to the trouble of packaging these calculations in a function. I don't have a link to that, but it shouldn't be hard to find. In any case, there isn't a simple numpy function call that will do this for you.

Here's the previous SO question:

given the index of an item in a view of a numpy array, find its index in the base array