python - form new list from elements between values (moving indexes)

Consider a list of strings:

haystack = ['hay', 'hay', 'needle', 'a', 'b', 'c', 'd', 
            'needle', 'hay', 'stuff', 'straw', 'hay', 'needle', 
            'x', 'y', 'needle']

I want to extract the values between occurrences of 'needle' then move on to the next pair of 'needle' occurrences. Desired output:

['a', 'b', 'c', 'd', 'x', 'y']

Note that there are four values ('hay', 'stuff', 'straw', 'hay') that should not be added, even though they are between two 'needle' values.

I have tried to modify this excellent answer (python - form new list from n elements to the right of a reoccurring word):

idx = [idx for idx, val in enumerate(haystack) if val=="needle"]
[val for i in idx for val in haystack[i: idx[i+1]]]

But this returned:

IndexError: list index out of range

Question

Why would this happen and what could I try to return all values between two 'needle' values as seen above?


You're getting the IndexError because i iterates over the values stored in idx, which are positions in haystack (a much longer list), and you then use i to index idx itself, i.e. you try to read an element of idx that isn't there.
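To make the failure concrete, here is a small sketch that replays the indexing from the question one step at a time. The names idx and i follow the question; failed_at is only an illustrative variable. Note that Python reports "list index out of range" as an IndexError.

```python
haystack = ['hay', 'hay', 'needle', 'a', 'b', 'c', 'd',
            'needle', 'hay', 'stuff', 'straw', 'hay', 'needle',
            'x', 'y', 'needle']

# Positions of 'needle' in haystack.
idx = [i for i, val in enumerate(haystack) if val == 'needle']
print(idx)  # [2, 7, 12, 15]

# In the question's comprehension, i is a *value* from idx (a position in
# haystack), yet it is used as a position in idx, which has only 4 items.
failed_at = None
for i in idx:
    try:
        idx[i + 1]
    except IndexError:
        failed_at = i
        break

print(failed_at)  # 7, because idx[8] does not exist
```

The first iteration (i == 2) happens to succeed because idx[3] exists, which is why the error only shows up on the second 'needle'.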

Instead, you can enumerate over idx[:-1] (to avoid the "list index out of range" error) and slice haystack between each index in idx and the next one:

[val for i_idx, i in enumerate(idx[:-1]) for val in haystack[i+1: idx[i_idx+1]] if val != 'hay']

or you can use zip to traverse two indices in idx together and slice elements in haystack between those two indices:

[val for i,j in zip(idx, idx[1:]) for val in haystack[i+1: j] if val != 'hay']

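The two versions above rely on the 'hay' filter to drop the values between the second and third 'needle', so they only work if those values are all 'hay'. If you need to take one non-overlapping pair at a time instead, pair up the even and odd positions of idx: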

[val for i,j in zip(idx[::2], idx[1::2]) for val in haystack[i+1: j]]

Output:

['a', 'b', 'c', 'd', 'x', 'y']
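As a rough comparison, the sketch below runs the consecutive-pair version and the even/odd version on the haystack from the question. The variable names filtered and paired are illustrative, not part of the answer above.

```python
haystack = ['hay', 'hay', 'needle', 'a', 'b', 'c', 'd',
            'needle', 'hay', 'stuff', 'straw', 'hay', 'needle',
            'x', 'y', 'needle']
idx = [i for i, val in enumerate(haystack) if val == 'needle']

# Consecutive pairs overlap: (2, 7), (7, 12), (12, 15).
# The 'hay' filter removes 'hay' but not 'stuff' or 'straw'.
filtered = [val for i, j in zip(idx, idx[1:])
            for val in haystack[i + 1:j] if val != 'hay']

# Even/odd positions give non-overlapping pairs: (2, 7), (12, 15).
paired = [val for i, j in zip(idx[::2], idx[1::2])
          for val in haystack[i + 1:j]]

print(filtered)  # ['a', 'b', 'c', 'd', 'stuff', 'straw', 'x', 'y']
print(paired)    # ['a', 'b', 'c', 'd', 'x', 'y']
```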

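First of all, i takes the values stored in idx (positions in haystack), not positions in idx, so idx[i+1] does not return the next 'needle' index and soon goes out of range. That's why you get the error.

It is better to step through idx two positions at a time, so that you extract only the index pairs you need (stopping at len(idx)-1 so an unmatched trailing 'needle' is ignored):

[haystack[idx[i]+1:idx[i+1]] for i in range(0,len(idx)-1,2)]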

Output:

[['a', 'b', 'c', 'd'], ['x', 'y']]
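The result above is a list of lists, while the question asks for a single flat list. One possible way to flatten it, sketched here with itertools.chain.from_iterable, is shown below. The names groups and flat are illustrative, and the range stops at len(idx) - 1 as an assumption so that an unmatched trailing 'needle' is skipped.

```python
from itertools import chain

haystack = ['hay', 'hay', 'needle', 'a', 'b', 'c', 'd',
            'needle', 'hay', 'stuff', 'straw', 'hay', 'needle',
            'x', 'y', 'needle']
idx = [i for i, val in enumerate(haystack) if val == 'needle']

# One sub-list per (opening, closing) pair of 'needle' positions.
groups = [haystack[idx[i] + 1:idx[i + 1]]
          for i in range(0, len(idx) - 1, 2)]

# Join the sub-lists into a single flat list.
flat = list(chain.from_iterable(groups))

print(groups)  # [['a', 'b', 'c', 'd'], ['x', 'y']]
print(flat)    # ['a', 'b', 'c', 'd', 'x', 'y']
```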

You can use the logic below:

haystack = ['hay','hay','needle','a','b','c','d','needle','hay','hay','hay','hay','needle','x','y','needle']

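def get_between_data(haystack):
    need_stack = []  # non-empty while we are inside a 'needle' pair
    result = []
    for ele in haystack:
        if ele == 'needle' and need_stack:
            # closing 'needle': stop collecting
            need_stack = []
        elif not need_stack and ele == 'needle':
            # opening 'needle': start collecting
            need_stack = ['needle']
        elif need_stack:
            result.append(ele)
    return result

print(get_between_data(haystack))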

Output: ['a', 'b', 'c', 'd', 'x', 'y']


The code below gives the expected result and also handles an odd number of 'needle' entries in the input list, because zip stops at the shorter sequence and so ignores an unmatched trailing 'needle'.

haystack = ['hay','hay','needle','a','b','c','d','needle','hay','stuff','straw','hay','needle','x', 'y','needle']
final=[]
indices = [index for index, element in enumerate(haystack) if element == 'needle']
for i,j in zip(indices[::2],indices[1::2]):
    final.extend(haystack[i+1:j])
print(final)

Output:
['a', 'b', 'c', 'd', 'x', 'y']
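To illustrate the odd-count case, here is a minimal sketch that wraps the same loop in a hypothetical helper, between_pairs (the name and the sample list are assumptions made for the example), and feeds it a list with three 'needle' entries.

```python
def between_pairs(items, token):
    """Collect the values between non-overlapping pairs of token."""
    indices = [i for i, el in enumerate(items) if el == token]
    final = []
    # indices[::2] are opening positions, indices[1::2] closing positions.
    # zip stops at the shorter of the two, so a trailing unmatched token
    # has no partner and is ignored.
    for i, j in zip(indices[::2], indices[1::2]):
        final.extend(items[i + 1:j])
    return final

odd = ['needle', 'a', 'needle', 'hay', 'needle', 'b']
print(between_pairs(odd, 'needle'))  # ['a']
```

The third 'needle' opens a pair that never closes, so 'b' is not collected.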


Perhaps as an alternative, we could look at a functional approach using reduce to iterate through the list and accumulate the values you're looking for.

from functools import reduce

def vals_between(lst, tok):

    def f(x, y):
        flag = x[0]
        subgroup = x[1]
        results = x[2]  

        # We haven't yet encountered the token, and the current
        # element isn't the token we want. No action is taken.
        # Subgroup should be empty.
        if not flag and y != tok:
            return (False, [], results)
        # We haven't previously encountered the token, but we 
        # have now! No accumulators are changed, but the flag 
        # flips to True.
        elif not flag and y == tok:
            return (True, [], results)
        # We have previously encountered the token we want, and
        # the current element is not the same token.
        # Add the current element to the subgroup.
        elif flag and y != tok:
            return (True, subgroup + [y], results)
        # We have encountered the token we're looking for, and 
        # this is the "closing" token. Set flag to False and 
        # add the subgroup to results.
        elif flag and y == tok:
            return (False, [], results + subgroup)

    return reduce(f, lst, (False, [], []))[-1]

If we run this on haystack with token set to 'needle':

>>> vals_between(haystack, tok='needle')
['a', 'b', 'c', 'd', 'x', 'y']
>>>

haystack is ['hay','hay','needle','a','b','c','d','needle','hay','hay','hay','hay','needle','x','y','needle']. As we iterate through with reduce, our initial state is (False, [], []).

When we evaluate f the first time, it's:

f((False, [], []), 'hay')

Now flag is False and 'hay' does not equal 'needle', so f returns (False, [], []). We check 'hay' again and get the same result. Now, when we get to 'needle':

f((False, [], []), 'needle')

Now the flag is False but we have found the token we're looking for, so f returns:

(True, [], [])

So when we evaluate:

f((True, [], []), 'a')

The behavior is a little different because this time the flag is True. This time f returns:

(True, ['a'], [])

This continues until we get to:

f((True, ['a', 'b', 'c', 'd'], []), 'needle')

We have a True flag and we've hit the token we're looking for. The current subgroup is tacked onto the results, and then the flag is set back to False and the subgroup is reset to empty, giving us:

(False, [], ['a', 'b', 'c', 'd'])

Eventually we return the last element in this tuple and have the list of elements between instances of the token we're looking for.
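If it helps to see every intermediate state rather than only the final one, the walk-through above can be reproduced with itertools.accumulate, which yields each accumulated value (the initial keyword requires Python 3.8+). This is only a sketch: step is a standalone, condensed stand-in for the inner function f, with the token fixed as a default argument, and data is a shortened sample list.

```python
from itertools import accumulate

def step(state, item, tok='needle'):
    flag, subgroup, results = state
    if item == tok:
        # Opening token: start collecting.
        # Closing token: commit the subgroup to results.
        if not flag:
            return (True, [], results)
        return (False, [], results + subgroup)
    # Non-token element: collect it only while inside a pair.
    if flag:
        return (True, subgroup + [item], results)
    return (False, [], results)

data = ['hay', 'needle', 'a', 'b', 'needle', 'hay']
states = list(accumulate(data, step, initial=(False, [], [])))

for s in states:
    print(s)
# (False, [], [])
# (False, [], [])
# (True, [], [])
# (True, ['a'], [])
# (True, ['a', 'b'], [])
# (False, [], ['a', 'b'])
# (False, [], ['a', 'b'])
```

The last state is what reduce would return, and its final element is the flat list of collected values.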


With a small modification, the function could be more flexible, taking a predicate function to identify tokens rather than a specific value.

def vals_between_pred(lst, pred):

    def f(x, y):
        flag = x[0]
        subgroup = x[1]
        results = x[2]  
        if not flag and not pred(y):
            return (False, [], results)
        elif not flag and pred(y):
            return (True, [], results)
        elif flag and not pred(y):
            return (True, subgroup + [y], results)
        elif flag and pred(y):
            return (False, [], results + subgroup)

    return reduce(f, lst, (False, [], []))[-1]

At which point we could look for elements between 'needle' in a case-insensitive way very simply:

>>> vals_between_pred(haystack, pred=lambda x: x.upper() == 'NEEDLE')
['a', 'b', 'c', 'd', 'x', 'y']
>>>