Is there a better way than a while loop to perform this function?

I was attempting some python exercises and I hit the 5s timeout on one of the tests. The function is pre-populated with the parameters and I am tasked with writing code that is fast enough to run within the max timeframe of 5s.

There are N dishes in a row on a kaiten belt, with the ith dish being of type Di​. Some dishes may be of the same type as one another. The N dishes will arrive in front of you, one after another in order, and for each one you'll eat it as long as it isn't the same type as any of the previous K dishes you've eaten. You eat very fast, so you can consume a dish before the next one gets to you. Any dishes you choose not to eat as they pass will be eaten by others. Determine how many dishes you'll end up eating.

Issue

The code "works" but is not fast enough.

Code

The idea here is to add the D[i] entry if it is not in the pastDishes list (which can be of size K).

from typing import List
# Write any import statements here

def getMaximumEatenDishCount(N: int, D: List[int], K: int) -> int:
  # Write your code here
  numDishes=0
  pastDishes=[]
  i=0
  while (i<N):
    if(D[i] not in pastDishes):
      numDishes+=1
      pastDishes.append(D[i])
      if len(pastDishes)>K:
        pastDishes.pop(0)
    i+=1
  return numDishes  

Is there a more effective way?


Solution 1:

After much trial and error, I have finally found a solution that is fast enough to pass the final case in the puzzle you are working on. My previous code was very neat and quick, however, I have finally found a module with a tool that makes this much faster. Its from collections just as deque is, however it is called Counter.

This was my original code:

def getMaximumEatenDishCount(N: int, D: list, K: int) -> int:
    numDishes=lastMod=0
    pastDishes=[0]*K
    for Dval in D:
        if Dval in pastDishes:continue
        pastDishes[lastMod] = Dval
        numDishes,lastMod = numDishes+1,(lastMod+1)%K
    return numDishes   

I then implemented Counter like so:

from typing import List

# Write any import statements here
from collections import Counter

def getMaximumEatenDishCount(N: int, D: 'list[int]', K: int) -> int:
    eatCount=lastMod = 0
    pastDishes=[0]*K

    eatenCounts = Counter({0:K})
    for Dval in D:
        if Dval in eatenCounts:continue
        eatCount +=1
        eatenCounts[Dval] +=1

        val = pastDishes[lastMod]
        if eatenCounts[val] <= 1:   eatenCounts.pop(val)
        else:                       eatenCounts[val] -= 1
        
        pastDishes[lastMod]=Dval
        lastMod = (lastMod+1)%K
    return eatCount

Which ended up working quite well. I'm sure you can make it less clunky, but this should work fine on its own.

Some Explanation of what i am doing:

Typically while loops are actually marginally faster than a for loop, however since I need to access the value at an index multiple times if i used it, using a for loop I believe is actually better in this situation. You can see i also initialised the list to the max size it needs to be and am writing over the values instead of popping+appending which saves alot of time. Additionally, as pointed out by @outis, another small improvement was made in my code by using the modulo operator in conjunction with the variable which removes the need for an additional if statement. The Counter is essentially a special dict object that holds a hashable as the key and an int as the value. I use the fact that lastMod is an index to what would normally be accesed through list.pop(0) to access the object needed to either remove or decrement in the counter

Note that it is not considered 'pythonic' to assign multiple variable on one line, however I believe it adds a slight performance boost which is why I have done it. This can be argued though, see this post.

If anyone else is interested the problem that we were trying to solve, it can be found here: https://www.facebookrecruiting.com/portal/coding_puzzles/?puzzle=958513514962507

Solution 2:

Can we use an appropriate data structure? If so:

Data structures

Seems like an ordered set which you have to shrink to a capacity restriction of K.

To meet that, if exceeds (len(ordered_set) > K) we have to remove the first n items where n = len(ordered_set) - K. Ideally the removal will perform in O(1).

However since removal on a set is in unordered fashion. We first transform it to a list. A list containing unique elements in the order of appearance in their original sequence.

From that ordered list we can then remove the first n elements.

For example: the function lru returns the least-recently-used items for a sequence seq limited by capacity-limit k.

To obtain the length we can simply call len() on that LRU return value:

maximumEatenDishCount = len(lru(seq, k))

See also:

  • Does Python have an ordered set?
  • Fastest way to get sorted unique list in python?

Using set for uniqueness (up to Python 3.6)

def lru(seq, k):
    return list(set(seq))[:k]

Using dict for uniqueness (since Python 3.6)

Same mechanics as above, but using the preserved insertion order of dicts since 3.7:

  • using OrderedDict explicitly
from collections import OrderedDict

def lru(seq, k):
    return list(OrderedDict.fromkeys(seq).keys())[:k]
  • using dict factory-method:
def lru(seq, k):
    return list(dict.fromkeys(seq).keys())[:k]
  • using dict-comprehension:
def lru(seq, k):
    return list({i:0 for i in seq}.keys())[:k]

See also:

  • The order of keys in dictionaries
  • Using ordered dictionary as ordered set
  • How do you remove duplicates from a list whilst preserving order?
  • Real Python: OrderedDict vs dict in Python: The Right Tool for the Job