Concern about space complexity of quick sort
def quick_sort(array):
if len(array) <=1:
return array
pivot = array[-1]
array.pop()
less = []
greater = []
for num in array:
if num > pivot:
greater.append(num)
else:
less.append(num)
return quick_sort(less) + [pivot] + quick_sort(greater)
What's the space complexity of this implementation of quicksort? I just picked the last element as the pivot, created an array of the elements greater and the elements lesser and moved them accordingly. Then I recursively did that for both the lesser and greater arrays. So at the end, I'd have [pivot] + [pivot] + [pivot]... all in sorted order. Now I'm kind of confused about the space complexity. I have two sub arrays for the lesser and greater and also there's the recursion call stack. What do you think?
The space complexity of your implementation of quicksort is Θ(n2) in the worst case and Θ(n) on average.
Here’s how to see this. Imagine drawing out the full recursion tree for your algorithm. At any one point in time, the algorithm is in one of those recursive calls, with space needed to store all the data from that recursive call, plus all the space for the recursive calls above it. That’s because the call stack, at any one point in time, is a path from some call back up to the root call. Therefore, the space complexity is the maximum amount of space used on any path from a leaf in the recursion tree back up to the root.
Imagine, then, that you happen to pick the absolute worst pivot possible at each step - say, you always pick the smallest or largest element. Then your recursion tree is essentially a giant linked list, where the root holds an array of length n, under that is an array of length n-1, under that is an array of length n-2, etc. until you’re down to an array of length one. The space usage then is 1+2+3+...+n, which is Θ(n2). That’s not great.
On the other hand, suppose you’re looking at a more “typical” run of quicksort, in which you generally get good pivots. In that case, you’d expect that, about half the time, you get a pivot in the middle 50% of the array. With a little math, you can show that this means that, on expectation, you’ll have about two splits before the array size drops to 75% of its previous size. That makes the depth of the recursion tree O(log n). You’ll then have about two layers with arrays of size roughly n, about two layers with arrays of size roughly .75n, about two layers of size roughly (.75)2n, etc. That makes your space usage roughly
2(n + .75n + (.75)2n + ...)
= 2n(1 + .75 + (.75)2 + ...)
= Θ(n).
That last step follows because that’s the sum of a geometric series, which converges to some constant.
To improve your space usage, you’ll need to avoid creating new arrays at each level for your lesser and greater elements. Consider using an in-place partition algorithm to modify the array in place. If you’re clever, you can use that approach and end up with O(log n) total space usage.
Hope this helps!