Pseudo-quicksort time complexity

Solution 1:

This "quicksort" is actually deforested tree sort: http://www.reddit.com/r/programming/comments/2h0j2/real_quicksort_in_haskell

data Tree a = Leaf | Node a (Tree a) (Tree a)

-- Build an (unbalanced) binary search tree by splitting around the head element.
mkTree :: Ord a => [a] -> Tree a
mkTree [] = Leaf
mkTree (x:xs) = Node x (mkTree (filter (<= x) xs)) (mkTree (filter (x <) xs))

The binary tree is not balanced, so building the search tree has O(N^2) worst-case and O(N log N) average-case complexity.
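For example, an already-sorted input makes every pivot the minimum of its remaining list, so the tree degenerates into a right-leaning chain and building it needs on the order of 1 + 2 + ... + (N-1) comparisons. A small hand-evaluated sketch (my illustration, using nothing beyond the definitions above):

mkTree [1,2,3]
  = Node 1 Leaf (Node 2 Leaf (Node 3 Leaf Leaf))   -- every left subtree is empty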

foldTree :: (a -> b -> b -> b) -> b -> Tree a -> b
foldTree f g Leaf = g
foldTree f g (Node x l r) = f x (foldTree f g l) (foldTree f g r)

-- In-order flattening of the search tree yields the sorted list.
treeSort l = foldTree (\x lft rht -> lft ++ [x] ++ rht) [] (mkTree l)

The retrieval (flattening) step likewise has O(N^2) worst-case and O(N log N) average-case complexity, because the left-nested (++) appends dominate on a degenerate tree.
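The benchmarks below call this composition quicksort; presumably it is the usual two-filter list quicksort from the question, which is exactly what you get when mkTree and foldTree are fused (deforested) so the intermediate Tree never exists. A sketch of that fused form (my reconstruction, not copied from the question):

quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (x:xs) = quicksort (filter (<= x) xs) ++ [x] ++ quicksort (filter (x <) xs)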

Well-balanced:

Prelude> let rnds = iterate step where step x = (75*x) `mod` 65537
Prelude> length . quicksort . take 4000 . rnds $ 1
4000
(0.08 secs, 10859016 bytes)
Prelude> length . quicksort . take 8000 . rnds $ 1
8000
(0.12 secs, 21183208 bytes)
Prelude> length . quicksort . take 16000 . rnds $ 1
16000
(0.25 secs, 42322744 bytes)

Not-so-well-balanced:

Prelude> length . quicksort . map (`mod` 10) $ [1..4000]
4000
(0.62 secs, 65024528 bytes)
Prelude> length . quicksort . map (`mod` 10) $ [1..8000]
8000
(2.45 secs, 241906856 bytes)
Prelude> length . quicksort . map (`mod` 10) $ [1..16000]
16000
(9.52 secs, 941667704 bytes)

Solution 2:

I agree with your assumption that the average time complexity is still O(n log n). I'm not an expert and not 100% sure, but these are my thoughts:

This is pseudocode for the in-place quicksort (call Quicksort with l=1 and r = length of the array):

Quicksort(l,r)
--------------
IF r-l >= 1 THEN
    choose pivot element x of {x_l, x_{l+1}, ..., x_{r-1}, x_r}
    order the array-segment x_l, ..., x_r in such a way that
        all elements < x are on the left side of x  // line 6
        all elements > x are on the right side of x // line 7
    let m be the position of x in the 'sorted' array (as said in the two lines above)
    Quicksort(l, m-1);
    Quicksort(m+1, r)
FI
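To make the pseudocode concrete, here is a minimal runnable Haskell sketch of the same in-place scheme on a mutable array (my own illustration; it uses a Lomuto-style partition with the last element as pivot, whereas the pseudocode leaves the pivot choice and partition details open):

import Control.Monad (when)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray, getElems)

quicksortInPlace :: [Int] -> [Int]
quicksortInPlace xs = runST $ do
    let n = length xs
    arr <- newListArray (1, n) xs :: ST s (STUArray s Int Int)
    let swap i j = do
            a <- readArray arr i
            b <- readArray arr j
            writeArray arr i b
            writeArray arr j a
        qsort l r = when (r - l >= 1) $ do
            x <- readArray arr r                       -- pivot: last element
            -- one left-to-right pass doing the comparisons of lines 6/7
            let partition m i
                    | i >= r    = return m
                    | otherwise = do
                        y <- readArray arr i
                        if y < x
                            then swap m i >> partition (m + 1) (i + 1)
                            else partition m (i + 1)
            m <- partition l l
            swap m r                                   -- put the pivot at position m
            qsort l (m - 1)
            qsort (m + 1) r
    qsort 1 n
    getElems arr

For example, quicksortInPlace [3,1,2] gives [1,2,3]; the list-to-array conversions are only there to keep the sketch self-contained.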

The average time complexity analysis then selects the "<"-comparisons in lines 6 and 7 as the dominant operation of the algorithm and concludes that the average time complexity is O(n log n). Since the costs of the step "order the array-segment x_l, ..., x_r in such a way that ..." are not counted separately (only the dominant operation matters in a time complexity analysis if you want to find bounds), I think "because it has to do two passes of the list when it partitions it" is not a problem: your Haskell version simply takes roughly twice as long in this step. The same holds for the append operation, and I agree with you that it adds nothing to the asymptotic costs:

Because the appends and the partition still have linear time complexity, even if they are inefficient.
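As a small illustration of the "two passes" point (my sketch, not from the question): Data.List.partition does the split in a single traversal, while the two filters traverse the list twice; both are O(n), so only the constant factor differs.

import Data.List (partition)

-- Two traversals of xs, one per filter.
splitTwoPass :: Ord a => a -> [a] -> ([a], [a])
splitTwoPass x xs = (filter (< x) xs, filter (>= x) xs)

-- One traversal; asymptotically the same O(n) work.
splitOnePass :: Ord a => a -> [a] -> ([a], [a])
splitOnePass x = partition (< x)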

For the sake of convenience, let's assume that this adds "n" to our time complexity costs, so that we have "O(n log n + n)". Since there exists a natural number n0 such that n log n > n holds for all natural numbers greater than n0, you can bound n log n + n from above by 2 n log n and from below by n log n; therefore n log n + n = O(n log n).

Further, the choice of the first element as the pivot is not the best choice.

I think the choice of the pivot element is irrelevant here, because the average-case analysis assumes a uniform distribution of the elements in the array. You can't know from which place in the array you should select it, so you have to consider all the cases in which your pivot element (independently of the place in the list you take it from) is the i-th smallest element of your list, for i = 1...r.
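If you do want a different pivot anyway, here is a hedged sketch of a median-of-three variant of the list quicksort (the name and structure are my own, not from the question; it changes which inputs behave badly, not the average-case bound):

import Data.List (sort)

quicksortMedian3 :: Ord a => [a] -> [a]
quicksortMedian3 []  = []
quicksortMedian3 [x] = [x]
quicksortMedian3 xs  = quicksortMedian3 smaller ++ equal ++ quicksortMedian3 larger
  where
    pivot   = median3 (head xs) (xs !! (length xs `div` 2)) (last xs)
    median3 a b c = sort [a, b, c] !! 1
    smaller = filter (< pivot) xs
    equal   = filter (== pivot) xs
    larger  = filter (> pivot) xs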