I have this code:

import Data.List

newList_bad  lst = foldl' (\acc x -> acc ++ [x*2]) [] lst
newList_good lst = foldl' (\acc x -> x*2 : acc) [] lst

These functions return lists with each element multiplied by 2:

*Main> newList_bad [1..10]
[2,4,6,8,10,12,14,16,18,20]
*Main> newList_good [1..10]
[20,18,16,14,12,10,8,6,4,2]

In ghci:

*Main> sum $ newList_bad [1..15000]
225015000
(5.24 secs, 4767099960 bytes)
*Main> sum $ newList_good [1..15000]
225015000
(0.03 secs, 3190716 bytes)

Why does the newList_bad function work about 200 times slower than newList_good? I understand that it's not a good solution for that task. But why does this innocent-looking code work so slowly?

And what is this "4767099960 bytes"?? For such a simple operation, did Haskell really use over 4 GiB??

After compilation:

C:\1>ghc -O --make test.hs
C:\1>test.exe
225015000
Time for sum (newList_bad [1..15000]) is 4.445889s
225015000
Time for sum (newList_good [1..15000]) is 0.0025005s

There is much confusion about this issue. The usual reason given is that "repeatedly appending at the end of a list requires repeated traversals of the list and is thus O(n^2)". But it would only be that simple under strict evaluation. Under lazy evaluation everything is supposed to be delayed, so this raises the question of whether these repeated traversals and appendings actually happen at all. Appending at the end is triggered by consuming at the front, and since we consume at the front the list keeps getting shorter, so the exact timing of these actions is unclear. The real answer is therefore more subtle, and deals with the specific reduction steps under lazy evaluation.

The immediate culprit is that foldl' only forces its accumulator argument to weak head normal form - i.e. until a non-strict constructor is exposed. The functions involved here are

(a:b)++c = a:(b++c)    -- does nothing with 'b', only pulls 'a' up
[]++c = c              -- so '++' only forces 1st elt from its left arg

foldl' f z [] = z
foldl' f z (x:xs) = let w=f z x in w `seq` foldl' f w xs

sum xs = sum_ xs 0     -- forces elts from its arg one by one
sum_ [] a = a
sum_ (x:xs) a = sum_ xs (a+x)

and so the actual reduction sequence is (with g = foldl' f):

sum $ foldl' (\acc x-> acc++[x^2]) []          [a,b,c,d,e]
sum $ g  []                                    [a,b,c,d,e]
      g  [a^2]                                   [b,c,d,e]
      g  (a^2:([]++[b^2]))                         [c,d,e]
      g  (a^2:(([]++[b^2])++[c^2]))                  [d,e]
      g  (a^2:((([]++[b^2])++[c^2])++[d^2]))           [e]
      g  (a^2:(((([]++[b^2])++[c^2])++[d^2])++[e^2]))   []
sum $ (a^2:(((([]++[b^2])++[c^2])++[d^2])++[e^2]))

Notice we've only performed O(n) steps so far. a^2 is immediately available here for the sum's consumption, but b^2 is not. We're left here with the left-nested structure of ++ expressions. The rest is best explained in this answer by Daniel Fischer. The gist of it is that to get b^2 out, O(n-1) steps will have to be performed - and the structure left in the wake of this access will still be left-nested, so the next access will take O(n-2) steps, and so on - the classic O(n^2) behavior. So the real reason is that ++ isn't forcing or rearranging its arguments enough to be efficient.
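To make the hidden quadratic cost concrete, here is a sketch of the reductions needed to expose b^2 from that left-nested structure (following the ++ rules above):

```
sum demands the next element of  (((([]++[b^2])++[c^2])++[d^2])++[e^2])
  []++[b^2]              =  b^2 : []                        -- innermost ++ fires first
  (b^2:[]) ++ [c^2]      =  b^2 : ([]++[c^2])
  (b^2:_)  ++ [d^2]      =  b^2 : (([]++[c^2])++[d^2])
  (b^2:_)  ++ [e^2]      =  b^2 : ((([]++[c^2])++[d^2])++[e^2])
```

Four steps (n-1 in general) to expose b^2, and the leftover tail is again left-nested, so exposing c^2 costs n-2 more steps, and so on: (n-1)+(n-2)+...+1 = O(n^2) in total.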

This is actually counter-intuitive. We might expect lazy evaluation to magically "do it" for us here. After all, we are only expressing our intent to add [x^2] to the end of the list in the future; we don't actually do it right away. So the timing here is off, but it could be made right: as we access the list, new elements would be added into it and consumed right away, if the timing were right. If c^2 were added into the list (space-wise) after b^2, say, just before (in time) b^2 is consumed, then the traversal/access would always be O(1).

This is achieved with so-called "difference-list" technique:

newlist_dl lst = foldl' (\z x -> z . (x^2 :)) id lst []

which, if you think of it for a moment, looks exactly the same as your ++[x^2] version. It expresses the same intent, and leaves left-nested structure too.

The difference, as explained in that same answer by Daniel Fischer, is that a (.) chain, when first forced, rearranges itself into a right-nested ($) structure [1] in O(n) steps, after which each access is O(1) and the timing of the appendings is optimal, exactly as described in the paragraph above, so we're left with overall O(n) behaviour.
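Here is a runnable sketch of the difference-list version. Note the final application to [], which turns the accumulated function back into an ordinary list (without it we would return a function, not a list):

```haskell
import Data.List (foldl')

-- accumulate a function [a] -> [a] instead of a list;
-- applying it to [] at the end materializes the result in order
newlist_dl :: Num a => [a] -> [a]
newlist_dl lst = foldl' (\z x -> z . (x^2 :)) id lst []

main :: IO ()
main = print (newlist_dl [1..5])  -- [1,4,9,16,25]
```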


[1] Which is kind of magical, but it does happen. :)


Classic list behavior.

Recall:

(:)  -- O(1) complexity
(++) -- O(n) complexity

So you are creating an O(n^2) algorithm, instead of an O(n) one.

For this common case of appending to lists incrementally, try using a dlist, or just reverse at the end.
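A sketch of the "reverse at the end" variant (the name newList_rev is mine): keep the O(1) cons from newList_good, then restore the original order with a single O(n) reverse:

```haskell
import Data.List (foldl')

-- build the doubled list back-to-front with O(1) conses,
-- then reverse once at the end: O(n) overall
newList_rev :: Num a => [a] -> [a]
newList_rev lst = reverse (foldl' (\acc x -> x*2 : acc) [] lst)

main :: IO ()
main = print (newList_rev [1..10])  -- [2,4,6,8,10,12,14,16,18,20]
```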


To complement the other answers with a bit of larger perspective: with lazy lists, using foldl' in a function that returns a list is usually a bad idea. foldl' is often useful when you are reducing a list to a strict (non-lazy) scalar value (e.g., summing a list). But when you're building a list as the result, foldr is usually better, because of laziness; the : constructor is lazy, so the list's tail isn't computed until it's actually needed.

In your case:

newList_foldr lst = foldr (\x acc -> x*2 : acc) [] lst

This is actually the same as map (*2):

newList_foldr lst = map (*2) lst
map f lst = foldr (\x acc -> f x : acc) [] lst
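As a quick sanity check that the two definitions above agree (using newList_foldr as defined earlier):

```haskell
-- the foldr version and map (*2) produce identical results
newList_foldr :: Num a => [a] -> [a]
newList_foldr lst = foldr (\x acc -> x*2 : acc) [] lst

main :: IO ()
main = print (newList_foldr [1..10] == map (*2) [1..10])  -- True
```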

Evaluation (using the first, map-less definition):

newList_foldr [1..10] 
  = foldr (\x acc -> x*2 : acc) [] [1..10]
  = foldr (\x acc -> x*2 : acc) [] (1:[2..10])
  = 1*2 : foldr (\x acc -> x*2 : acc) [] [2..10]

This is about as far as Haskell will evaluate when newList_foldr [1..10] is forced. It only evaluates any further if the consumer of this result demands it—and only as little as needed to satisfy the consumer. So for example:

firstElem [] = Nothing
firstElem (x:_) = Just x

firstElem (newList_foldr [1..10])
  -- firstElem only needs to evaluate newList [1..10] enough to determine 
  -- which of its subcases applies—empty list or pair.
  = firstElem (foldr (\x acc -> x*2 : acc) [] [1..10])
  = firstElem (foldr (\x acc -> x*2 : acc) [] (1:[2..10]))
  = firstElem (1*2 : foldr (\x acc -> x*2 : acc) [] [2..10])
  -- firstElem doesn't need the tail, so it's never computed!
  = Just (1*2)

This also means that the foldr-based newList_foldr can work with infinite lists:

newList_foldr [1..] = [2,4..]
firstElem (newList_foldr [1..]) = 2
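In runnable form—take demands only five conses, so only five elements of the infinite result are ever built:

```haskell
newList_foldr :: Num a => [a] -> [a]
newList_foldr = foldr (\x acc -> x*2 : acc) []

-- take n forces only the first n conses of the lazily built list
main :: IO ()
main = print (take 5 (newList_foldr [1..]))  -- [2,4,6,8,10]
```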

If you use foldl', on the other hand, you must always compute the whole list, which also means that you can't work with infinite lists:

firstElem (newList_good [1..])    -- doesn't terminate

firstElem (newList_good [1..10])
  = firstElem (foldl' (\acc x -> x*2 : acc) [] [1..10])
  = firstElem (foldl' (\acc x -> x*2 : acc) [] (1:[2..10]))
  = firstElem (foldl' (\acc x -> x*2 : acc) [2] [2..10])
  -- we can't short circuit here because the [2] is "inside" the foldl', so 
  -- firstElem can't see it
  = firstElem (foldl' (\acc x -> x*2 : acc) [2] (2:[3..10]))
  = firstElem (foldl' (\acc x -> x*2 : acc) [4,2] [3..10])
    ...
  = firstElem (foldl' (\acc x -> x*2 : acc) [18,16,14,12,10,8,6,4,2] (10:[]))
  = firstElem (foldl' (\acc x -> x*2 : acc) [20,18,16,14,12,10,8,6,4,2] [])
  = firstElem [20,18,16,14,12,10,8,6,4,2]
  = firstElem (20:[18,16,14,12,10,8,6,4,2])
  = Just 20

The foldr-based algorithm took 4 steps to compute firstElem (newList_foldr [1..10]), whereas the foldl'-based one took on the order of 21 steps. What's worse is that the 4 steps are a constant cost, whereas the 21 is proportional to the length of the input list—firstElem (newList_good [1..150000]) takes about 300,001 steps, while firstElem (newList_foldr [1..150000]) takes 5 steps, as does firstElem (newList_foldr [1..]) for that matter.

Note also that firstElem (newList_foldr [1..10]) runs in constant space as well as constant time (it has to; you need more than constant time to allocate more than constant space). The pro-foldl truism from strict languages—"foldl is tail recursive and runs in constant space, foldr is not tail recursive and runs in linear space or worse"—is not true in Haskell.
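To round this off, a hedged sketch of where foldl' does earn its keep—reducing a list to a strict scalar (the name sumStrict below is mine, not from the question):

```haskell
import Data.List (foldl')

-- foldl' shines when the result is a strict scalar: the accumulator
-- is forced at every step, so no chain of (+) thunks builds up.
-- (foldr (+) 0 on a long list would build such a chain, and a plain
-- lazy foldl would too.)
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

main :: IO ()
main = print (sumStrict (map (*2) [1..15000]))  -- 225015000, as in the question
```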