Solution 1:

I'll try to give an explanation in simple terms. As others have pointed out, head normal form does not apply to Haskell, so I will not consider it here.

Normal form

An expression in normal form is fully evaluated, and no sub-expression could be evaluated any further (i.e. it contains no un-evaluated thunks).

These expressions are all in normal form:

42
(2, "hello")
\x -> (x + 1)

These expressions are not in normal form:

1 + 2                 -- we could evaluate this to 3
(\x -> x + 1) 2       -- we could apply the function
"he" ++ "llo"         -- we could apply the (++)
(1 + 1, 2 + 2)        -- we could evaluate 1 + 1 and 2 + 2

Weak head normal form

An expression in weak head normal form has been evaluated to the outermost data constructor or lambda abstraction (the head). Sub-expressions may or may not have been evaluated. Therefore, every normal form expression is also in weak head normal form, though the opposite does not hold in general.

To determine whether an expression is in weak head normal form, we only have to look at the outermost part of the expression. If it's a data constructor or a lambda, it's in weak head normal form. If it's a function application, it's not.

These expressions are in weak head normal form:

(1 + 1, 2 + 2)       -- the outermost part is the data constructor (,)
\x -> 2 + 2          -- the outermost part is a lambda abstraction
'h' : ("e" ++ "llo") -- the outermost part is the data constructor (:)

As mentioned, all the normal form expressions listed above are also in weak head normal form.

These expressions are not in weak head normal form:

1 + 2                -- the outermost part here is an application of (+)
(\x -> x + 1) 2      -- the outermost part is an application of (\x -> x + 1)
"he" ++ "llo"        -- the outermost part is an application of (++)

Stack overflows

Evaluating an expression to weak head normal form may require that other expressions be evaluated to WHNF first. For example, to evaluate 1 + (2 + 3) to WHNF, we first have to evaluate 2 + 3. If evaluating a single expression leads to too many of these nested evaluations, the result is a stack overflow.

This happens when you build up a large expression that does not produce any data constructors or lambdas until a large part of it has been evaluated. These are often caused by this kind of usage of foldl:

foldl (+) 0 [1, 2, 3, 4, 5, 6]
 = foldl (+) (0 + 1) [2, 3, 4, 5, 6]
 = foldl (+) ((0 + 1) + 2) [3, 4, 5, 6]
 = foldl (+) (((0 + 1) + 2) + 3) [4, 5, 6]
 = foldl (+) ((((0 + 1) + 2) + 3) + 4) [5, 6]
 = foldl (+) (((((0 + 1) + 2) + 3) + 4) + 5) [6]
 = foldl (+) ((((((0 + 1) + 2) + 3) + 4) + 5) + 6) []
 = (((((0 + 1) + 2) + 3) + 4) + 5) + 6
 = ((((1 + 2) + 3) + 4) + 5) + 6
 = (((3 + 3) + 4) + 5) + 6
 = ((6 + 4) + 5) + 6
 = (10 + 5) + 6
 = 15 + 6
 = 21

Notice how it has to go quite deep before it can get the expression into weak head normal form.

You may wonder, why does not Haskell reduce the inner expressions ahead of time? That is because of Haskell's laziness. Since it cannot be assumed in general that every subexpression will be needed, expressions are evaluated from the outside in.

(GHC has a strictness analyzer that will detect some situations where a subexpression is always needed and it can then evaluate it ahead of time. This is only an optimization, however, and you should not rely on it to save you from overflows).

This kind of expression, on the other hand, is completely safe:

data List a = Cons a (List a) | Nil
foldr Cons Nil [1, 2, 3, 4, 5, 6]
 = Cons 1 (foldr Cons Nil [2, 3, 4, 5, 6])  -- Cons is a constructor, stop. 

To avoid building these large expressions when we know all the subexpressions will have to be evaluated, we want to force the inner parts to be evaluated ahead of time.

seq

seq is a special function that is used to force expressions to be evaluated. Its semantics are that seq x y means that whenever y is evaluated to weak head normal form, x is also evaluated to weak head normal form.

It is among other places used in the definition of foldl', the strict variant of foldl.

foldl' f a []     = a
foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs

Each iteration of foldl' forces the accumulator to WHNF. It therefore avoids building up a large expression, and it therefore avoids overflowing the stack.

foldl' (+) 0 [1, 2, 3, 4, 5, 6]
 = foldl' (+) 1 [2, 3, 4, 5, 6]
 = foldl' (+) 3 [3, 4, 5, 6]
 = foldl' (+) 6 [4, 5, 6]
 = foldl' (+) 10 [5, 6]
 = foldl' (+) 15 [6]
 = foldl' (+) 21 []
 = 21                           -- 21 is a data constructor, stop.

But as the example on HaskellWiki mentions, this does not save you in all cases, as the accumulator is only evaluated to WHNF. In the example below, the accumulator is a tuple, so it will only force evaluation of the tuple constructor, and not acc or len.

f (acc, len) x = (acc + x, len + 1)

foldl' f (0, 0) [1, 2, 3]
 = foldl' f (0 + 1, 0 + 1) [2, 3]
 = foldl' f ((0 + 1) + 2, (0 + 1) + 1) [3]
 = foldl' f (((0 + 1) + 2) + 3, ((0 + 1) + 1) + 1) []
 = (((0 + 1) + 2) + 3, ((0 + 1) + 1) + 1)  -- tuple constructor, stop.

To avoid this, we must make it so that evaluating the tuple constructor forces evaluation of acc and len. We do this by using seq.

f' (acc, len) x = let acc' = acc + x
                      len' = len + 1
                  in  acc' `seq` len' `seq` (acc', len')

foldl' f' (0, 0) [1, 2, 3]
 = foldl' f' (1, 1) [2, 3]
 = foldl' f' (3, 2) [3]
 = foldl' f' (6, 3) []
 = (6, 3)                    -- tuple constructor, stop.

Solution 2:

The section on Thunks and Weak Head Normal Form in the Haskell Wikibooks description of laziness provides a very good description of WHNF along with this helpful depiction:

Evaluating the value (4, [1, 2]) step by step. The first stage is completely unevaluated; all subsequent forms are in WHNF, and the last one is also in normal form.

Evaluating the value (4, [1, 2]) step by step. The first stage is completely unevaluated; all subsequent forms are in WHNF, and the last one is also in normal form.

Solution 3:

Haskell programs are expressions and they are run by performing evaluation.

To evaluate an expression, replace all function applications by their definitions. The order in which you do this does not matter much, but it's still important: start with the outermost application and proceed from left to right; this is called lazy evaluation.

Example:

   take 1 (1:2:3:[])
=> { apply take }
   1 : take (1-1) (2:3:[])
=> { apply (-)  }
   1 : take 0 (2:3:[])
=> { apply take }
   1 : []

Evaluation stops when there are no more function applications left to replace. The result is in normal form (or reduced normal form, RNF). No matter in which order you evaluate an expression, you will always end up with the same normal form (but only if the evaluation terminates).

There is a slightly different description for lazy evaluation. Namely, it says that you should evaluate everything to weak head normal form only. There are precisely three cases for an expression to be in WHNF:

  • A constructor: constructor expression_1 expression_2 ...
  • A built-in function with too few arguments, like (+) 2 or sqrt
  • A lambda-expression: \x -> expression

In other words, the head of the expression (i.e. the outermost function application) cannot be evaluated any further, but the function argument may contain unevaluated expressions.

Examples of WHNF:

3 : take 2 [2,3,4]   -- outermost function is a constructor (:)
(3+1) : [4..]        -- ditto
\x -> 4+5            -- lambda expression

Notes

  1. The "head" in WHNF does not refer to the head of a list, but to the outermost function application.
  2. Sometimes, people call unevaluated expressions "thunks", but I don't think that's a good way to understand it.
  3. Head normal form (HNF) is irrelevant for Haskell. It differs from WHNF in that the bodies of lambda expressions are also evaluated to some extent.

Solution 4:

A good explanation with examples is given at http://foldoc.org/Weak+Head+Normal+Form Head normal form simplifies even the bits of an expression inside of a function abstraction, while "weak" head normal form stops at function abstractions.

From the source, if you have:

\ x -> ((\ y -> y+x) 2)

that is in weak head normal form, but not head normal form... because the possible application is stuck inside of a function that can't be evaluated yet.

Actual head normal form would be difficult to implement efficiently. It would require poking around inside of functions. So the advantage of weak head normal form is that you can still implement functions as an opaque type, and hence it's more compatible with compiled languages and optimization.

Solution 5:

The WHNF does not want the body of lambdas to be evaluated, so

WHNF = \a -> thunk
HNF = \a -> a + c

seq wants its first argument to be in WHNF, so

let a = \b c d e -> (\f -> b + c + d + e + f) b
    b = a 2
in seq b (b 5)

evaluates to

\d e -> (\f -> 2 + 5 + d + e + f) 2

instead of, what would be using HNF

\d e -> 2 + 5 + d + e + 2