Functional Programming: what is an "improper list"?

Solution 1:

I think @Vijay's answer is the best one so far and I just intend to Erlangify it.

Pairs (cons cells) in Erlang are written as [Head|Tail] and nil is written as []. There is no restriction as to what the head and tail are but if you use the tail to chain more cons cells you get a list. If the final tail is [] then you get a proper list. There is special syntactic support for lists in that the proper list

[1|[2|[3|[]]]]

is written as

[1,2,3]

and the improper list

[1|[2|[3|4]]]

is written as

[1,2,3|4]

so you can see the difference. Matching against proper/improper lists is correspondingly easy. So a length function len for proper lists:

len([_|T]) -> 1 + len(T);
len([]) -> 0.

where we explicitly match for the terminating []. If given an improper list this will generate an error. While the function last_tail which returns the last tail of a list can handle improper lists as well:

last_tail([_|T]) -> last_tail(T);
last_tail(Tail) -> Tail.                 %Will match any tail

Note that building a list, or matching against it, as you normally do with [Head|Tail] does not check if the tail is list so there is no problem in handling improper lists. There is seldom a need for improper lists, though you can do cool things with them.

Solution 2:

I think it's easier to explain this using Scheme.

A list is a chain of pairs that end with an empty list. In other words, a list ends with a pair whose cdr is ()

(a . (b . (c . (d . (e . ()))))) 

;; same as

(a b c d e)

A chain of pairs that doesn't end in the empty list is called an improper list. Note that an improper list is not a list. The list and dotted notations can be combined to represent improper lists, as the following equivalent notations show:

 (a b c . d)
 (a . (b . (c . d)))

An example of a usual mistake that leads to the construction of an improper list is:

scheme> (cons 1 (cons 2 3))
(1 2 . 3)

Notice the dot in (1 2 . 3)---that's like the dot in (2 . 3), saying that the cdr of a pair points to 3, not another pair or '(). That is, it's an improper list, not just a list of pairs. It doesn't fit the recursive definition of a list, because when we get to the second pair, its cdr isn't a list--it's an integer.

Scheme printed out the first part of the list as though it were a normal cdr-linked list, but when it got to the end, it couldn't do that, so it used "dot notation."

You generally shouldn't need to worry about dot notation, because you should use normal lists, not improper list. But if you see an unexpected dot when Scheme prints out a data structure, it's a good guess that you used cons and gave it a non-list as its second argument--something besides another pair or ().

Scheme provides a handy procedure that creates proper lists, called list. list can take any number of arguments, and constructs a proper list with those elements in that order. You don't have to remember to supply the empty list---list automatically terminates the list that way.

Scheme>(list 1 2 3 4)
(1 2 3 4)

Courtesy: An Introduction to Scheme

Solution 3:

The definition of a list in Erlang is given in the manual - specifically Section 2.10

In Erlang the only thing you really need to know about improper lists is how to avoid them, and the way to do that is very simple - it is all down to the first 'thing' that you are going to build your list on. The following all create proper lists:

A = [].
B = [term()].
C = [term(), term(), term()].

In all these cases the syntax ensures that there is a hidden 'empty' tail which matches to '[]' sort of at the end....

So from them the following operations all produce a proper list:

X = [term() | A].
Y = [term() | B].
Z = [term() | C]. 

They are all operations which add a new head to a proper list.

What makes is useful is that you can feed each of X, Y or Z into a function like:

func([], Acc)      -> Acc;
func([H | T], Acc) -> NewAcc = do_something(H),
                      func(T, [NewAcc | Acc]).

And they will rip through the list and terminate on the top clause when the hidden empty list at the tail is all that is left.

The problem comes when your base list has been improperly made, like so:

D = [term1() | term2()]. % term2() is any term except a list

This list doesn't have the hidden empty list as the terminal tail, it has a term...

From here on downwards is mince as Robert Virding pointed out in the comments

So how do you write a terminal clause for it?

What makes it infuriating is that there is no way to see if a list is improper by inspecting it... print the damn thing out it looks good... So you end up creating an improper base list, doing some stuff on it, passing it around, and then suddenly kabloowie you have a crash miles from where the error is and you pull your hair and scream and shout...

But you should be using the dialyzer to sniff these little beasts out for you.

Apologies

Following Robert's comment I tried printing out an improper list and, lo and behold, it is obvious:

(arrian@localhost)5>A = [1, 2, 3, 4].
[1,2,3,4]
(arrian@localhost)5> B = [1, 2, 3 | 4].
[1,2,3|4]
(arrian@localhost)6> io:format("A is ~p~nB is ~p~n", [A, B]).
A is [1,2,3,4]
B is [1,2,3|4]

I had spent some time hunting an improper list once and had convinced myself it was invsible, well Ah ken noo!

Solution 4:

To understand what an improper list is, you must first understand the definition of a proper list.

Specifically, the "neat discovery" of lists is that you can represent a list using only forms with a fixed number of elements, viz:

;; a list is either 
;; - empty, or
;; - (cons v l), where v is a value and l is a list.

This "data definition" (using the terms of How To Design Programs) has all kinds of nice properties. One of the nicest is that if we define the behavior or meaning of a function on each "branch" of the data definition, we're guaranteed not to miss a case. More significantly, structures like this generally lead to nice clean recursive solutions.

The classic "length" example:

(define (length l)
  (cond [(empty? l) 0]
        [else (+ 1 (length (rest l))]))

Of course, everything's prettier in Haskell:

length []    = 0
length (f:r) = 1 + length r

So, what does this have to do with improper lists?

Well, an improper list uses this data definition, instead:

;; an improper list is either
;; - a value, or
;; - (cons v l), where v is a value and l is an improper list

The problem is that this definition leads to ambiguity. In particular, the first and second cases overlap. Suppose I define "length" for an improper list thusly:

(define (length l)
  (cond [(cons? l) (+ 1 (length (rest l)))]
        [else 1]))

The problem is that I've destroyed the nice property that if I take two values and put them into an improper list with (cons a b), the result has length two. To see why, suppose I consider the values (cons 3 4) and (cons 4 5). The result is (cons (cons 3 4) (cons 4 5)), which may be interpreted either as the improper list containing (cons 3 4) and (cons 4 5), or as the improper list containing (cons 3 4), 4, and 5.

In a language with a more restrictive type system (e.g. Haskell), the notion of an "improper list" doesn't make quite as much sense; you could interpret it as a datatype whose base case has two things in it, which is probably not what you want, either.