I've only been working with Prolog for a couple days. I understand some things but this is really confusing me.

I'm suppose to write a function that takes a list and flattens it.

?- flatten([a,[b,c],[[d],[],[e]]],Xs).  
Xs = [a,b,c,d,e].                           % expected result

The function takes out the inner structures of the list.

This is what I have so far:

flatten2([],[]).
flatten2([Atom|ListTail],[Atom|RetList]) :-
      atom(Atom), flatten2(ListTail,RetList).
flatten2([List|ListTail],RetList) :-
      flatten2(List,RetList).

Now, this works when I call:

?- flatten2([a,[b,c],[[d],[],[e]]], R).
R = [a,b,c,d,e].                         % works as expected!

But when I call to see if a list that I input is already flattened, is returns false instead of true:

?- flatten2([a,[b,c],[[d],[],[e]]], [a,b,c,d,e]).
false.                                   % BAD result!

Why does it work on one hand, but not the other? I feel like I'm missing something very simple.


The definition of flatten2/2 you've given is busted; it actually behaves like this:

?- flatten2([a, [b,c], [[d],[],[e]]], R).
R = [a, b, c] ;
false. 

So, given the case where you've already bound R to [a,b,c,d,e], the failure isn't surprising.

Your definition is throwing away the tail of lists (ListTail) in the 3rd clause - this needs to be tidied up and connected back into the list to return via RetList. Here is a suggestion:

flatten2([], []) :- !.
flatten2([L|Ls], FlatL) :-
    !,
    flatten2(L, NewL),
    flatten2(Ls, NewLs),
    append(NewL, NewLs, FlatL).
flatten2(L, [L]).

This one recursively reduces all lists of lists into either single item lists [x], or empty lists [] which it throws away. Then, it accumulates and appends them all into one list again out the output.

Note that, in most Prolog implementations, the empty list [] is an atom and a list, so the call to atom([]) and is_list([]) both evaluate to true; this won't help you throw away empty lists as opposed to character atoms.


You can maintain your lists open-ended, with both a pointer to its start, and an "ending hole ⁄ free pointer" (i.e. logvar) at its end, which you can then instantiate when the end is reached:

flatten2( [], Z, Z):- !.                                        % ---> X
flatten2( [Atom|ListTail], [Atom|X], Z) :-                      %      .
    \+is_list(Atom), !,                                         %      .
    flatten2( ListTail, X, Z).                                  %      Y
flatten2( [List|ListTail], X, Z) :-                             %      .
    flatten2( List,     X, Y),       % from X to Y, and then    %      .
    flatten2( ListTail, Y, Z).       % from Y to Z              %      Z --->

You then call it as

flatten2( A, B):- flatten2( A, B, []).

That way there's no need to use reverse anywhere. This technique is known as "difference lists", but it's much easier just to think about it as "open-ended lists" instead.


update: This is much easier coded using the dcg syntax. Since it is unidirectional (the first argument must be fully ground), why not use cuts after all:

flattn([]) --> [], !.
flattn([A|T]) --> {\+is_list(A)}, [A], !, flattn(T).
flattn([A|T]) --> flattn(A), flattn(T).

Testing:

16 ?- phrase(flattn([a,[b,c],[[d],[],[e]]]), [a, b, c, d, e]).
true.

17 ?- phrase(flattn([a,[b,c],[[d],[],[e]]]), R).
R = [a, b, c, d, e].

18 ?- phrase(flattn([a,[b,X],[[d],[],[e]]]), [a, b, c, d, e]).
X = c.

If the definition were fully declarative, the last one should've succeeded also with X=[c] ; X=[[],c] ; ... ; X=[[c]] ; ...; alas, it isn't.

(edit2: simplified both versions, thanks to @mat's comments!)


Here's an accumulator based version for completeness:

% flatten/2
flatten(List, Result) :- flatten(List, [], Result).

% auxiliary predicate flatten/3
flatten([], Result, Result).
flatten([Head| Tail], Part, Result) :- 
    is_list(Head),
    !, 
    flatten(Head, HR),
    append(Part, HR, PR),
    flatten(Tail, PR, Result).
flatten([Head| Tail], Part, Result) :- 
    append(Part, [Head], PR),
    flatten(Tail, PR, Result).
flatten(X, Part, Result) :-
    fail.

Prolog's list notation is syntactic sugar on top of very simple prolog terms. Prolog lists are denoted thus:

  1. The empty list is represented by the atom []. Why? Because that looks like the mathematical notation for an empty list. They could have used an atom like nil to denote the empty list but they didn't.

  2. A non-empty list is represented by the term .\2, where the first (leftmost) argument is the head of the list and the second (rightmost) argument is the tail of the list, which is, recursively, itself a list.

Some examples:

  • An empty list: [] is represented as the atom it is:

    []
    
  • A list of one element, [a] is internally stored as

    .(a,[])
    
  • A list of two elements [a,b] is internally stored as

    .(a,.(b,[]))
    
  • A list of three elements, [a,b,c] is internally stored as

    .(a,.(b,.(c,[])))
    

Examination of the head of the list is likewise syntactic sugar over the same notation:

  • [X|Xs] is identical to .(X,Xs)

  • [A,B|Xs] is identical to .(A,.(B,Xs))

  • [A,B] is (see above) identical to .(A,.(B,[]))

Myself, I'd write flatten/2 like this:

%------------------------
% public : flatten a list
%------------------------
flatten( X , R ) :-
  flatten( X , [] , T ) ,
  reverse( T , R )
  .

%--------------------------------------------
% private : flatten a list into reverse order
%--------------------------------------------
flatten( [] , R , R ) .        % the empty list signals the end of recursion
flatten( [X|Xs] , T , R ) :-   % anything else is flattened by
  flatten_head( X , T , T1 ) , % - flattening the head, and
  flatten( Xs , T1 , R )       % - flattening the tail
  .                            %

%-------------------------------------
% private : flatten the head of a list
%-------------------------------------
flatten_head( X , T , [X|T] ) :- % if the head is a not a list
  \+ list(X) ,                   % - simply prepend it to the accumulator.
  ! .                            %
flatten_head( X , T , R     ) :- % if the head is a list
  flatten( X , T , R )           % - recurse down and flatten it.
  .

%-----------------------
% what's a list, anyway?
%-----------------------
list( X ) :- var(X) , ! , fail .
list( []    ) .
list( [_|_] ) .

Building on if_//3 and list_truth/2, we can implement myflatten/2 as follows:

myflatten(Xs,Ys) :-
   phrase(myflatten_aux(Xs),Ys).

myflatten_aux([]) --> [].
myflatten_aux([T|Ts]) --> 
   if_(neither_nil_nor_cons_t(T), [T], myflatten_aux(T)),
   myflatten_aux(Ts).

:- use_module(library(dialect/sicstus/block)).

:- block neither_nil_nor_cons(-).
neither_nil_nor_cons(X) :-
   \+nil_or_cons(X).

nil_or_cons([]).
nil_or_cons([_|_]).

neither_nil_nor_cons_t(X,Truth) :-
   (  nonvar(X)
   -> (  neither_nil_nor_cons(X) -> Truth = true
      ;                             Truth = false
      )
   ;  nonvar(Truth) 
   -> (  Truth == true -> neither_nil_nor_cons(X)
      ;  Truth == false,  nil_or_cons(X)
      )
   ;  Truth = true,  neither_nil_nor_cons(X)
   ;  Truth = false, nil_or_cons(X)
   ).

Sample queries (taken from other answers, and comments to answers):

?- myflatten([[4],[[5,6],[7,[8],[9,[10,11]]]]], Xs).
Xs = [4, 5, 6, 7, 8, 9, 10, 11].

?- myflatten([1,[8,3],[3,[5,6],2],8], Xs).
Xs = [1, 8, 3, 3, 5, 6, 2, 8].

?- myflatten([a,[b,c],[],[[[d]]]], Xs).
Xs = [a, b, c, d].

?- myflatten([a,[b,c],[[d],[],[e]]], Xs).
Xs = [a, b, c, d, e].

neither_nil_nor_cons_t and not(nil_or_cons_t) describe have same solutions, but the solution order differs. Consider:

?- myflatten([A,B,C],Xs), A=a,B=b,C=c.
A = a,
B = b,
C = c,
Xs = [a, b, c] ;                       % does not terminate universally