Prolog - count repetitions in list

I'm trying to look through a list and count the number of times a given word appears. I've got this so far:

count_repetitions([_], [], 0).
count_repetitions([Word], [Word|Tail], Count):-
   count_repetitions([Word], Tail, X), 
   Count is X + 1.
count_repetitions([Word], [Z|Tail], Count):-
   Word \= Z, 
   count_repetitions([Word], Tail, Count).

So the query ?- count_repetitions([yes],[yes,and,yes,and,no], X). would give X = 2.

This appears to work. Now I need to write a predicate that outputs a list with the search word and the number of times it appears, in the form X = [(yes - 2)]. I'm completely stuck, any suggestions?


Solution 1:

This answer shows a logically pure way to do it. The following is based on clpfd.

:- use_module(library(clpfd)).

We define meta-predicate tcount/3 similarly to tfilter/3!

:- meta_predicate tcount(2,?,?).
tcount(P_2,Xs,N) :-
   N #>= 0,
   list_pred_tcount_(Xs,P_2,0,N).

:- meta_predicate list_pred_tcount_(?,2,?,?).
list_pred_tcount_([]    , _ ,N ,N).
list_pred_tcount_([X|Xs],P_2,N0,N) :-
   if_(call(P_2,X), (N1 is N0+1, N1 #=< N), N1 = N0),
   list_pred_tcount_(Xs,P_2,N1,N).

Now let's use tcount/3 in combination with (=)/3:

?- tcount(=(yes),[yes,and,yes,and,no],Count).
Count = 2.

Unlike the code presented by all other answers to this question, the code presented in this answer is monotone and remains logically sound even when using it with non-ground terms:

?- tcount(=(yes),[A,B,C,D],2).
      A=yes ,     B=yes , dif(C,yes), dif(D,yes)
;     A=yes , dif(B,yes),     C=yes , dif(D,yes)
;     A=yes , dif(B,yes), dif(C,yes),     D=yes
; dif(A,yes),     B=yes ,     C=yes , dif(D,yes)
; dif(A,yes),     B=yes , dif(C,yes),     D=yes
; dif(A,yes), dif(B,yes),     C=yes ,     D=yes
; false.

Let's try something even more general:

?- tcount(=(yes),[A,B,C,D],Count).
      A=yes ,     B=yes ,     C=yes ,     D=yes , Count = 4
;     A=yes ,     B=yes ,     C=yes , dif(D,yes), Count = 3
;     A=yes ,     B=yes , dif(C,yes),     D=yes , Count = 3
;     A=yes ,     B=yes , dif(C,yes), dif(D,yes), Count = 2
;     A=yes , dif(B,yes),     C=yes ,     D=yes , Count = 3
;     A=yes , dif(B,yes),     C=yes , dif(D,yes), Count = 2
;     A=yes , dif(B,yes), dif(C,yes),     D=yes , Count = 2
;     A=yes , dif(B,yes), dif(C,yes), dif(D,yes), Count = 1
; dif(A,yes),     B=yes ,     C=yes ,     D=yes , Count = 3
; dif(A,yes),     B=yes ,     C=yes , dif(D,yes), Count = 2
; dif(A,yes),     B=yes , dif(C,yes),     D=yes , Count = 2
; dif(A,yes),     B=yes , dif(C,yes), dif(D,yes), Count = 1
; dif(A,yes), dif(B,yes),     C=yes ,     D=yes , Count = 2
; dif(A,yes), dif(B,yes),     C=yes , dif(D,yes), Count = 1
; dif(A,yes), dif(B,yes), dif(C,yes),     D=yes , Count = 1
; dif(A,yes), dif(B,yes), dif(C,yes), dif(D,yes), Count = 0.

What about the following corner case?

?- tcount(_,_,-1).
false.

And how about utilizing tcount/3 as an alternative to length/2?

?- N in 1..3, length(Xs,N).
  N = 1, Xs = [_A]
; N = 2, Xs = [_A,_B]
; N = 3, Xs = [_A,_B,_C]
...                                      % does not terminate

?- use_module(library(lambda)).
true.

?- N in 1..3, tcount(\_^ =(true),Xs,N).
  N = 1, Xs = [_A]
; N = 2, Xs = [_A,_B]
; N = 3, Xs = [_A,_B,_C]
; false.                                 % terminates universally

Solution 2:

You are there, already, it seems to me. You could simply wrap your predicate in another one saying:

word_repetitions(Word, List, [(Word-Count)]) :-
    count_repetitions(Word, List, Count).

Note that you don't need the parenthesis or the brackets around the Word-Count pair:

word_repetitions(Word, List, Word-Count) :-
    count_repetitions(Word, List, Count).

(but you can use them if you insist).

On your original predicate, renamed to reflect the differences:

list_word_reps([], Word, Word-0).
list_word_reps([W|Rest], Word, Word-Reps) :-
    list_word_reps(Rest, Word, Word-Reps0),
    (   W == Word
    ->  Reps is Reps0 + 1
    ;   Reps = Reps0
    ).

?- list_word_reps([yes,no,yes,no,maybe,yes], yes, X).
X = yes-3.

The reason why the list comes before the word is that the predicate then becomes deterministic. Same goes for using the if-then-else instead of two different clauses. You can put the answer in a list if you want to (just wrap the argument in brackets) but again, it is unnecessary.

Solution 3:

library(aggregate) is often undervalued:

count(L, C) :-
    aggregate(set(W-N), aggregate(count, member(W, L), N), C).

yields

1 ?- count([a,b,a],C).
C = [a-2, b-1].

so, the simpler

count(W, L, W-N) :-
    aggregate(count, member(W, L), N).

yields

?- count(a, [a,b,a], C).
C = a-2.

being based on setof, aggregate/3 allows for finer control about the quantification of variables (i.e. which values get aggregated), but will fail if there is no solution, instead of yielding 0, as sometime is required.

aggregate_all/3, based on findall/3, would return 0 in such cases, but doesn't allows for quantification specifiers.