Word count for multiple .txt files in linux
I need to count the words in multiple .txt files using the Linux CLI. Currently I am using the following command:
cat *.txt|wc -w
I made a test directory to practice the command, and it works for each individual .txt file but gives the wrong total across all the .txt files.
The directory contains 5 files: 4 of them contain 5 words each and 1 is empty.
For an individual file, cat textfile.txt | wc -w
gives the right answer.
But for all files combined it gives 17, when it should be (4 times 5 + 1 times 0 =) 20.
Can someone tell me why the count given is 17 while the real count is 20?
Solution 1:
You can run
wc -w *.txt
This will give you the word count for each file and a total sum in the last row.
As it turned out, the OP's issue was a missing final newline in the files. This caused cat *.txt
to join the last word of one file with the first word of the next, resulting in a lower count.
The command above is more robust in this situation because it processes each file individually.
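If you do want a single combined count from cat, one common fix is to append the missing final newline first. A minimal sketch (the '$a\' idiom assumes GNU sed; BSD sed behaves differently, and the filenames here are just for the demo):

```shell
# Demo setup: four 5-word files without a final newline, plus an empty file.
cd "$(mktemp -d)"
for i in 1 2 3 4; do printf 'a b c d e' > "$i.txt"; done
printf '' > 5.txt

# GNU sed idiom: '$a\' appends a newline to the last line only if it is missing.
sed -i -e '$a\' *.txt

# With every file newline-terminated, cat no longer glues words together.
cat *.txt | wc -w
```

After the sed pass, the combined count matches the per-file sum (20 here).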
Solution 2:
The most likely explanation is that the final lines of your files are not newline-terminated, so that when you cat
them, the first word of each file gets appended to the last word of the previous file:
Ex. given
steeldriver@pc:~$ printf 'foo\nbar\nbaz\nbam\nboo' | tee {1..4}.txt
foo
bar
baz
bam
boosteeldriver@pc:~$ printf '' > 5.txt
then
steeldriver@pc:~$ wc -w {1..5}.txt
5 1.txt
5 2.txt
5 3.txt
5 4.txt
0 5.txt
20 total
but
steeldriver@pc:~$ cat {1..5}.txt | wc -w
17
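To see exactly where 17 comes from: at each of the three boundaries between consecutive non-empty files (1|2, 2|3, 3|4), the trailing "boo" and the leading "foo" fuse into the single word "boofoo", so wc sees 20 - 3 = 17 words. A self-contained reproduction of the transcript above:

```shell
# Recreate the answer's files in a scratch directory:
# four files of 5 words each, none newline-terminated, plus an empty fifth file.
cd "$(mktemp -d)"
for i in 1 2 3 4; do printf 'foo\nbar\nbaz\nbam\nboo' > "$i.txt"; done
printf '' > 5.txt

# The three boundaries between non-empty files each merge "boo"+"foo" into
# "boofoo", so the combined stream has 20 - 3 = 17 whitespace-separated words.
cat {1..5}.txt | wc -w    # prints 17
```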