Count the number of words that have at least two identical consecutive letters (Bash)

Solution 1:

Given this input text lorem-ipsum.txt:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum

The same text with punctuation removed and each word that contains a double letter preceded by a running count:

Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua
Ut enim ad minim veniam quis nostrud exercitation (1) ullamco laboris nisi ut aliquip ex ea (2) commodo consequat
Duis aute irure dolor in reprehenderit in voluptate velit (3) esse (4) cillum dolore eu fugiat (5) nulla pariatur
Excepteur sint (6) occaecat cupidatat non proident sunt in culpa qui (7) officia deserunt (8) mollit anim id est laborum

You can count the words that contain a double letter with:

tr -c '[:alpha:]' '\n' < lorem-ipsum.txt | grep -c '\(.\)\1'
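  • tr -c '[:alpha:]' '\n': Replaces every non-alphabetic character with a newline, so each word ends up on its own line in the stream (possibly separated by empty lines, which never match).
  • grep -c '\(.\)\1': Counts the lines (here, words) in which some captured character \(.\) is immediately followed by itself \1. Because -c counts matching lines rather than individual matches, a word with several double letters is counted only once.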
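To illustrate that a word is counted once however many double letters it has, here is a minimal sketch that wraps the same pipeline in a function. The function name count_double_letter_words and the sample sentence are hypothetical, chosen only for this example.

```shell
# Hypothetical helper: reads text on stdin and prints the number of words
# that contain at least one pair of identical consecutive letters.
count_double_letter_words() {
  # One word per line, then count the lines that contain a doubled character.
  tr -c '[:alpha:]' '\n' | grep -c '\(.\)\1'
}

# "bookkeeper" has three double letters (oo, kk, ee) but is one matching line.
printf 'A small book, a tall tree, and a bookkeeper.\n' | count_double_letter_words
```

For this sample the matching words are small, book, tall, tree and bookkeeper, so the function prints 5. Note that grep -c still prints 0 when nothing matches, but exits with status 1 in that case.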

Test assertion:

[ 8 -eq "$(tr -c '[:alpha:]' '\n' < lorem-ipsum.txt | grep -c '\(.\)\1')" ]
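
If the assertion ever fails, it can help to see which words were matched instead of only their count. The sketch below drops -c from grep so the matching words are printed. The function name list_double_letter_words is hypothetical, and the sample text is fed through a here-document so the example does not depend on lorem-ipsum.txt being present.

```shell
# Hypothetical helper: prints every word on stdin that contains a double
# letter, one word per line.
list_double_letter_words() {
  # Without -c, grep prints the matching lines instead of counting them.
  tr -c '[:alpha:]' '\n' | grep '\(.\)\1'
}

list_double_letter_words <<'EOF'
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum
EOF
```

With the sample text this should print ullamco, commodo, esse, cillum, nulla, occaecat, officia and mollit, one per line, which matches the eight words counted above.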