How to count the number of words in a sentence, ignoring numbers, punctuation and whitespace?
How would I go about counting the words in a sentence? I'm using Python.
For example, I might have the string:
string = "I am having a very nice 23!@$ day. "
That would be 7 words. I'm having trouble with the random amount of spaces after/before each word as well as when numbers or symbols are involved.
str.split()
without any arguments splits on runs of whitespace characters:
>>> s = 'I am having a very nice day.'
>>>
>>> len(s.split())
7
From the linked documentation:
If sep is not specified or is
None
, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
You can use regex.findall()
:
import re
line = " I am having a very nice day."
count = len(re.findall(r'\w+', line))
print (count)
s = "I am having a very nice 23!@$ day. "
sum([i.strip(string.punctuation).isalpha() for i in s.split()])
The statement above will go through each chunk of text and remove punctuations before verifying if the chunk is really string of alphabets.
This is a simple word counter using regex. The script includes a loop which you can terminate it when you're done.
#word counter using regex
import re
while True:
string =raw_input("Enter the string: ")
count = len(re.findall("[a-zA-Z_]+", string))
if line == "Done": #command to terminate the loop
break
print (count)
print ("Terminated")