Count word occurrences in R
Is there a function for counting the number of times a particular keyword is contained in a dataset?
For example, if dataset <- c("corn", "cornmeal", "corn on the cob", "meal")
the count would be 3.
Solution 1:
Let's for the moment assume you wanted the number of element containing "corn":
length(grep("corn", dataset))
[1] 3
After you get the basics of R down better you may want to look at the "tm" package.
EDIT: I realize that this time around you wanted any-"corn" but in the future you might want to get word-"corn". Over on r-help Bill Dunlap pointed out a more compact grep pattern for gathering whole words:
grep("\\<corn\\>", dataset)
Solution 2:
Another quite convenient and intuitive way to do it is to use the str_count
function of the stringr
package:
library(stringr)
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")
# for mere occurences of the pattern:
str_count(dataset, "corn")
# [1] 1 1 1 0
# for occurences of the word alone:
str_count(dataset, "\\bcorn\\b")
# [1] 1 0 1 0
# summing it up
sum(str_count(dataset, "corn"))
# [1] 3
Solution 3:
You can also do something like the following:
length(dataset[which(dataset=="corn")])