How to select R data.table rows based on substring match (a la SQL like)
I have a data.table with a character column, and want to select only those rows that contain a substring in it. Equivalent to SQL WHERE x LIKE '%substring%'
E.g.
> Months = data.table(Name = month.name, Number = 1:12)
> Months["mb" %in% Name]
Empty data.table (0 rows) of 2 cols: Name,Number
How would I select only the rows where Name contains "mb"?
data.table
has a like
function.
Months[like(Name,"mb")]
Name Number
1: September 9
2: November 11
3: December 12
Or, %like%
looks nicer :
> Months[Name %like% "mb"]
Name Number
1: September 9
2: November 11
3: December 12
Note that %like%
and like()
use grepl
(returns logical vector) rather than grep
(returns integer locations). That's so it can be combined with other logical conditions :
> Months[Number<12 & Name %like% "mb"]
Name Number
1: September 9
2: November 11
and you get the power of regular expression search (not just % or * wildcard), too.
The operator %in%
does not do partial string matching
it is used for finding if values exist in another set of values i.e. "a" %in% c("a","b","c")
To do partial string matching you need to use the grep()
function.
You can use the grep
to return an index of all columns with "mb"
in it. Then subset the rows by that index
Months[grep("mb", Name)] # data.table syntax slightly easier