Filter data.frame rows by a logical condition
Solution 1:
To select rows according to one 'cell_type' (e.g. 'hesc'), use ==
:
expr[expr$cell_type == "hesc", ]
To select rows according to two or more different 'cell_type', (e.g. either 'hesc' or 'bj fibroblast'), use %in%
:
expr[expr$cell_type %in% c("hesc", "bj fibroblast"), ]
Solution 2:
Use subset
(for interactive use)
subset(expr, cell_type == "hesc")
subset(expr, cell_type %in% c("bj fibroblast", "hesc"))
or better dplyr::filter()
filter(expr, cell_type %in% c("bj fibroblast", "hesc"))
Solution 3:
The reason expr[expr[2] == 'hesc']
doesn't work is that for a data frame, x[y]
selects columns, not rows. If you want to select rows, change to the syntax x[y,]
instead:
> expr[expr[2] == 'hesc',]
expr_value cell_type
4 5.929771 hesc
5 5.873096 hesc
6 5.665857 hesc
Solution 4:
You could use the dplyr
package:
library(dplyr)
filter(expr, cell_type == "hesc")
filter(expr, cell_type == "hesc" | cell_type == "bj fibroblast")
Solution 5:
No one seems to have included the which function. It can also prove useful for filtering.
expr[which(expr$cell == 'hesc'),]
This will also handle NAs and drop them from the resulting dataframe.
Running this on a 9840 by 24 dataframe 50000 times, it seems like the which method has a 60% faster run time than the %in% method.