Selecting only numeric columns from a data frame

Suppose, you have a data.frame like this:

x <- data.frame(v1=1:20,v2=1:20,v3=1:20,v4=letters[1:20])

How would you select only those columns in x that are numeric?


Solution 1:

EDIT: updated to avoid use of ill-advised sapply.

Since a data frame is a list we can use the list-apply functions:

nums <- unlist(lapply(x, is.numeric))  

Then standard subsetting

x[ , nums]

## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)

For a more idiomatic modern R I'd now recommend

x[ , purrr::map_lgl(x, is.numeric)]

Less codey, less reflecting R's particular quirks, and more straightforward, and robust to use on database-back-ended tibbles:

dplyr::select_if(x, is.numeric)

Newer versions of dplyr, also support the following syntax:

x %>% dplyr::select(where(is.numeric))

Solution 2:

The dplyr package's select_if() function is an elegant solution:

library("dplyr")
select_if(x, is.numeric)

Solution 3:

Filter() from the base package is the perfect function for that use-case: You simply have to code:

Filter(is.numeric, x)

It is also much faster than select_if():

library(microbenchmark)
microbenchmark(
    dplyr::select_if(mtcars, is.numeric),
    Filter(is.numeric, mtcars)
)

returns (on my computer) a median of 60 microseconds for Filter, and 21 000 microseconds for select_if (350x faster).

Solution 4:

in case you are interested only in column names then use this :

names(dplyr::select_if(train,is.numeric))

Solution 5:

This an alternate code to other answers:

x[, sapply(x, class) == "numeric"]

with a data.table

x[, lapply(x, is.numeric) == TRUE, with = FALSE]