Selecting columns in R data frame based on those *not* in a vector

Solution 1:

An alternative to grep is which:

df.2 <- df[, -which(names(df) %in% c("name1", "name2", "name3"))]
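One caveat with the which approach, sketched here on the built-in mtcars data (the `drop_cols` vector and the names in it are assumptions for illustration): if none of the names match, which() returns an empty vector, and negating an empty vector selects zero columns rather than all of them. Logical negation with ! avoids that.

```r
# Hypothetical vector of names to drop; none of these exist in mtcars
drop_cols <- c("name1", "name2", "name3")

# -which(...) yields integer(0) here, so every column is dropped
bad <- mtcars[, -which(names(mtcars) %in% drop_cols)]
ncol(bad)   # 0

# Logical negation keeps all columns when nothing matches
good <- mtcars[, !(names(mtcars) %in% drop_cols), drop = FALSE]
ncol(good)  # 11
```

The drop = FALSE argument keeps the result a data frame even when only one column remains.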

Solution 2:

You can make a shorter call that is also more generalizable with negative-grep:

df.2 <- df[, -grep("^name[1-3]$", names(df))]

Since grep returns numeric indices, you can use negative vector indexing to remove columns. You could add further numbers or more complex patterns.
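To make the pattern idea concrete, here is a small sketch on a made-up data frame (all column names are hypothetical). It uses grep's invert argument, a variation on negative indexing that still keeps every column when the pattern matches nothing.

```r
# Toy data frame with hypothetical column names
df <- data.frame(id = 1:2, name1 = 3:4, name2 = 5:6, name3 = 7:8, name10 = 9:10)

# [1-3] is a character class matching a single digit 1, 2 or 3;
# the anchors ^ and $ stop "name10" from matching
keep <- grep("^name[1-3]$", names(df), invert = TRUE)
df.2 <- df[, keep, drop = FALSE]
names(df.2)  # "id" "name10"
```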

Solution 3:

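dplyr::select() has several options for dropping specific columns. The first takes a character vector of names (one_of() still works, but recent dplyr versions supersede it with any_of() and all_of()):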

library(dplyr)

drop_columns <- c('cyl','disp','hp')
mtcars %>% 
  select(-one_of(drop_columns)) %>% 
  head(2)

              mpg drat    wt  qsec vs am gear carb
Mazda RX4      21  3.9 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21  3.9 2.875 17.02  0  1    4    4

You can also negate specific column names directly. The following drops the column "hp" and the columns from "qsec" through "gear":

mtcars %>% 
  select(-hp, -(qsec:gear)) %>% 
  head(2)

              mpg cyl disp drat    wt carb
Mazda RX4      21   6  160  3.9 2.620    4
Mazda RX4 Wag  21   6  160  3.9 2.875    4

You could also negate contains(), starts_with(), ends_with(), or matches():

mtcars %>% 
  select(-contains('t')) %>%
  select(-starts_with('a')) %>% 
  select(-ends_with('b')) %>% 
  select(-matches('^m.+g$')) %>% 
  head(2)

              cyl disp  hp  qsec vs gear
Mazda RX4       6  160 110 16.46  0    4
Mazda RX4 Wag   6  160 110 17.02  0    4

Solution 4:

Old thread, but here's another solution:

df.2 <- subset(df, select=-c(name1, name2, name3))

This was posted in another similar thread (though I can't find it right now). Should be sustainable code in the situation you describe, and is probably easier to read and edit than some of the other options.
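If the names to drop are held in a character vector rather than typed out, one base R option is setdiff(). A minimal sketch, assuming mtcars as the data and an illustrative `drop_columns` vector:

```r
# Illustrative character vector of columns to exclude
drop_columns <- c("cyl", "disp", "hp")

# setdiff() returns the names in the data frame that are not in drop_columns,
# in their original order
keep <- setdiff(names(mtcars), drop_columns)
df.2 <- mtcars[, keep, drop = FALSE]
names(df.2)
```

Names in `drop_columns` that are not present in the data frame are simply ignored.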

Solution 5:

You could write a custom function for this if it's for your own data manipulation. I might do something like this:

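rm.col <- function(df, ...) {
    # Capture the unevaluated column names passed through ...
    x <- substitute(...())
    # Convert each captured symbol to a character string
    z <- unlist(lapply(x, as.character))
    # Keep only the columns whose names are not in z
    df[, !names(df) %in% z, drop = FALSE]
}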

rm.col(mtcars, hp, mpg)

The first argument is the data frame. The following ... arguments are the names of any columns you wish to remove.