Case Statement Equivalent in R

I have a dataframe in which one of the fields typically has 7-8 distinct values. I want to collapse them into 3 or 4 new categories in a new variable within the dataframe. What is the best approach?

I would use a CASE statement if I were in a SQL-like tool, but I am not sure how to approach this in R.

Any help you can provide will be much appreciated!


case_when(), which was added to dplyr in May 2016, solves this problem in a manner similar to memisc::cases().

As of dplyr 0.7.0, for example:

library(dplyr)

mtcars %>%
  mutate(category = case_when(
    cyl == 4 & disp < median(disp) ~ "4 cylinders, small displacement",
    cyl == 8 & disp > median(disp) ~ "8 cylinders, large displacement",
    TRUE ~ "other"  # fallback when no condition above matches
  ))
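For the situation in the question, collapsing the values of a single column into a few groups, the same pattern works with %in%. This is only a minimal sketch: the data frame `df`, its `fruit` column, and the group names are made-up assumptions for illustration.

```r
library(dplyr)

# Hypothetical data: the column name and its values are illustrative assumptions
df <- data.frame(
  fruit = c("apple", "pear", "lemon", "lime", "orange", "banana", "mango", "kiwi"),
  stringsAsFactors = FALSE
)

# Collapse the eight values into three groups in a new column
df <- df %>%
  mutate(fruit_group = case_when(
    fruit %in% c("apple", "pear")           ~ "pome",
    fruit %in% c("lemon", "lime", "orange") ~ "citrus",
    TRUE                                    ~ "other"  # everything not matched above
  ))
```

Conditions are checked in order and the first match wins, so the catch-all TRUE line goes last. Without it, rows that match no condition get NA.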

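Original answer (for dplyr versions before 0.7.0, where columns had to be referenced as .$column inside case_when())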

library(dplyr)
mtcars %>% 
  mutate(category = case_when(
    .$cyl == 4 & .$disp < median(.$disp) ~ "4 cylinders, small displacement",
    .$cyl == 8 & .$disp > median(.$disp) ~ "8 cylinders, large displacement",
    TRUE ~ "other"
  ))

Have a look at the cases function from the memisc package. It implements case functionality and can be used in two ways: to build a factor from a set of logical conditions, or to return values conditionally, much like ifelse. From the examples in the package:

library(memisc)

z1 <- cases(
  "Condition 1" = x < 0,
  "Condition 2" = y < 0,  # only applies if x >= 0
  "Condition 3" = TRUE
)

where x and y are two vectors.
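To try the snippet on its own, x and y need values. A minimal sketch, assuming memisc is installed; the two vectors are made up for illustration and are not from the package examples:

```r
library(memisc)

# Hypothetical input vectors (illustrative assumptions)
x <- c(-1, 2, 3)
y <- c(5, -4, 6)

# Each element gets the label of the first condition that holds for it
z1 <- cases(
  "Condition 1" = x < 0,
  "Condition 2" = y < 0,  # only applies if x >= 0
  "Condition 3" = TRUE
)

# Attach the result as a new column of a data frame
result <- data.frame(x = x, y = y, z1 = z1)
```

Used this way, cases() returns a factor whose levels are the condition names, which is convenient when the new variable is a category.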

References: memisc package, cases example