Case Statement Equivalent in R
I have a variable in a dataframe that typically takes 7-8 distinct values. I want to collapse them into 3 or 4 new categories in a new variable within the same dataframe. What is the best approach?
I would use a CASE statement if I were in a SQL-like tool, but I am not sure how to approach this in R.
Any help you can provide will be much appreciated!
case_when(), which was added to dplyr in May 2016, solves this problem in a manner similar to memisc::cases().
As of dplyr 0.7.0, for example:
library(dplyr)

mtcars %>%
  mutate(category = case_when(
    cyl == 4 & disp < median(disp) ~ "4 cylinders, small displacement",
    cyl == 8 & disp > median(disp) ~ "8 cylinders, large displacement",
    TRUE ~ "other"  # fallback for rows matching no earlier condition
  ))
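Applied to the situation in the question (one column with about eight values collapsed into a few groups), the same pattern might look like the sketch below. The data frame df, the column code, its values "A" to "H", and the group labels are all made up for illustration; substitute your own.

```r
library(dplyr)

# Hypothetical data: a column holding eight distinct values
df <- data.frame(
  code = c("A", "B", "C", "D", "E", "F", "G", "H"),
  stringsAsFactors = FALSE
)

# Collapse the eight values into four groups.
# Conditions are checked top to bottom; the first match wins.
df <- df %>%
  mutate(group = case_when(
    code %in% c("A", "B")      ~ "low",
    code %in% c("C", "D", "E") ~ "mid",
    code %in% c("F", "G")      ~ "high",
    TRUE                       ~ "other"  # everything not listed above
  ))

print(df)
```

Using %in% keeps each group on one line, which is easier to maintain than a chain of == comparisons joined with |.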
Original answer (for dplyr versions before 0.7.0, which needed the .$ prefix on column names inside case_when())
library(dplyr)

mtcars %>%
  mutate(category = case_when(
    .$cyl == 4 & .$disp < median(.$disp) ~ "4 cylinders, small displacement",
    .$cyl == 8 & .$disp > median(.$disp) ~ "8 cylinders, large displacement",
    TRUE ~ "other"
  ))
Have a look at the cases function from the memisc package. It implements case functionality and can be used in two different ways.
From the examples in the package:
library(memisc)

z1 <- cases(
  "Condition 1" = x < 0,
  "Condition 2" = y < 0,  # only applies if x >= 0
  "Condition 3" = TRUE
)
where x and y are two vectors.
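To try the snippet end to end, here is a minimal sketch that assumes memisc is installed. The values of x and y are invented for illustration. Because the three conditions overlap, memisc may warn that they are not mutually exclusive; each element takes the first condition that is TRUE for it.

```r
library(memisc)

# Made-up input vectors for illustration
x <- c(-1, 2, 3)
y <- c(5, -4, 6)

# Named logical conditions; the result is a factor whose levels
# are the condition names.
z1 <- cases(
  "Condition 1" = x < 0,
  "Condition 2" = y < 0,  # only applies if x >= 0
  "Condition 3" = TRUE
)

print(z1)
```

Since the result is a factor, it can be assigned straight into a new data frame column, much like the output of case_when().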
References: memisc package, cases example