Avoiding type conflicts with dplyr::case_when
I am trying to use dplyr::case_when within dplyr::mutate to create a new variable where I set some values to missing and recode other values simultaneously.
However, if I try to set values to NA, I get an error saying that we cannot create the variable new because NAs are logical:
Error in mutate_impl(.data, dots) :
Evaluation error: must be type double, not logical.
Is there a way to set values to NA in a non-logical vector in a data frame using case_when?
library(dplyr)
# Create data
df <- data.frame(old = 1:3)
# Create new variable
df <- df %>% dplyr::mutate(new = dplyr::case_when(old == 1 ~ 5,
                                                  old == 2 ~ NA,
                                                  TRUE ~ old))
# Desired output
c(5, NA, 3)
As said in ?case_when:
All RHSs must evaluate to the same type of vector.
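To see why the types clash, it can help to inspect each candidate right-hand side with base R's typeof(). This is only a small sketch; the vector name rhs_types is made up for illustration and is not part of the original question:

```r
# Base R only: collect the type of each value that could appear on a RHS.
# `rhs_types` is a hypothetical name used just for this illustration.
rhs_types <- c(
  plain_na   = typeof(NA),           # bare NA is logical
  real_na    = typeof(NA_real_),     # typed NA for doubles
  integer_na = typeof(NA_integer_),  # typed NA for integers
  five       = typeof(5),            # numeric literal is double
  five_L     = typeof(5L),           # L suffix makes it integer
  old        = typeof(1:3)           # 1:3 is integer, like df$old
)
print(rhs_types)
```

In the question's call, the three RHSs are double (5), logical (NA) and integer (old), so no two of them share a type.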
You actually have two possibilities:
1) Create new as a numeric vector
df <- df %>% mutate(new = case_when(old == 1 ~ 5,
                                    old == 2 ~ NA_real_,
                                    TRUE ~ as.numeric(old)))
Note that NA_real_ is the numeric version of NA, and that you must convert old to numeric because you created it as an integer in your original data frame.
You get:
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: num 5 NA 3
2) Create new as an integer vector
df <- df %>% mutate(new = case_when(old == 1 ~ 5L,
                                    old == 2 ~ NA_integer_,
                                    TRUE ~ old))
Here, 5L forces 5 into the integer type, and NA_integer_ is the integer version of NA.
So this time new is integer:
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: int 5 NA 3
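The same idea should carry over to other vector types. As a sketch only, here is a character variant using NA_character_; the data frame df_chr and the column label are hypothetical names that do not appear in the question:

```r
library(dplyr)

# Hypothetical data mirroring the question's `old` column
df_chr <- data.frame(old = 1:3)

# Every RHS is character: a string literal, the character NA,
# and `old` explicitly converted with as.character()
df_chr <- df_chr %>%
  mutate(label = case_when(old == 1 ~ "five",
                           old == 2 ~ NA_character_,
                           TRUE ~ as.character(old)))

str(df_chr)
```

The point is that the typed NA is picked to match whatever type the other RHSs have, rather than the other way around.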
Try this:
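# Assign the result back so that printing df shows the new column;
# as.numeric() keeps every RHS of type double
df <- df %>% dplyr::mutate(new = dplyr::case_when(.$old == 1 ~ 5,
                                                  .$old == 2 ~ NA_real_,
                                                  TRUE ~ as.numeric(.$old)))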
> df
  old new
1   1   5
2   2  NA
3   3   3