Correct syntax for mutate_if

I would like to replace NA values with zeros via mutate_if in dplyr. I tried both calls below:

set.seed(1)
mtcars[sample(1:dim(mtcars)[1], 5),
       sample(1:dim(mtcars)[2], 5)] <-  NA

require(dplyr)

mtcars %>% 
    mutate_if(is.na,0)

mtcars %>% 
    mutate_if(is.na, funs(. = 0))

Both return the error:

Error in vapply(tbl, p, logical(1), ...) : values must be length 1, but FUN(X[[1]]) result is length 32

What's the correct syntax for this operation?


Solution 1:

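The "if" in mutate_if refers to choosing columns, not rows. For example, mutate_if(data, is.numeric, ...) applies a transformation to every numeric column in your dataset. The predicate must therefore return a single TRUE or FALSE per column; is.na returns one value per element, which is why vapply complains about a result of length 32.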

If you want to replace all NAs with zeros in numeric columns:

data %>% mutate_if(is.numeric, funs(ifelse(is.na(.), 0, .)))
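To see the difference between a column predicate and an element-wise test, it can help to check what each one returns. This is only a small sketch: the data frame df is made up for illustration, and the formula shorthand (~) is used in place of funs().

```r
library(dplyr)

# Hypothetical toy data, not taken from the question
df <- data.frame(a = c(1, NA, 3), b = c("x", NA, "z"), stringsAsFactors = FALSE)

# is.na() returns one value per element, so it cannot serve as a column predicate
length(is.na(df$a))   # 3

# is.numeric() returns a single TRUE/FALSE per column, which is what mutate_if() expects
is.numeric(df$a)      # TRUE
is.numeric(df$b)      # FALSE

# Only the numeric column is transformed; the NA in the character column is left alone
res <- df %>% mutate_if(is.numeric, ~ ifelse(is.na(.), 0, .))
res
```

Here column a comes back with its NA replaced by 0, while the NA in column b is untouched, because b never passes the is.numeric predicate.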

Solution 2:

I learned this trick from the purrr tutorial, and it also works in dplyr. There are two ways to solve this problem.
First, define custom functions outside the pipe and use them in mutate_if():

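# Predicate: TRUE if the column contains at least one NA
any_column_NA <- function(x) {
    any(is.na(x))
}
# Transformation: replace every NA in the column with 0
replace_NA_0 <- function(x) {
    if_else(is.na(x), 0, x)
}
mtcars %>% mutate_if(any_column_NA, replace_NA_0)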

Second, write the functions inline as formulas, using ~ together with .x or . as the placeholder (.x and . are interchangeable, but no other name works):

mtcars %>% mutate_if(~ any(is.na(.x)), ~ if_else(is.na(.x), 0, .x))
# This also works
mtcars %>% mutate_if(~ any(is.na(.)), ~ if_else(is.na(.), 0, .))

In your case, you can also use mutate_all():

mtcars %>% mutate_all(~ if_else(is.na(.x),0,.x))

Using ~, we can define an anonymous function in which .x or . stands for the argument. In the mutate_if() case, . or .x is each column in turn.
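The formula shorthand can be checked outside a pipe. The sketch below assumes rlang is available (it is installed as a dependency of dplyr) and uses rlang::as_function(), which is roughly the conversion dplyr applies to a formula; the names f and g are just for illustration.

```r
library(dplyr)

# Convert the formula into an ordinary function
f <- rlang::as_function(~ if_else(is.na(.x), 0, .x))

# The equivalent named function written out in full
g <- function(x) if_else(is.na(x), 0, x)

f(c(1, NA, 3))   # 1 0 3
identical(f(c(1, NA, 3)), g(c(1, NA, 3)))   # TRUE
```

So ~ if_else(is.na(.x), 0, .x) is just a compact way of writing function(x) if_else(is.na(x), 0, x).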

Solution 3:

replace_na comes from tidyr, so load it alongside dplyr:

library(tidyr)
mtcars %>% mutate_if(is.numeric, replace_na, 0)

or, with the more recent across() syntax:

mtcars %>% mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

Solution 4:

We can use set from data.table

library(data.table)
setDT(mtcars)
# Loop over the columns and overwrite the NA rows of each one by reference
for (j in seq_along(mtcars)) {
  set(mtcars, i = which(is.na(mtcars[[j]])), j = j, value = 0)
}

Solution 5:

I always struggle with the replace_na function (which is in tidyr, not dplyr), so I use base R's replace() inside the pipe instead:

  mtcars %>% replace(is.na(.), 0)

This works for me for what you are trying to do.
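For reference, here is what that replace() call does without the pipe, as a minimal base R sketch on a made-up data frame (df and out are illustrative names only):

```r
# Hypothetical toy data with an NA in each column
df <- data.frame(a = c(1, NA), b = c(NA, 2))

# is.na(df) is a logical matrix marking the missing cells;
# replace() returns a copy of df with those cells set to 0
out <- replace(df, is.na(df), 0)
out
```

Note that this touches every column, so it is closer to mutate_all() than to mutate_if(); the original df is left unchanged.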