Replacing character values with NA in a data frame

I have a data frame containing (in random places) a character value (say "foo") that I want to replace with NA.

What's the best way to do so across the whole data frame?

Use logical indexing on the whole data frame:

df[ df == "foo" ] <- NA
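As a rough illustration of why that one-liner works, here is a minimal sketch on a small made-up data frame (the names `toy` and `hits` and the values are hypothetical, not from the question). Comparing a data frame to a string gives a logical matrix of the same shape, and assigning through that matrix sets only the matching cells to NA.

```r
# Hypothetical example data, for illustration only
toy <- data.frame(
  id = 1:3,
  x = c("a", "foo", "b"),
  y = c("foo", "c", "foo"),
  stringsAsFactors = FALSE
)

# Logical matrix marking every cell equal to "foo"
hits <- toy == "foo"

# Assigning through the matrix replaces only those cells with NA
toy[hits] <- NA

# Count the NAs per column to check the result
colSums(is.na(toy))
```

Columns with no match, such as `id` here, are left untouched and keep their original type.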

One way to nip this in the bud is to convert that value to NA when you first read the data in.

df <- read.csv("file.csv", na.strings = c("foo", "bar"))
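A self-contained sketch of the same idea, assuming inline CSV text in place of "file.csv" (the data and the names `csv_text` and `parsed` are made up for illustration). It uses the `text` argument that `read.csv` passes on to `read.table`.

```r
# Inline CSV text standing in for "file.csv" (hypothetical data)
csv_text <- "id,x,y
1,a,foo
2,bar,c
3,b,d"

# na.strings lists every string that should be read as NA
parsed <- read.csv(
  text = csv_text,
  na.strings = c("foo", "bar"),
  stringsAsFactors = FALSE
)

# TRUE wherever a value was read as NA
is.na(parsed)
```

Note that supplying `na.strings` replaces its default of "NA", so add "NA" to the vector if the file also uses that string for missing values.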

Using dplyr::na_if, you can replace specific values with NA. In this case, that would be "foo".

library(dplyr)
set.seed(1234)

df <- data.frame(
  id = 1:6,
  x = sample(c("a", "b", "foo"), 6, replace = TRUE),
  y = sample(c("c", "d", "foo"), 6, replace = TRUE),
  z = sample(c("e", "f", "foo"), 6, replace = TRUE),
  stringsAsFactors = FALSE
)
df
#>   id   x   y   z
#> 1  1   a   c   e
#> 2  2   b   c foo
#> 3  3   b   d   e
#> 4  4   b   d foo
#> 5  5 foo foo   e
#> 6  6   b   d   e

na_if(df$x, "foo")
#> [1] "a" "b" "b" "b" NA  "b"

If you need to do this for multiple columns, wrap na_if in a lambda inside mutate with across (dplyr v1.0.0+). Passing extra arguments such as "foo" through across's `...` is deprecated as of dplyr 1.1.0.

df %>%
  mutate(across(c(x, y, z), ~ na_if(.x, "foo")))
#>   id    x    y    z
#> 1  1    a    c    e
#> 2  2    b    c <NA>
#> 3  3    b    d    e
#> 4  4    b    d <NA>
#> 5  5 <NA> <NA>    e
#> 6  6    b    d    e
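If the columns are not known in advance, one possible variation is to select every character column with `where(is.character)`. This is a sketch on made-up data (`toy` and `cleaned` are hypothetical names), assuming dplyr 1.0.0 or later. Restricting the selection to character columns also avoids comparing a numeric column such as `id` against a string, which newer dplyr versions may reject.

```r
library(dplyr)

# Hypothetical example data, for illustration only
toy <- data.frame(
  id = 1:3,
  x = c("a", "foo", "b"),
  y = c("foo", "c", "foo"),
  stringsAsFactors = FALSE
)

# where(is.character) picks every character column, so id is skipped
cleaned <- toy %>%
  mutate(across(where(is.character), ~ na_if(.x, "foo")))

cleaned
```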
