Replacing character values with NA in a data frame
I have a data frame containing (in random places) a character value (say "foo") that I want to replace with NA.
What's the best way to do so across the whole data frame?
This:
df[ df == "foo" ] <- NA
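As a minimal sketch of why this works: df == "foo" returns a logical matrix that is TRUE wherever a cell equals "foo", and assigning through that matrix sets exactly those cells to NA. The small data frame below is made up for illustration and is not from the question.

```r
# Hypothetical example data (not from the question)
df <- data.frame(
  id = 1:3,
  x = c("a", "foo", "b"),
  y = c("foo", "c", "d"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE in every cell that equals "foo"
is_foo <- df == "foo"

# Assigning through the logical matrix replaces only the matching cells
df[is_foo] <- NA

# Count the missing values per column to check the result
colSums(is.na(df))
```

Columns without a match, such as id here, are left as they were.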
One way to nip this in the bud is to convert that value to NA when you first read the data in:
df <- read.csv("file.csv", na.strings = c("foo", "bar"))
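To try this without a file on disk, here is a small sketch that feeds read.csv an inline string through its text argument. The CSV contents and the csv_text name are made up for illustration.

```r
# Hypothetical CSV contents, supplied inline instead of reading "file.csv"
csv_text <- "id,x,y
1,a,foo
2,bar,c
3,b,d"

# Every field equal to "foo" or "bar" is read in as NA
df <- read.csv(
  text = csv_text,
  na.strings = c("foo", "bar"),
  stringsAsFactors = FALSE
)

# Logical matrix showing which cells ended up missing
is.na(df)
```

Note that na.strings replaces the default of "NA" rather than adding to it, so include "NA" in the vector if you also want that string treated as missing.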
Using dplyr::na_if, you can replace specific values with NA. In this case, that would be "foo".
library(dplyr)
set.seed(1234)
df <- data.frame(
  id = 1:6,
  x = sample(c("a", "b", "foo"), 6, replace = TRUE),
  y = sample(c("c", "d", "foo"), 6, replace = TRUE),
  z = sample(c("e", "f", "foo"), 6, replace = TRUE),
  stringsAsFactors = FALSE
)
df
#> id x y z
#> 1 1 a c e
#> 2 2 b c foo
#> 3 3 b d e
#> 4 4 b d foo
#> 5 5 foo foo e
#> 6 6 b d e
na_if(df$x, "foo")
#> [1] "a" "b" "b" "b" NA "b"
If you need to do this for multiple columns, you can apply na_if with "foo" to each of them using mutate with across (updated for dplyr v1.0.0+).
# Lambda form: passing extra arguments through across() is deprecated as of dplyr 1.1.0
df %>%
  mutate(across(c(x, y, z), ~ na_if(.x, "foo")))
#> id x y z
#> 1 1 a c e
#> 2 2 b c <NA>
#> 3 3 b d e
#> 4 4 b d <NA>
#> 5 5 <NA> <NA> e
#> 6 6 b d e
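If you would rather not list the columns by name, one possible variation is to select every character column with where(is.character). This is a sketch under the assumption that dplyr 1.0.0 or later is installed; the data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical example data (not from the question)
df <- data.frame(
  id = 1:3,
  x = c("a", "foo", "b"),
  y = c("foo", "c", "d"),
  stringsAsFactors = FALSE
)

# Apply na_if to every character column; the integer id column is skipped
out <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "foo")))

out
```

Restricting the selection to character columns keeps na_if from being asked to compare "foo" against numeric columns.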