How to replace NA values in a table for selected columns
Solution 1:
You can do:
x[, 1:2][is.na(x[, 1:2])] <- 0
or better (IMHO), use the variable names:
x[c("a", "b")][is.na(x[c("a", "b")])] <- 0
In both cases, 1:2
or c("a", "b")
can be replaced by a pre-defined vector.
Solution 2:
Building on @Robert McDonald's tidyr::replace_na()
answer, here are some dplyr
options for controlling which columns the NA
s are replaced:
library(tidyverse)
# by column type:
x %>%
mutate_if(is.numeric, ~replace_na(., 0))
# select columns defined in vars(col1, col2, ...):
x %>%
mutate_at(vars(a, b, c), ~replace_na(., 0))
# all columns:
x %>%
mutate_all(~replace_na(., 0))
Solution 3:
Edit 2020-06-15
Since data.table
1.12.4 (Oct 2019), data.table
gains two functions to facilitate this: nafill
and setnafill
.
nafill
operates on columns:
cols = c('a', 'b')
y[ , (cols) := lapply(.SD, nafill, fill=0), .SDcols = cols]
setnafill
operates on tables (the replacements happen by-reference/in-place)
setnafill(y, cols=cols, fill=0)
# print y to show the effect
y[]
This will also be more efficient than the other options; see ?nafill
for more, the last-observation-carried-forward (LOCF) and next-observation-carried-backward (NOCB) versions of NA
imputation for time series.
This will work for your data.table
version:
for (col in c("a", "b")) y[is.na(get(col)), (col) := 0]
Alternatively, as David Arenburg points out below, you can use set
(side benefit - you can use it either on data.frame
or data.table
):
for (col in 1:2) set(x, which(is.na(x[[col]])), col, 0)
Solution 4:
This is now trivial in tidyr
with replace_na()
. The function appears to work for data.tables as well as data.frames:
tidyr::replace_na(x, list(a=0, b=0))