Combine column to remove NA's
I have some columns in R and for each row there will only ever be a value in one of them, the rest will be NA's. I want to combine these into one column with the non-NA value. Does anyone know of an easy way of doing this. For example I could have as follows:
data <- data.frame('a' = c('A','B','C','D','E'),
'x' = c(1,2,NA,NA,NA),
'y' = c(NA,NA,3,NA,NA),
'z' = c(NA,NA,NA,4,5))
So I would have
'a' 'x' 'y' 'z'
A 1 NA NA
B 2 NA NA
C NA 3 NA
D NA NA 4
E NA NA 5
And I would to get
'a' 'mycol'
A 1
B 2
C 3
D 4
E 5
The names of the columns containing NA changes depending on code earlier in the query so I won't be able to call the column names explicitly, but I have the column names of the columns which contains NA's stored as a vector e.g. in this example cols <- c('x','y','z')
, so could call the columns using data[, cols]
.
Any help would be appreciated.
Thanks
A dplyr::coalesce
based solution could be as:
data %>% mutate(mycol = coalesce(x,y,z)) %>%
select(a, mycol)
# a mycol
# 1 A 1
# 2 B 2
# 3 C 3
# 4 D 4
# 5 E 5
Data
data <- data.frame('a' = c('A','B','C','D','E'),
'x' = c(1,2,NA,NA,NA),
'y' = c(NA,NA,3,NA,NA),
'z' = c(NA,NA,NA,4,5))
You can use unlist
to turn the columns into one vector. Afterwards, na.omit
can be used to remove the NA
s.
cbind(data[1], mycol = na.omit(unlist(data[-1])))
a mycol
x1 A 1
x2 B 2
y3 C 3
z4 D 4
z5 E 5
Here's a more general (but even simpler) solution which extends to all column types (factors, characters etc.) with non-ordered NA's. The strategy is simply to merge the non-NA values of other columns into your merged column using is.na
for indexing:
data$m = data$x # your new merged column start with x
data$m[!is.na(data$y)] = data$y[!is.na(data$y)] # merge with y
data$m[!is.na(data$z)] = data$z[!is.na(data$z)] # merge with z
> data
a x y z m
1 A 1 NA NA 1
2 B 2 NA NA 2
3 C NA 3 NA 3
4 D NA NA 4 4
5 E NA NA 5 5
Note that this will overwrite existing values in m
if there are several non-NA values in the same row. If you have a lot of columns you could automate this by looping over colnames(data)
.