combine rows in data frame containing NA to make complete row
I know this is a duplicate Q but I can't seem to find the post again
Using the following data
df <- data.frame(A=c(1,1,2,2),B=c(NA,2,NA,4),C=c(3,NA,NA,5),D=c(NA,2,3,NA),E=c(5,NA,NA,4))
A B C D E
1 NA 3 NA 5
1 2 NA 2 NA
2 NA NA 3 NA
2 4 5 NA 4
Grouping by A
, I'd like the following output using a tidyverse
solution
A B C D E
1 2 3 2 5
2 4 5 3 4
I have many groups in A
. I think I saw an answer using coalesce
but am unsure how to get it work. I'd like a solution that works with characters
as well. Thanks!
Solution 1:
I haven't figured out how to put the coalesce_by_column
function inside the dplyr
pipeline, but this works:
coalesce_by_column <- function(df) {
return(coalesce(df[1], df[2]))
}
df %>%
group_by(A) %>%
summarise_all(coalesce_by_column)
## A B C D E
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2 3 2 5
## 2 2 4 5 3 4
Edit: include @Jon Harmon's solution for more than 2 members of a group
# Supply lists by splicing them into dots:
coalesce_by_column <- function(df) {
return(dplyr::coalesce(!!! as.list(df)))
}
df %>%
group_by(A) %>%
summarise_all(coalesce_by_column)
#> # A tibble: 2 x 5
#> A B C D E
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 3 2 5
#> 2 2 4 5 3 4
Solution 2:
We can use fill
to fill all the missing values. And then filter just one row for each group.
library(dplyr)
library(tidyr)
df2 <- df %>%
group_by(A) %>%
fill(everything(), .direction = "down") %>%
fill(everything(), .direction = "up") %>%
slice(1)
And thanks to @Roger-123, the above code can be further simplified as follows.
df2 <- df %>%
group_by(A) %>%
fill(everything(), .direction = "downup") %>%
slice(1)
Solution 3:
Not tidyverse
but here's one base R solution
df <- data.frame(A=c(1,1),B=c(NA,2),C=c(3,NA),D=c(NA,2),E=c(5,NA))
sapply(df, function(x) x[!is.na(x)][1])
#A B C D E
#1 2 3 2 5
With updated data
do.call(rbind, lapply(split(df, df$A), function(a) sapply(a, function(x) x[!is.na(x)][1])))
# A B C D E
#1 1 2 3 2 5
#2 2 4 5 3 4
Solution 4:
Here is an even more general solution (using unique
, na.omit
to sort of create coalesce
), which can handle more than two rows with overlapping information. Super simply and forward.
> df <- data.frame(A=c(1,1,2,2,2),B=c(NA,2,NA,4,4),C=c(3,NA,NA,5,NA),D=c(NA,2,3,NA,NA),E=c(5,NA,NA,4,4))
> df
A B C D E
1 1 NA 3 NA 5
2 1 2 NA 2 NA
3 2 NA NA 3 NA
4 2 4 5 NA 4
5 2 4 NA NA 4
> df %>% group_by(A) %>% summarise_all(funs( na.omit(unique(.)) ))
# A tibble: 2 x 5
A B C D E
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 3 2 5
2 2 4 5 3 4