combine rows in data frame containing NA to make complete row

I know this is a duplicate Q but I can't seem to find the post again

Using the following data

df <- data.frame(A=c(1,1,2,2),B=c(NA,2,NA,4),C=c(3,NA,NA,5),D=c(NA,2,3,NA),E=c(5,NA,NA,4))

  A  B  C  D  E
  1 NA  3 NA  5
  1  2 NA  2 NA
  2 NA NA  3 NA
  2  4  5 NA  4

Grouping by A, I'd like the following output using a tidyverse solution

  A  B  C  D  E
  1  2  3  2  5
  2  4  5  3  4

I have many groups in A. I think I saw an answer using coalesce but am unsure how to get it to work. I'd like a solution that works with character columns as well. Thanks!

Solution 1:

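I haven't figured out how to put the coalesce_by_column function inside the dplyr pipeline, but this works for groups of at most two rows:

library(dplyr)

# `df` here is a single column of one group, as passed in by summarise_all()
coalesce_by_column <- function(df) {
  return(coalesce(df[1], df[2]))
}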

df %>%
  group_by(A) %>%
  summarise_all(coalesce_by_column)

##       A     B     C     D     E
##   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     1     2     3     2     5
## 2     2     4     5     3     4

Edit: include @Jon Harmon's solution for more than 2 members of a group

# Supply lists by splicing them into dots:
coalesce_by_column <- function(df) {
  return(dplyr::coalesce(!!! as.list(df)))
}

df %>%
  group_by(A) %>%
  summarise_all(coalesce_by_column)

#> # A tibble: 2 x 5
#>       A     B     C     D     E
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     2     3     2     5
#> 2     2     4     5     3     4
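
Since dplyr 1.0.0, summarise_all() is superseded by across(). Here is a minimal sketch of the same idea with across(), assuming a recent dplyr. The character column G is a hypothetical addition to the question's data, included only to show that coalesce handles characters too:

```r
library(dplyr)

# Data from the question (shortened) plus a hypothetical character column `G`
df <- data.frame(
  A = c(1, 1, 2, 2),
  B = c(NA, 2, NA, 4),
  C = c(3, NA, NA, 5),
  G = c(NA, "x", "y", NA),
  stringsAsFactors = FALSE
)

# First non-missing value of a vector (NA if every value is missing)
coalesce_by_column <- function(x) {
  dplyr::coalesce(!!!as.list(x))
}

# across(everything(), ...) applies the helper to every non-grouping column
result <- df %>%
  group_by(A) %>%
  summarise(across(everything(), coalesce_by_column))
```

If a group has two different non-NA values in the same column, this keeps the first one in row order.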

Solution 2:

We can use fill() from tidyr to fill the missing values within each group in both directions, and then keep just one row per group with slice(1).

library(dplyr)
library(tidyr)

df2 <- df %>%
  group_by(A) %>%
  fill(everything(), .direction = "down") %>%
  fill(everything(), .direction = "up") %>%
  slice(1)

And thanks to @Roger-123, the above code can be further simplified as follows.

df2 <- df %>%
  group_by(A) %>%
  fill(everything(), .direction = "downup") %>%
  slice(1)
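
The question also asks about character data. As far as I can tell the same fill approach carries over unchanged; a small sketch, where df_chr is made-up data with a character grouping column and a character value column:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: character group key `A`, character column `B`, numeric column `C`
df_chr <- data.frame(
  A = c("a", "a", "b", "b"),
  B = c(NA, "x", NA, "y"),
  C = c(3, NA, NA, 5),
  stringsAsFactors = FALSE
)

df2_chr <- df_chr %>%
  group_by(A) %>%
  fill(everything(), .direction = "downup") %>%  # fill down, then up, within each group
  slice(1) %>%                                   # every row is now complete; keep the first
  ungroup()                                      # drop the grouping for later steps
```

The ungroup() at the end is optional; it just avoids carrying the grouping into whatever comes next.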

Solution 3:

Not tidyverse but here's one base R solution

df <- data.frame(A=c(1,1),B=c(NA,2),C=c(3,NA),D=c(NA,2),E=c(5,NA))
sapply(df, function(x) x[!is.na(x)][1])
#A B C D E 
#1 2 3 2 5

With the updated data (the four-row df from the question)

do.call(rbind, lapply(split(df, df$A), function(a) sapply(a, function(x) x[!is.na(x)][1])))
#  A B C D E
#1 1 2 3 2 5
#2 2 4 5 3 4
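
One caveat: sapply() plus rbind() returns a matrix, so with mixed numeric and character columns everything would be coerced to character. A possible base R variant that keeps each column's type is sketched below; the helper name first_non_na and the character column G are my own additions, not part of the question:

```r
# Question data (shortened) plus a hypothetical character column `G`
df <- data.frame(
  A = c(1, 1, 2, 2),
  B = c(NA, 2, NA, 4),
  G = c("x", NA, NA, "y"),
  stringsAsFactors = FALSE
)

# First non-NA value of a vector (NA of the same type if there is none)
first_non_na <- function(x) x[!is.na(x)][1]

# Collapse each group to a one-row data frame so every column keeps its type
collapsed <- do.call(
  rbind,
  lapply(split(df, df$A), function(g) {
    as.data.frame(lapply(g, first_non_na), stringsAsFactors = FALSE)
  })
)
rownames(collapsed) <- NULL  # drop the group labels that split() leaves as row names
```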

Solution 4:

Here is an even more general solution (using unique and na.omit to emulate coalesce) that can handle more than two rows with overlapping information, provided each column has only one distinct non-NA value per group. Simple and straightforward.

> df <- data.frame(A=c(1,1,2,2,2),B=c(NA,2,NA,4,4),C=c(3,NA,NA,5,NA),D=c(NA,2,3,NA,NA),E=c(5,NA,NA,4,4))

> df
  A  B  C  D  E
1 1 NA  3 NA  5
2 1  2 NA  2 NA
3 2 NA NA  3 NA
4 2  4  5 NA  4
5 2  4 NA NA  4

> df %>% group_by(A) %>% summarise_all(~ na.omit(unique(.x)))
# A tibble: 2 x 5
      A     B     C     D     E
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     2     3     2     5
2     2     4     5     3     4
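
Note that na.omit(unique(.)) yields more than one value when a group holds two different non-NA values in the same column, in which case summarise would either fail or return extra rows for that group, depending on the dplyr version. A rough sketch of how one might check for that first and fall back to the first non-NA value; the conflicting value 7 and the helper first_non_na are made up for illustration:

```r
library(dplyr)

# Hypothetical data: group 2 has conflicting values (4 and 7) in column B
df <- data.frame(
  A = c(1, 1, 2, 2, 2),
  B = c(NA, 2, NA, 4, 7)
)

# Number of distinct non-NA values per group and column;
# anything above 1 marks a conflict
conflicts <- df %>%
  group_by(A) %>%
  summarise(across(everything(), ~ n_distinct(.x, na.rm = TRUE)))

# Fallback: keep the first non-NA value in row order
first_non_na <- function(x) x[!is.na(x)][1]

result <- df %>%
  group_by(A) %>%
  summarise(across(everything(), first_non_na))
```

Whether taking the first value is the right way to resolve a conflict depends on the data, so it is worth looking at conflicts before trusting result.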
