Collapsing data frame by selecting one row per group

Solution 1:

Maybe duplicated() can help:

R> d[ !duplicated(d$x), ]
  x  y  z
1 1 10 20
3 2 12 18
4 4 13 17
R> 

Edit Shucks, never mind. This picks the first in each block of repetitions, you wanted the last. So here is another attempt using plyr:

R> ddply(d, "x", function(z) tail(z,1))
  x  y  z
1 1 11 19
2 2 12 18
3 4 13 17
R> 

Here plyr does the hard work of finding unique subsets, looping over them and applying the supplied function -- which simply returns the last set of observations in a block z using tail(z, 1).

Solution 2:

Just to add a little to what Dirk provided... duplicated has a fromLast argument that you can use to select the last row:

d[ !duplicated(d$x,fromLast=TRUE), ]

Solution 3:

Here is a data.table solution which will be time and memory efficient for large data sets

library(data.table)
DT <- as.data.table(d)           # convert to data.table
setkey(DT, x)                    # set key to allow binary search using `J()`
DT[J(unique(x)), mult ='last']   # subset out the last row for each x
DT[J(unique(x)), mult ='first']  # if you wanted the first row for each x