Reshaping data frame with duplicates
I have what should be a simple reshaping problem, but I can't figure it out. Part of my data looks like this:
foo <- structure(list(grade = c(3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
3, 3, 4, 4, 5, 5, 6, 6), var.type = structure(c(3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L
), .Label = c("Raw Score", "SE", "SS"), class = "factor"), var.val = c(120L,
47L, 120L, 46L, 120L, 46L, 120L, 47L, 120L, 46L, 120L, 46L, 120L,
12L, 120L, 14L, 120L, 16L, 120L, 20L)), .Names = c("grade", "var.type",
"var.val"), row.names = c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L), class = "data.frame")
or
grade var.type var.val
2 3 SS 120
3 3 SE 47
4 4 SS 120
5 4 SE 46
6 5 SS 120
7 5 SE 46
I want to make it look like this:
grade SS SE
3 120 47
4 120 46
5 120 46
and so on. I have tried reshape, cast, and dcast as in this thread:
Reshaping dataset
but nothing seems to work. I would really appreciate some help. TIA.
If you want to reshape and you have duplicates, you're going to need to give each pair a unique id:
foorle <- rle(foo$grade)
fooids <- rep(seq_len(length(foorle$values)), times=foorle$lengths)
fooids
[1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10
Now you'll be able to use reshape properly:
idfoo <- cbind(id=fooids, foo)
library(reshape)
dcast(idfoo, id+grade~var.type, value.var="var.val")
id grade SE SS
1 1 3 47 120
2 2 4 46 120
3 3 5 46 120
4 4 6 47 120
5 5 7 46 120
6 6 8 46 120
7 7 3 12 120
8 8 4 14 120
9 9 5 16 120
10 10 6 20 120
EDIT: Please note I'm assuming your data is in order, else you'll have problems distinguishing between duplicates. If it isn't, you can always use order
so that it is.
library(plyr)
library(reshape2)
# First we add a grouping variable to deal with the duplicates
foo <- ddply(foo, .(grade, var.type), function(x) { x$group <- 1:nrow(x); x })
dcast(foo, grade + group ~ var.type, value.var= "var.val")[-2]
grade SE SS
1 3 47 120
2 3 12 120
3 4 46 120
4 4 14 120
5 5 46 120
6 5 16 120
7 6 47 120
8 6 20 120
9 7 46 120
10 8 46 120
It is not as pretty as reshape, but
data.frame(grade = foo[2 * (1:(nrow(foo)/2)),]$grade,
SS = foo[foo$var.type == "SS", ]$var.val,
SE = foo[foo$var.type == "SE", ]$var.val )
produces
grade SS SE
1 3 120 47
2 4 120 46
3 5 120 46
4 6 120 47
5 7 120 46
6 8 120 46
7 3 120 12
8 4 120 14
9 5 120 16
10 6 120 20
You have to assume the data comes in pairs of rows for this.