data.table::dcast long to wide data while ignoring NA-Category?
I want to transform my data from long to wide after some joins, resulting in a few NA
s in the data provided.
Unfortunately, these NA
s are also present in the richt-hand side (RHS), which defines the newly added columns via the transformation.
Consider this example:
library(data.table)
dt <- data.table(id=c(1,2,1,2,3,4),
group = c("A","A","B","B",NA,NA),
values = c(7,8,9,10,NA,NA))
dt_wide <- dcast(dt,
id ~ group,
value.var = c("values"))
In the data, rows 5 and 6 do not have any group or associated value:
id group values
1: 1 A 7
2: 2 A 8
3: 1 B 9
4: 2 B 10
5: 3 <NA> NA
6: 4 <NA> NA
if there is an associated value, a group does exist, therefore: (group
== NA
) => (value
== NA
)
the transformed dataframe wrongly considers NA
as its own group in the group
- column, which results in the following wide data table:
id NA A B
1: 1 NA 7 9
2: 2 NA 8 10
3: 3 NA NA NA
4: 4 NA NA NA
I would not prefer to build a possible buggy workaround where i retroactively delete the NA column by name or values (as it might handle different colnames and columns later in production).
Is there a way to tell dcast
to ignore the NAs in group and not make an extra column out of it, while preserving all rows in the transformed table?
Like this:
id A B
1: 1 7 9
2: 2 8 10
3: 3 NA NA
4: 4 NA NA
This is tricky, but seems to work:
dcast(dt,
id ~ ifelse(is.na(group),unique(na.omit(dt$group)),group),
value.var = c("values"))
Key: <id>
id A B
<num> <num> <num>
1: 1 7 9
2: 2 8 10
3: 3 NA NA
4: 4 NA NA
I don't think it's possible to prevent dcast
from doing that. I'd just filter them out afterwards:
dt_wide[, names(dt_wide) != "NA", with = FALSE]
Output:
id A B
1: 1 7 9
2: 2 8 10
3: 3 NA NA
4: 4 NA NA