R, dplyr - combination of group_by() and arrange() does not produce expected result?
When I use the dplyr function group_by() and immediately afterwards arrange(), I would expect the output to be ordered within the groups I stated in group_by(). My reading of the documentation is that this combination should produce such a result, but when I tried it that is not what I got, and googling did not turn up anyone else running into the same issue. Am I wrong to expect this result?
Here is an example, using the R built-in dataset ToothGrowth:
library(dplyr)
ToothGrowth %>%
group_by(supp) %>%
arrange(len)
Running this produces a data frame that is ordered by len as a whole, not within the levels of the supp factor.
This is the code that produces the desired output:
ToothGrowth %>%
group_by(supp) %>%
do( data.frame(with(data=., .[order(len),] )) )
Solution 1:
You can produce the expected behaviour by setting .by_group = TRUE in arrange():
library(dplyr)
ToothGrowth %>%
group_by(supp) %>%
arrange(len, .by_group = TRUE)
Solution 2:
I think you want
ToothGrowth %>%
arrange(supp,len)
The pipe just replaces nested calls, so you are first grouping and then ordering that grouped result. Since arrange() does not take the grouping into account by default, it sorts the whole data frame by len, which mixes up the supp groups.
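As a rough check, the two approaches can be compared directly. This is only a sketch, and it assumes a dplyr version in which arrange() accepts .by_group (as used in Solution 1); the names by_columns and by_group are made up for the illustration.

```r
library(dplyr)

# Sort within groups by listing the grouping column first in arrange()
by_columns <- ToothGrowth %>%
  arrange(supp, len)

# Same idea via group_by() plus .by_group = TRUE, then drop the grouping
by_group <- ToothGrowth %>%
  group_by(supp) %>%
  arrange(len, .by_group = TRUE) %>%
  ungroup()

# Compare the row order column by column
# (the objects themselves differ in class: data.frame vs tibble)
identical(by_columns$supp, by_group$supp)
identical(by_columns$len, by_group$len)
```

If you do not need the grouping for a later step, arrange(supp, len) is the shorter of the two.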
Solution 3:
Another way to fix this unexpected ordering while still using group_by() is to convert the grouped_df back to a plain data frame.
group_by() is needed for summaries, for example:
ToothGrowthMeanLen <- ToothGrowth %>%
group_by(supp, dose) %>%
summarise(meanlen = mean(len))
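The summary table is still grouped by supp, so on dplyr versions where arrange() respects grouping it is sorted within each supp group rather than by meanlen overall: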
ToothGrowthMeanLen %>%
arrange(meanlen)
This summary table is arranged in the order of meanlen:
ToothGrowthMeanLen %>%
data.frame() %>% # Convert to a simple data frame
arrange(meanlen)
Converting the grouped_df back to a data frame was the first way I found to sort a summarised data frame, but dplyr::ungroup() exists for exactly that purpose.
ToothGrowthMeanLen %>%
ungroup() %>% # Remove grouping
arrange(meanlen)
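To see what grouping is left after summarise() and what ungroup() removes, something like the following can help. It is a minimal sketch that assumes a dplyr version providing group_vars(), and it relies on summarise() dropping only the last grouping level by default.

```r
library(dplyr)

ToothGrowthMeanLen <- ToothGrowth %>%
  group_by(supp, dose) %>%
  summarise(meanlen = mean(len))

# summarise() peels off only the last grouping level (dose), so supp remains
group_vars(ToothGrowthMeanLen)

# After ungroup() no grouping variables are left
ToothGrowthMeanLen %>%
  ungroup() %>%
  group_vars()
```

Inspecting group_vars() before sorting makes it clear whether a leftover grouping could be affecting the result.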