Splitting a continuous variable into equal sized groups
Solution 1:
Try this:
split(das, cut(das$anim, 3))
If you want to split based on the value of wt instead, then:
library(Hmisc) # cut2
split(das, cut2(das$wt, g=3))
Either way, you can do this by combining cut, cut2 and split.
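As a minimal sketch of what split returns here (the data frame below is rebuilt from the values printed further down in this answer, and the name groups is only illustrative), the result is a list with one data frame per interval, which can then be summarised per group:

```r
# Example data rebuilt from the printed output in this answer.
das <- data.frame(
  anim = 1:15,
  wt = c(181.0, 179.0, 180.5, 201.0, 201.5, 245.0, 246.4, 189.3,
         301.0, 354.0, 369.0, 205.0, 199.0, 394.0, 231.3)
)

# split() returns a named list: one data frame per interval produced by cut().
groups <- split(das, cut(das$anim, 3))

# Number of rows in each group.
rows_per_group <- sapply(groups, nrow)

# Mean weight within each group.
mean_wt <- sapply(groups, function(d) mean(d$wt))

print(rows_per_group)
print(mean_wt)
```

Because anim runs evenly from 1 to 15, the three equal-width intervals happen to contain five rows each; that would not hold for unevenly spread values.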
UPDATED
If you want the group as an additional column, then:
das$group <- cut(das$anim, 3)
If the column should be an integer index like 1, 2, ..., then:
das$group <- as.numeric(cut(das$anim, 3))
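As a small aside, base cut can also return the integer codes directly via its labels = FALSE argument, which avoids the conversion from a factor. The sketch below compares the two on a stand-in vector (the variable names are only illustrative):

```r
# Stand-in for das$anim.
anim <- 1:15

# Index obtained by converting the factor returned by cut().
idx_from_factor <- as.integer(cut(anim, 3))

# Index returned directly as integer codes.
idx_direct <- cut(anim, 3, labels = FALSE)

# Both approaches assign every observation to the same group.
print(all(idx_from_factor == idx_direct))
```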
UPDATED AGAIN
To get equal-sized groups based on wt, try this:
> das$wt2 <- as.numeric(cut2(das$wt, g=3))
> das
   anim    wt wt2
1     1 181.0   1
2     2 179.0   1
3     3 180.5   1
4     4 201.0   2
5     5 201.5   2
6     6 245.0   2
7     7 246.4   3
8     8 189.3   1
9     9 301.0   3
10   10 354.0   3
11   11 369.0   3
12   12 205.0   2
13   13 199.0   1
14   14 394.0   3
15   15 231.3   2
Solution 2:
Or see cut_number from the ggplot2 package, e.g.:
das$wt_2 <- as.numeric(cut_number(das$wt,3))
Note that cut(..., 3) divides the range of the original data into three intervals of equal length; it doesn't necessarily result in the same number of observations per group if the data are unevenly distributed. (You can replicate what cut_number does by using quantile appropriately, but it's a nice convenience function.) On the other hand, Hmisc::cut2() with the g= argument does split by quantiles, so it is more or less equivalent to ggplot2::cut_number. I would have thought that something like cut_number would have made its way into dplyr by now, but as far as I can tell it hasn't.
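For reference, here is one way the quantile approach might look in base R, using the data printed in Solution 1. This is only a sketch: the names equal_width, equal_count and breaks are illustrative, it relies on the default quantile type, and it assumes the quantile break points are distinct (heavily tied data would need extra handling).

```r
# Example data rebuilt from the printed output in Solution 1.
das <- data.frame(
  anim = 1:15,
  wt = c(181.0, 179.0, 180.5, 201.0, 201.5, 245.0, 246.4, 189.3,
         301.0, 354.0, 369.0, 205.0, 199.0, 394.0, 231.3)
)

# Equal-width bins: the range of wt is divided into 3 intervals of equal length.
equal_width <- cut(das$wt, 3)

# Equal-count bins: use the minimum, the two terciles and the maximum as breaks.
# include.lowest = TRUE keeps the smallest value inside the first interval.
breaks <- quantile(das$wt, probs = seq(0, 1, length.out = 4))
equal_count <- cut(das$wt, breaks = breaks, include.lowest = TRUE)

# Compare group sizes under the two schemes.
print(table(equal_width))
print(table(equal_count))

# Integer group index based on the equal-count bins.
das$group <- as.integer(equal_count)
```

On this data the equal-width bins are very unbalanced (most weights fall in the lowest interval), while the quantile-based bins contain five animals each and reproduce the wt2 column shown in Solution 1.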