Splitting a continuous variable into equal sized groups

Solution 1:

try this:

split(das, cut(das$anim, 3))

if you want to split based on the value of wt, then

library(Hmisc) # cut2
split(das, cut2(das$wt, g=3))

anyway, you can do that by combining cut, cut2 and split.

UPDATED

if you want a group index as an additional column, then

das$group <- cut(das$anim, 3)

if the column should be index like 1, 2, ..., then

das$group <- as.numeric(cut(das$anim, 3))

UPDATED AGAIN

try this:

> das$wt2 <- as.numeric(cut2(das$wt, g=3))
> das
   anim    wt wt2
1     1 181.0   1
2     2 179.0   1
3     3 180.5   1
4     4 201.0   2
5     5 201.5   2
6     6 245.0   2
7     7 246.4   3
8     8 189.3   1
9     9 301.0   3
10   10 354.0   3
11   11 369.0   3
12   12 205.0   2
13   13 199.0   1
14   14 394.0   3
15   15 231.3   2

Solution 2:

Or see cut_number from the ggplot2 package, e.g.

das$wt_2 <- as.numeric(cut_number(das$wt,3))

Note that cut(...,3) divides the range of the original data into three ranges of equal lengths; it doesn't necessarily result in the same number of observations per group if the data are unevenly distributed (you can replicate what cut_number does by using quantile appropriately, but it's a nice convenience function). On the other hand, Hmisc::cut2() using the g= argument does split by quantiles, so is more or less equivalent to ggplot2::cut_number. I might have thought that something like cut_number would have made its way into dplyr by so far, but as far as I can tell it hasn't.