Is cut() style binning available in dplyr?
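Just so there's an immediate answer for others arriving here via search engine: the closest dplyr counterpart to the n-breaks form of cut is the ntile function. Be aware that ntile splits x into n groups of (nearly) equal size by rank, not into n intervals of equal width as cut(x, breaks = n) does, so the two can assign values near a boundary differently: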
> library(dplyr)
> data.frame(x = c(5, 1, 3, 2, 2, 3)) %>% mutate(bin = ntile(x, 2))
x bin
1 5 2
2 1 1
3 3 2
4 2 1
5 2 1
6 3 2
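Because ntile() groups by rank while cut(x, breaks = n) groups by value range, the two can disagree on the same data. A minimal comparison on the example vector above, using as.integer() to turn the factor that cut() returns into bin numbers:

```r
library(dplyr)

x <- c(5, 1, 3, 2, 2, 3)

# ntile(): n buckets of (nearly) equal size, assigned by rank
by_rank <- ntile(x, 2)

# cut(breaks = n): n intervals of equal width over range(x);
# as.integer() converts the factor levels to bin numbers 1..n
by_width <- as.integer(cut(x, breaks = 2))

data.frame(x, by_rank, by_width)
```

Here ntile() puts both 3s in the upper bin (they are ranks 4 and 5 of 6), while cut() keeps them in the lower interval, whose upper bound is 3.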
I see this question was never updated with the tidyverse solution, so I'll add it for posterity.
The function to use is cut_interval from the ggplot2 package. It works similarly to base::cut, but in my experience it does a better job of marking the start and end points, because base::cut, when given a number of breaks, extends the range by 0.1% at each end, so the outer labels no longer match the data's minimum and maximum.
library(dplyr)
library(ggplot2)
data.frame(x = c(5, 1, 3, 2, 2, 3)) %>% mutate(bin = cut_interval(x, n = 2))
x bin
1 5 (3,5]
2 1 [1,3]
3 3 [1,3]
4 2 [1,3]
5 2 [1,3]
6 3 [1,3]
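To see where the difference in labels comes from, here is a base-R sketch that reproduces cut_interval()'s equal-width bins by building the breaks explicitly with seq() and passing include.lowest = TRUE, so no 0.1% padding is applied. It illustrates the idea and is not ggplot2's actual source:

```r
x <- c(5, 1, 3, 2, 2, 3)
n <- 2

# Default cut(): range(x) is split into n pieces, then the outer
# limits are pushed out by 0.1% of the range
default_levels <- levels(cut(x, breaks = n))

# Explicit breaks from min(x) to max(x); include.lowest = TRUE keeps
# min(x) in the first, closed interval
breaks <- seq(min(x), max(x), length.out = n + 1)
bins <- cut(x, breaks = breaks, include.lowest = TRUE)
levels(bins)
```

On this x, default_levels is (0.996,3] and (3,5], while levels(bins) is [1,3] and (3,5], matching the cut_interval() output above.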
You can also specify the bin width with cut_width:
data.frame(x = c(5, 1, 3, 2, 2, 3)) %>% mutate(bin = cut_width(x, width = 2, center = 1))
x bin
1 5 (4,6]
2 1 [0,2]
3 3 (2,4]
4 2 [0,2]
5 2 [0,2]
6 3 (2,4]
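With width = 2 and center = 1, the bin edges sit at center ± width/2 and then every width units, i.e. at 0, 2, 4, 6. A base-R sketch of that computation, using a hypothetical helper width_breaks() (not part of ggplot2) and assuming right-closed bins as in the output above:

```r
# Hypothetical helper: equal-width breaks aligned so that `center`
# falls in the middle of a bin (the idea behind cut_width())
width_breaks <- function(x, width, center) {
  boundary <- center - width / 2                    # one bin edge
  lo <- boundary + floor((min(x) - boundary) / width) * width
  hi <- boundary + ceiling((max(x) - boundary) / width) * width
  hi <- max(hi, lo + width)                         # always at least one bin
  seq(lo, hi, by = width)
}

x <- c(5, 1, 3, 2, 2, 3)
brks <- width_breaks(x, width = 2, center = 1)      # 0 2 4 6
bins <- cut(x, breaks = brks, include.lowest = TRUE)
bins
```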
The following works with dplyr, assuming x is the variable we wish to bin and that df and n stand in for your data frame and the desired number of bins:
library(dplyr)
# Make n equal-width bins
df %>% mutate(x_bins = cut(x, breaks = n))
# Or make specific bins; values outside (0, 10] become NA
df %>% mutate(x_bins = cut(x, breaks = c(0, 2, 6, 10)))
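A self-contained version of the explicit-breaks call, using the example data from earlier in this thread in place of the df placeholder. Passing labels = FALSE to base cut() returns integer bin numbers instead of a factor, which is closer to what ntile() gives:

```r
library(dplyr)

df <- data.frame(x = c(5, 1, 3, 2, 2, 3))

binned <- df %>%
  mutate(
    # Explicit break points give the intervals (0,2], (2,6], (6,10]
    x_bins = cut(x, breaks = c(0, 2, 6, 10)),
    # Same intervals, returned as integer bin numbers 1..3
    x_bin_id = cut(x, breaks = c(0, 2, 6, 10), labels = FALSE)
  )

binned
```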