R: sample() command subject to a constraint

Solution 1:

To make sure you're sampling uniformly, you could just generate all the permutations and limit to those that sum to 7:

library(gtools)
perms <- permutations(8, 7, 0:7, repeats.allowed=T)
perms7 <- perms[rowSums(perms) == 7,]

From nrow(perms7), we see there are only 1716 possible permutations that sum to 7. Now you can uniformly sample from the permutations:

set.seed(144)
my.perms <- perms7[sample(nrow(perms7), 100000, replace=T),]
head(my.perms)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,]    0    0    0    2    5    0    0
# [2,]    1    3    0    1    2    0    0
# [3,]    1    4    1    1    0    0    0
# [4,]    1    0    0    3    0    3    0
# [5,]    0    2    0    0    0    5    0
# [6,]    1    1    2    0    0    2    1

An advantage of this approach is that it's easy to see that we're sampling uniformly at random. Also, it's quite quick -- building perms7 took 0.3 seconds on my computer and building a 1 million-row my.perms took 0.04 seconds. If you need to draw many vectors this will be quite a bit quicker than a recursive approach because you're just using matrix indexing into perms7 instead of generating each vector separately.

Here's a distribution of counts of numbers in the sample:

#      0      1      2      3      4      5      6      7 
# 323347 188162 102812  51344  22811   8629   2472    423 

Solution 2:

Start with all zeroes, add one to any element, do 7 times:

sumTo = function(){
    v = rep(0,7)
    for(i in 1:7){
        addTo=sample(7)[1]
        v[addTo]=v[addTo]+1
    }
    v
}

Or equivalently, just choose which of the 7 elements you are going to increment in one sample of length 7, then tabulate those, making sure you tabulate up to 7:

sumTo = function(){tabulate(sample(7, 7, replace = TRUE), 7)}


> sumTo()
[1] 2 1 0 0 4 0 0
> sumTo()
[1] 1 3 1 0 1 0 1
> sumTo()
[1] 1 1 0 2 1 0 2

I don't know if this will produce a uniform sample from all possible combinations...

The distribution of individual elements over 100,000 reps is:

> X = replicate(100000,sumTo())
> table(X)
X
     0      1      2      3      4      5      6 
237709 277926 138810  38465   6427    627     36 

Didn't hit a 0,0,0,0,0,7 that time!

Solution 3:

This recursive algorithm will output a distribution with a higher probability for large numbers than the other solutions. The idea is to throw a random number y in 0:7 in any of the seven available slots, then repeat with a random number in 0:(7-y), etc:

sample.sum <- function(x = 0:7, n = 7L, s = 7L) {
   if (n == 1) return(s)
   x <- x[x <= s]
   y <- sample(x, 1)
   sample(c(y, Recall(x, n - 1L, s - y)))
}

set.seed(123L)
sample.sum()
# [1] 0 4 0 2 0 0 1

Drawing 100,000 vectors took 11 seconds on my machine and here is the distribution I get:

#      0      1      2      3      4      5      6      7 
# 441607  98359  50587  33364  25055  20257  16527  14244 

Solution 4:

There may be an easier and/or more elegant way, but here's a brute-force method using the LSPM:::.nPri function. The link includes the definition for an R-only version of the algorithm, for those interested.

#install.packages("LSPM", repos="http://r-forge.r-project.org")
library(LSPM)
# generate all possible permutations, since there are only ~2.1e6 of them
# (this takes < 40s on my 2.2Ghz laptop)
x <- lapply(seq_len(8^7), nPri, n=8, r=7, replace=TRUE)
# set each permutation that doesn't sum to 7 to NULL
y <- lapply(x, function(p) if(sum(p-1) != 7) NULL else p-1)
# subset all non-NULL permutations
z <- y[which(!sapply(y, is.null))]

Now you can sample from z and be assured that you're getting a permutation that sums to 7.