R: sample() command subject to a constraint
Solution 1:
To make sure you're sampling uniformly, you could just generate all the permutations and limit to those that sum to 7:
library(gtools)
perms <- permutations(8, 7, 0:7, repeats.allowed=T)
perms7 <- perms[rowSums(perms) == 7,]
From nrow(perms7)
, we see there are only 1716 possible permutations that sum to 7. Now you can uniformly sample from the permutations:
set.seed(144)
my.perms <- perms7[sample(nrow(perms7), 100000, replace=T),]
head(my.perms)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 0 0 0 2 5 0 0
# [2,] 1 3 0 1 2 0 0
# [3,] 1 4 1 1 0 0 0
# [4,] 1 0 0 3 0 3 0
# [5,] 0 2 0 0 0 5 0
# [6,] 1 1 2 0 0 2 1
An advantage of this approach is that it's easy to see that we're sampling uniformly at random. Also, it's quite quick -- building perms7
took 0.3 seconds on my computer and building a 1 million-row my.perms
took 0.04 seconds. If you need to draw many vectors this will be quite a bit quicker than a recursive approach because you're just using matrix indexing into perms7
instead of generating each vector separately.
Here's a distribution of counts of numbers in the sample:
# 0 1 2 3 4 5 6 7
# 323347 188162 102812 51344 22811 8629 2472 423
Solution 2:
Start with all zeroes, add one to any element, do 7 times:
sumTo = function(){
v = rep(0,7)
for(i in 1:7){
addTo=sample(7)[1]
v[addTo]=v[addTo]+1
}
v
}
Or equivalently, just choose which of the 7 elements you are going to increment in one sample of length 7, then tabulate those, making sure you tabulate up to 7:
sumTo = function(){tabulate(sample(7, 7, replace = TRUE), 7)}
> sumTo()
[1] 2 1 0 0 4 0 0
> sumTo()
[1] 1 3 1 0 1 0 1
> sumTo()
[1] 1 1 0 2 1 0 2
I don't know if this will produce a uniform sample from all possible combinations...
The distribution of individual elements over 100,000 reps is:
> X = replicate(100000,sumTo())
> table(X)
X
0 1 2 3 4 5 6
237709 277926 138810 38465 6427 627 36
Didn't hit a 0,0,0,0,0,7 that time!
Solution 3:
This recursive algorithm will output a distribution with a higher probability for large numbers than the other solutions. The idea is to throw a random number y
in 0:7
in any of the seven available slots, then repeat with a random number in 0:(7-y)
, etc:
sample.sum <- function(x = 0:7, n = 7L, s = 7L) {
if (n == 1) return(s)
x <- x[x <= s]
y <- sample(x, 1)
sample(c(y, Recall(x, n - 1L, s - y)))
}
set.seed(123L)
sample.sum()
# [1] 0 4 0 2 0 0 1
Drawing 100,000 vectors took 11 seconds on my machine and here is the distribution I get:
# 0 1 2 3 4 5 6 7
# 441607 98359 50587 33364 25055 20257 16527 14244
Solution 4:
There may be an easier and/or more elegant way, but here's a brute-force method using the LSPM:::.nPri
function. The link includes the definition for an R-only version of the algorithm, for those interested.
#install.packages("LSPM", repos="http://r-forge.r-project.org")
library(LSPM)
# generate all possible permutations, since there are only ~2.1e6 of them
# (this takes < 40s on my 2.2Ghz laptop)
x <- lapply(seq_len(8^7), nPri, n=8, r=7, replace=TRUE)
# set each permutation that doesn't sum to 7 to NULL
y <- lapply(x, function(p) if(sum(p-1) != 7) NULL else p-1)
# subset all non-NULL permutations
z <- y[which(!sapply(y, is.null))]
Now you can sample from z
and be assured that you're getting a permutation that sums to 7.