Suppose there is a square box with side length $m$ (measured in pixels). Let there be $n$ points in this box, distributed uniformly within the box (with integer coordinates, aligned to a pixel grid). If we take from each point the Euclidean distance to its nearest neighbor, what would be the expected value of this distance averaged over all points?

My actual problem is about a discrete pixel grid, an $m\times m$ bitmap image, but if that's easier I would be happy with a continuous solution. A more general solution e.g. for a rectangle instead of a square box is welcome, but at this point it is not necessary. I found similar questions about the continous case, but without answers. For me it wouldn't be easy to generalise the case for two points only.


Solution 1:

Empirically it is between $\dfrac{0.5}{\sqrt{n-1}}$ and $\dfrac{0.56}{\sqrt{n-1}}$, tending towards the lower bound as $n$ increases. This is not very different to putting $n-1$ random points in a circle or square with area $1$ and finding the expected minimum distance to the centre.

At the bottom end with $n=2$, the question becomes the expected Euclidean distance between two random points in a square, which an earlier answer gives as $\frac{1}{15}(2+\sqrt{2}+5\log(1+\sqrt{2})) \approx 0.5214$

Simulation results look like this, both directly and multiplying by $\sqrt{n-1}$

enter image description here

with values of

n    expected dist nn     sqrt(n-1) times expected dist nn
2           0.521                   0.521
3           0.389                   0.550
4           0.322                   0.557
5           0.279                   0.558
6           0.250                   0.558
7           0.227                   0.557
8           0.210                   0.555
9           0.196                   0.553
10          0.184                   0.552
20          0.124                   0.540
50          0.075                   0.527
100         0.052                   0.519
200         0.036                   0.514
500         0.023                   0.509
1000        0.016                   0.507

(See comments) The simulation code was

set.seed(1)          # to reproduce
maxn <- 1024         # top number for n-1 neighbours 
cases <- 10^6        # simulations 
x0 <- runif(cases)   # base points  
y0 <- runif(cases)
mindistsq <- rep(Inf, cases)
emindist <- numeric(maxn)
for (i in 1:maxn){
  x <- runif(cases)  # possible neighbour
  y <- runif(cases)
  distsq <- (x0-x)^2 + (y0-y)^2
  mindistsq <- ifelse(distsq < mindistsq, distsq, mindistsq)
  emindist[i] <- mean(sqrt(mindistsq))
  }
emindistsqrtn1 <- emindist * sqrt(1:maxn)   # multiplying by sqrt(n-1)
par(mfrow=c(2,1))
plot(2:(maxn+1), emindist, xlab="number of points in square", 
                     main="Expected distance of nearest neighbour in square",
                     ylab="Expected dist nearest neighbour",
               ylim = c(0,0.6))
plot(2:(maxn+1), emindistsqrtn1 , xlab="number of points in square", 
               ylab="sqrt(n-1) times expct dist nn",
               main="sqrt(n-1) times Expected distance of nearest neighbour",
               ylim = c(0.5,0.56))
par(mfrow=c(1,1))
exn1 <- c(1:9, 19, 49, 99, 199, 499, 999)
tab <- cbind(round(emindist[exn1],3), round(emindistsqrtn1[exn1],3))  
colnames(tab) <- c("expected min","sqrt(n-1) times exp min")
rownames(tab) <- (exn1+1) 
tab