Does a random set of points in the plane contain a large empty convex polygon?
Suppose I choose $n$ points uniformly at random from the unit square $[0,1]\times [0,1]$, obtaining a set of points $S=\{p_1,\ldots, p_n\}\subset [0,1]\times [0,1]$. Some subsets of $S$ may be the vertices of an empty convex polygon, where a polygon is "empty" if it contains no other point of $S$. For example, in the illustration below, we have an empty convex polygon with $6$ corners. Some empty convex polygons may have many corners, others only a few. I am curious about the asymptotic behaviour of this phenomenon. To this end, let
$$ f(S)= \max \{|T|:T\subseteq S,\ T\text{ is in convex position, and } S\cap \text{conv}(T)=T\}$$
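For small point sets, $f(S)$ can be computed by brute force straight from the definition. The sketch below (Python 3.9+, all function names hypothetical) tries subsets $T$ in increasing size, keeps those in convex position, and checks that no other point of $S$ lies in $\text{conv}(T)$. It assumes distinct points in general position (true with probability $1$ for random points) and takes exponential time, so it is only a toy for $n$ up to a dozen or so. It stops at the first size with no empty convex polygon, which is safe because deleting a vertex of an empty convex polygon leaves an empty convex polygon.

```python
import random
from itertools import combinations

Point = tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(pts: list[Point]) -> list[Point]:
    """Andrew's monotone chain: strict hull vertices in counter-clockwise order."""
    pts = sorted(pts)
    if len(pts) <= 2:
        return pts
    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_or_on(hull: list[Point], q: Point) -> bool:
    """True if q lies in the closed convex polygon `hull` (counter-clockwise)."""
    m = len(hull)
    return all(cross(hull[i], hull[(i + 1) % m], q) >= 0 for i in range(m))


def f(S: list[Point]) -> int:
    """Size of the largest empty convex polygon with vertices in S (brute force)."""
    best = min(len(S), 2)
    for k in range(3, len(S) + 1):
        found = False
        for T in combinations(S, k):
            hull = convex_hull(list(T))
            if len(hull) < k:
                continue  # some point of T is not a hull vertex: not in convex position
            members = set(T)
            if not any(inside_or_on(hull, q) for q in S if q not in members):
                found = True
                break
        if not found:
            # No empty convex k-gon implies no larger one either (see lead-in).
            return best
        best = k
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    S = [(rng.random(), rng.random()) for _ in range(12)]
    print(f(S))
```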
Question. How does the expectation $\mathbb E[f]$ grow as $n\to\infty$?
Conjecture. $\mathbb E[f]=\Theta(\sqrt{n})$, due to the birthday paradox.
I think this is a Ramsey-theory-type question, but I am not equipped to answer it. I would be happy with either a lower bound or an upper bound, or with pointers to where to find them.
Solution 1:
No, there are only small empty convex polygons
@BillyJoe has discovered that Balogh, González-Aguilar and Salazar [1] solved this question in 2012. They showed that for $n$ random points, the expected size of the largest empty convex polygon is
$$\mathbb E[f]=\Theta\left(\frac{\log n}{\log\log n}\right).$$ They also show that with high probability the largest empty convex polygon has size of this order, so atypical configurations do not distort the average much. (Such configurations do exist: $n$ points placed on a circle span an empty convex $n$-gon, but a random point set almost never looks like this.)
Let me give an exposition of their surprisingly simple argument for the lower bound, namely that you are likely to find an empty convex polygon of this size.
First: if $r$ points are drawn independently and uniformly at random from any parallelogram, the probability that they are in convex position is known exactly, due to Valtr [2], namely $$ \mathbb P[r \text{ points are convex}]=\left(\frac{\binom{2r-2}{r-1}}{r!}\right)^2$$
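Valtr's formula is easy to evaluate exactly and to compare against simulation. The sketch below (function names are hypothetical) computes it with exact fractions and estimates the same probability by sampling $r$ uniform points in the unit square and checking whether all of them are vertices of their convex hull (Andrew's monotone chain). For $r=3,4,5$ the formula gives $1$, $25/36$ and $49/144$; $25/36$ is the classical answer to Sylvester's four-point problem for a square.

```python
import random
from fractions import Fraction
from math import comb, factorial


def valtr_probability(r: int) -> Fraction:
    """Exact P[r uniform points in a parallelogram are in convex position] (Valtr)."""
    return Fraction(comb(2 * r - 2, r - 1), factorial(r)) ** 2


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_size(pts) -> int:
    """Number of strict convex-hull vertices (Andrew's monotone chain)."""
    pts = sorted(pts)
    if len(pts) <= 2:
        return len(pts)
    count = 0
    for seq in (pts, pts[::-1]):  # lower hull, then upper hull
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        count += len(chain) - 1  # each chain's last point starts the other chain
    return count


def monte_carlo_probability(r: int, trials: int, rng: random.Random) -> float:
    """Fraction of trials in which r uniform points in the unit square are in convex position."""
    hits = 0
    for _ in range(trials):
        pts = [(rng.random(), rng.random()) for _ in range(r)]
        if hull_size(pts) == r:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for r in range(3, 8):
        print(r, valtr_probability(r), float(valtr_probability(r)))
    # Should be close to 25/36 = 0.694...
    print(monte_carlo_probability(4, 20000, random.Random(0)))
```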
They now imagine that we get to repeatedly draw $r$ points at random from a parallelogram: if the $r$ points form a convex $r$-gon, the experiment is a success; otherwise, we repeat the experiment on a clean slate. Then they ask: how many times would we have to repeat this experiment, drawing $r$ points each time, before we see a convex polygon with $r$ sides? It turns out that choosing $r=\frac{\log n}{2\log\log n}$ (rounded down) does the trick. Writing $p$ for the probability above, this choice makes $(n/r)\,p\to\infty$, so the chance that all of $n/r$ independent experiments fail is $(1-p)^{n/r}\le e^{-pn/r}\to 0$. In other words, with high probability we see a convex $r$-gon within $n/r$ experiments, drawing a grand total of $n$ points. So we have our bound.
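The claim that $(n/r)\,p\to\infty$ can be sanity-checked numerically. Since the relevant $n$ are astronomically large, the sketch below (hypothetical helper names, natural logarithms assumed, $r$ rounded down) works with $\log n$ directly and evaluates $\log p$ with `math.lgamma`. If $\mu=(n/r)\,p$, then $\mathbb P[\text{all experiments fail}]=(1-p)^{n/r}\le e^{-\mu}$, so it is enough that $\log\mu$ grows.

```python
from math import lgamma, log


def log_valtr(r: int) -> float:
    """Natural log of Valtr's probability (binom(2r-2, r-1) / r!)^2, via log-gamma."""
    log_binom = lgamma(2 * r - 1) - 2 * lgamma(r)  # log binom(2r-2, r-1)
    return 2 * (log_binom - lgamma(r + 1))         # subtract log r!, then square


def log_expected_successes(log_n: float) -> float:
    """log of mu = (n/r) * p(r), with r = floor(log n / (2 log log n)); needs r >= 1."""
    r = int(log_n / (2 * log(log_n)))
    return log_n - log(r) + log_valtr(r)


if __name__ == "__main__":
    for log_n in (100.0, 1000.0, 10000.0):
        log_mu = log_expected_successes(log_n)
        # P[all n/r experiments fail] = (1 - p)^(n/r) <= exp(-mu)
        print(f"log n = {log_n:>7}: log mu = {log_mu:8.1f}, log mu / log n = {log_mu / log_n:.2f}")
```

For $\log n = 100, 1000, 10000$ the value of $\log\mu$ should come out positive and increasing, while $\log\mu/\log n$ drifts slowly downward, consistent with $p=n^{-1+o(1)}$.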
Very good, but this only works if we have $n/r$ disjoint parallelograms to work with. We don't: in our setting all points are drawn from the same square $[0,1]\times [0,1]$, so our $n/r$ "experiments" are not independent. How do we get around this?
The authors use a clever trick here. Suppose that we have drawn $n$ points at random, and assume without loss of generality that no two of them lie on a vertical line (this holds with probability $1$). Then we may sort the points from left to right and split them into consecutive groups of $r$ points, so that the square is cut up into $n/r$ tall, thin rectangles, as in the illustration below.
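These groups lie in non-overlapping rectangles, and each rectangle contains no point of $S$ other than those in its own group. Since a rectangle is convex, the convex hull of a group stays inside its rectangle; hence, whenever a group is in convex position, it spans an empty convex $r$-gon. Moreover, conditioned upon the fact that a given rectangle contains $r$ points, they have been drawn uniformly at random from that rectangle (a parallelogram), and so our bound applies anyway!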
Q.E.D.
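The strip construction can be simulated directly. This sketch (hypothetical helper names; it assumes distinct $x$-coordinates, which holds with probability $1$) sorts the points by $x$, cuts them into consecutive groups of $r$, keeps the groups in convex position, and then checks emptiness against all of $S$. It only illustrates the construction and the fact that emptiness comes for free; it is not a reproduction of the paper's proof.

```python
import random


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(pts):
    """Andrew's monotone chain: strict hull vertices in counter-clockwise order."""
    pts = sorted(pts)
    if len(pts) <= 2:
        return pts
    halves = []
    for seq in (pts, pts[::-1]):  # lower hull, then upper hull
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        halves.append(chain[:-1])
    return halves[0] + halves[1]


def inside_or_on(hull, q) -> bool:
    """True if q lies in the closed convex polygon `hull` (counter-clockwise)."""
    m = len(hull)
    return all(cross(hull[i], hull[(i + 1) % m], q) >= 0 for i in range(m))


def strip_polygons(S, r):
    """Cut S (sorted by x) into consecutive groups of r points, one group per
    vertical strip, and return the groups that are in convex position."""
    pts = sorted(S)  # distinct x-coordinates assumed, so this sorts by x
    groups = [pts[i:i + r] for i in range(0, len(pts) - r + 1, r)]
    return [g for g in groups if len(convex_hull(g)) == r]


def is_empty(S, T) -> bool:
    """True if no point of S outside T lies in conv(T)."""
    hull = convex_hull(T)
    members = set(T)
    return not any(inside_or_on(hull, q) for q in S if q not in members)


if __name__ == "__main__":
    rng = random.Random(0)
    S = [(rng.random(), rng.random()) for _ in range(1000)]
    found = strip_polygons(S, 5)
    print(len(found), "of", len(S) // 5, "strips hold a convex 5-gon")
    print(all(is_empty(S, T) for T in found))  # should print True: emptiness is automatic
```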
They then prove that, up to a constant factor, this is in fact the largest empty convex polygon you are likely to find, via a more involved argument which I leave for the reader to explore.
Let me remark that my conjecture of $\mathbb E [f]=\Theta(\sqrt n)$ was exponentially higher than the true answer; a big error. I had guessed that the largest empty convex polygon might contain roughly as many corners as the convex hull of the $n$ points, which I thought would have $\approx \sqrt n$ vertices, since roughly $\sqrt n$ points lie along each side of a $\sqrt n\times\sqrt n$ grid. This was the right direction, but that estimate was also far off: most of the points near the boundary are not vertices of the convex hull after all. Indeed, the hull has only $\Theta(\log n)$ vertices in expectation, as proved by Rényi and Sulanke [3]. Again, I was off by an exponential factor.
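Rényi and Sulanke's theorem is easy to see in simulation. For $n$ uniform points in a convex polygon with $k$ vertices, the expected number of hull vertices is $\frac{2k}{3}\log n+O(1)$, so about $\frac83\log n$ for the square. The sketch below (hypothetical names) averages hull sizes over a few trials and prints them next to $\frac83\ln n$ and $\sqrt n$; the simulated averages should track the former up to an additive constant and fall far below the latter.

```python
import random
from math import log, sqrt


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_size(pts) -> int:
    """Number of strict convex-hull vertices (Andrew's monotone chain)."""
    pts = sorted(pts)
    if len(pts) <= 2:
        return len(pts)
    count = 0
    for seq in (pts, pts[::-1]):  # lower hull, then upper hull
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        count += len(chain) - 1  # each chain's last point starts the other chain
    return count


def mean_hull_size(n: int, trials: int, rng: random.Random) -> float:
    """Average number of convex-hull vertices of n uniform points in the unit square."""
    total = 0
    for _ in range(trials):
        total += hull_size([(rng.random(), rng.random()) for _ in range(n)])
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1000, 10000):
        print(n, mean_hull_size(n, 20, rng), round(8 / 3 * log(n), 1), round(sqrt(n), 1))
```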
It seems to me that the lower bound on the number of vertices one is likely to find, $\log n/(2\log\log n)$, can probably be improved by a more careful argument that tries to squeeze more points into each rectangle, though only by a constant factor, since the bound is already tight up to constants. Currently, the lower and upper bounds of Balogh et al. differ by a factor of $320$: they showed that w.h.p. the size $r$ of the largest empty convex polygon satisfies $$\frac{\log n}{2\log \log n}\leq r\leq 160\,\frac{\log n}{\log\log n}.$$ As far as I can tell, no attempts have been made to reduce this gap, so any budding Ramsey theory enthusiasts have their work cut out for them.
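To get a feeling for the size of the gap, one can simply plug numbers into the two bounds. This is only arithmetic on asymptotic statements (natural logarithms assumed, and the "w.h.p." and "for $n$ large enough" qualifiers ignored; the function name is hypothetical), but it shows that the ratio is always $320$ and that the lower bound stays tiny for any $n$ one could actually sample: at $n=10^6$ it is below $3$.

```python
from math import log


def balogh_bounds(n: int) -> tuple[float, float]:
    """The bounds log n / (2 log log n) and 160 log n / log log n, natural logs."""
    L = log(n)  # math.log accepts arbitrarily large Python ints
    return L / (2 * log(L)), 160 * L / log(L)


if __name__ == "__main__":
    for exponent in (6, 100, 1000):
        lo, hi = balogh_bounds(10 ** exponent)
        print(f"n = 10^{exponent}: {lo:.1f} <= r <= {hi:.1f}, ratio {hi / lo:.0f}")
```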
References
[1] Balogh, József; González-Aguilar, Hernán; Salazar, Gelasio, Large convex holes in random point sets, Comput. Geom. 46, No. 6, 725-733 (2013). ZBL1271.52003.
[2] Valtr, Pavel, Probability that $n$ random points are in convex position, Discrete Comput. Geom. 13, No. 3-4, 637-643 (1995). ZBL0820.60007.
[3] Rényi, Alfréd; Sulanke, R., Über die konvexe Hülle von $n$ zufällig gewählten Punkten, Z. Wahrscheinlichkeitstheor. Verw. Geb. 2, 75-84 (1963). ZBL0118.13701.