Is a random subset of $\mathbb{R}^2$ connected?

To really answer this question, we have to go back to the Kolmogorov axioms.

The Kolmogorov axioms state that a probability space is a triple $(A, E, P)$ such that

$A$ is a set
$E$ is a collection of subsets of $A$ - that is, $E \subseteq \mathcal{P}(A)$
$P$ is a function $P : E \to [0, 1]$

such that the following rules hold:

$\sigma$-algebra axioms:

$A \in E$
If $B, C \in E$ then $B \setminus C \in E$
If $B_i$ is in $E$ for all $i \in \mathbb{N}$ then $\bigcup\limits_{i \in \mathbb{N}} B_i \in E$.

Measure axioms:

$P(\emptyset) = 0$
If $B_i$ is in $E$ for all $i \in \mathbb{N}$, and if $B_i \cap B_j = \emptyset$ for all $i \neq j$, then $P(\bigcup\limits_{i \in \mathbb{N}} B_i) = \sum\limits_{i \in \mathbb{N}} P(B_i)$

Probability axiom:

$P(A) = 1$

$A$ is known as "the sample space". $E$ is known as "the set of events", and elements of $E$ are called "events". And $P$ is the probability function.
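For a finite sample space the axioms above can be checked mechanically. The following is a minimal sketch, assuming events are stored as `frozenset`s and `P` as a dictionary; the function name `is_probability_space` is hypothetical, and for a finite `E` closure under countable unions reduces to closure under pairwise unions.

```python
from fractions import Fraction
from itertools import combinations

def is_probability_space(A, E, P):
    """Check the axioms for a FINITE triple (A, E, P).

    A: frozenset of outcomes, E: set of frozensets (the events),
    P: dict mapping every event in E to its probability.
    """
    if A not in E:
        return False
    # Closure under set difference and under (pairwise) union.
    for B in E:
        for C in E:
            if B - C not in E or B | C not in E:
                return False
    # Measure axioms: P(empty set) = 0 and additivity on disjoint pairs.
    if P[frozenset()] != 0:
        return False
    for B, C in combinations(E, 2):
        if not (B & C) and P[B | C] != P[B] + P[C]:
            return False
    # Values lie in [0, 1] and the total mass is 1.
    return all(0 <= P[B] <= 1 for B in E) and P[A] == 1

# A fair coin with every subset as an event.
A = frozenset({"H", "T"})
E = {frozenset(), frozenset({"H"}), frozenset({"T"}), A}
P = {frozenset(): Fraction(0), frozenset({"H"}): Fraction(1, 2),
     frozenset({"T"}): Fraction(1, 2), A: Fraction(1)}
fair_coin_ok = is_probability_space(A, E, P)
```

Dropping `frozenset({"T"})` from `E` makes the check fail, because `A - {"H"}` is then no longer an event.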

Let's say we want to discuss the outcome of a Bernoulli experiment with $n$ independent identically distributed indicator variables $X_1, ..., X_n$, each of which takes value $1$ with probability $p$ and value $0$ with probability $1 - p$.

The obvious probability space for this discussion would be $A = \{0, 1\}^n$, the set of all $n$-tuples of either $0$ or $1$. We would take $E = \mathcal{P}(A)$ and define $P(S) = \sum\limits_{s \in S} \prod\limits_{i = 1}^n \bigl((1 - p) + (2p - 1) s_i\bigr)$. The $(1 - p) + (2p - 1) s_i$ factor looks a bit weird, but it's just designed to be $p$ when $s_i = 1$ and $1 - p$ when $s_i = 0$.

It's easy to verify that this is in fact a probability distribution, that the random variable $X_i(s) = s_i$ takes value $0$ with probability $1 - p$ and value $1$ with probability $p$, and that the $X_i$ are mutually independent.
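The verification can be spot-checked numerically for a small case. This is a sketch only, assuming $n = 3$ and $p = 1/3$ (both arbitrary choices) and exact rational arithmetic; the helper names `point_mass` and `prob` are hypothetical. It checks the total mass, one marginal, and independence of one pair, not full mutual independence.

```python
from fractions import Fraction
from itertools import product

def point_mass(s, p):
    """P({s}) = prod_i ((1 - p) + (2p - 1) * s_i)."""
    mass = Fraction(1)
    for s_i in s:
        mass *= (1 - p) + (2 * p - 1) * s_i
    return mass

def prob(event, p):
    """P(S) as the sum of point masses over the tuples in the event."""
    return sum((point_mass(s, p) for s in event), Fraction(0))

n, p = 3, Fraction(1, 3)
A = list(product((0, 1), repeat=n))  # the sample space {0, 1}^n

total = prob(A, p)                                    # P(A)
marginal = prob([s for s in A if s[0] == 1], p)       # P(X_1 = 1)
joint = prob([s for s in A if s[0] == 1 and s[1] == 1], p)  # P(X_1 = 1, X_2 = 1)
```

Here `total` comes out as $1$, `marginal` as $p$, and `joint` as $p^2$, which is what independence of $X_1$ and $X_2$ requires.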

What if we want infinitely many variables? It turns out that this is still possible. I won't go into the exact details of how it's done, but we can construct a probability space on the sample space $\{0, 1\}^S$, where $S$ is some possibly infinite set, using something called the "Borel $\sigma$-algebra" as our event space. Basically, we only allow events that can be "built up" from the basic events $X_i = 0$ and $X_i = 1$ using countable union and complementation. We can then define the probability measure $P$ using Carathéodory's criterion and an outer measure. This is all rather technical and would require a good course in measure theory to introduce, but it can be done perfectly well.

So it's perfectly valid to take $|\mathbb{R}^2|$ different random variables and form a probability distribution out of them.

The problem here is that you would need to prove that $\{s \in \{0, 1\}^{\mathbb{R}^2} \mid \{x \in \mathbb{R}^2 \mid s_x = 1\} \text{ is connected}\}$ is actually part of the $\sigma$-algebra of events. Only events can have their probability taken.

I strongly suspect (but do not yet have a proof) that it will turn out this is not a measurable set. Therefore, we will be unable to ask the question of its probability.

Edit: if we're using the Borel $\sigma$-algebra, then I do in fact have a proof.

Theorem: Let $S$ be a set in the Borel $\sigma$-algebra on $\{0, 1\}^B$ where $B$ is a set. There must be some countable set $V \subseteq B$ such that for all $y \in S$, for all $z \in \{0, 1\}^B$, if for all $v \in V$, $y_v = z_v$, then $z \in S$.

Proof: we proceed by induction on the definition of the Borel $\sigma$-algebra.

Base case 1: $\{0, 1\}^B$. This one is immediate - simply take $V = \emptyset$.

Base case 2: $\{x \in \{0, 1\}^B \mid x_b = q\}$. This one is also immediate: take $V = \{b\}$.

Inductive step 1: Suppose $C, D$ are Borel sets satisfying the property. Pick $V_C$ and $V_D$ respectively. Then $V_C \cup V_D$ is countable and works for $C \setminus D$.

Inductive step 2: Suppose $C_i$ is a Borel set satisfying the property for all $i \in \mathbb{N}$. Then for each $i$, take $V_{C_i}$ which works for $C_i$. Then $\bigcup\limits_{i \in \mathbb{N}} V_{C_i}$ is countable and works for $\bigcup\limits_{i \in \mathbb{N}} C_i$.

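So the proof is complete. Now observe that no countable $V \subseteq \mathbb{R}^2$ works for the set of connected sets. Given a connected set $y$ and two distinct points $a, b \notin V$, the set $z = (y \cap V) \cup \{a, b\}$ agrees with $y$ on $V$, but it is countable with at least two points and therefore disconnected.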
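The inductive steps can be illustrated on a finite analogue, where "countable" is replaced by "some set of coordinates". The sketch below assumes $B = \{0, 1, 2, 3\}$ and two arbitrarily chosen events; the helper `depends_on` is hypothetical and computes the coordinates whose value can affect membership.

```python
from itertools import product

def depends_on(event, n):
    """Coordinates i such that flipping s_i can change membership in `event`.

    `event` is a set of n-tuples over {0, 1}. In this finite setting the
    result plays the role of the set V in the theorem.
    """
    coords = set()
    for s in product((0, 1), repeat=n):
        for i in range(n):
            flipped = s[:i] + (1 - s[i],) + s[i + 1:]
            if (s in event) != (flipped in event):
                coords.add(i)
    return coords

n = 4
omega = set(product((0, 1), repeat=n))
C = {s for s in omega if s[0] == 1}                # determined by coordinate 0
D = {s for s in omega if s[1] == 0 or s[2] == 1}   # determined by coordinates 1, 2

V_C, V_D = depends_on(C, n), depends_on(D, n)
V_diff = depends_on(C - D, n)
V_union = depends_on(C | D, n)
```

Both `V_diff` and `V_union` are contained in `V_C | V_D`, matching the claim that $V_C \cup V_D$ works for $C \setminus D$ and for the union.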

Short answer. The probability is undefined, because the probability space is underspecified. In fact, there are models for your problem where the event $\{S\ \text{is connected}\}$ is measurable with arbitrary probability.

Long answer (proof). Consider the following equivalent formulation of your problem:

Question 1. Let $(\mathcal P(\mathbb{R}^2),\mathscr{A},\mathbb{P})$ be a probability space on $\mathcal P(\mathbb{R}^2)$. Suppose that for each $p \in \mathbb{R}^2$ the event $\{p \in S\}$ is measurable with $\mathbb{P}[p \in S] = \tfrac{1}{2}$, and suppose that the family of events $\{\{p \in S\} \, \mid \, p\in\mathbb{R}^2\}$ is independent. Does this determine the probability of the event $\{S\ \text{is connected}\}$?

A probability space meeting these requirements is guaranteed to exist. Indeed, for each $p \in \mathbb{R}^2$, let $(\Omega_p,\mathscr{A}_p,\mu_p)$ be the probability space with ambient space $\Omega_p = \{0,1\}$, $\sigma$-algebra $\mathscr{A}_p = \mathcal P(\{0,1\})$, and probability measure $\mu_p(\{0\}) = \mu_p(\{1\}) = \tfrac{1}{2}$ (a Bernoulli random variable with probability $\frac{1}{2}$), and let $(\Omega,\mathscr{A},\mu) = \prod_{p\in\mathbb{R}^2} (\Omega_p,\mathscr{A}_p,\mu_p)$ be the product of these probability spaces (see this answer). Then $\Omega$ can be identified with $\mathcal P(\mathbb{R}^2)$, and the indicators of the point events $\{\{p \in S\} \, \mid \, p\in\mathbb{R}^2\}$ are i.i.d. $\text{Bernoulli}(\frac{1}{2})$ random variables, so $(\Omega,\mathscr{A},\mu)$ meets the requirements of Question 1.

The issue is that the set $\{S \ \text{is connected}\}$ is not measurable in this measure space (I think this is also what Mark Saving's answer tried to show). I will go one step further and show that it is not measurable in the completion $(\Omega,\mathscr{A}^\mu,\mu)$ of $(\Omega,\mathscr{A},\mu)$, and that for every $\xi \in [0,1]$ there exists an extension of $(\Omega,\mathscr{A}^\mu,\mu)$ in which the event $\{S\ \text{is connected}\}$ is measurable with probability $\xi$.

Lemma 2. Let $(\Omega,\mathscr{A},\mu) = \prod_{p\in\mathbb{R}^2} (\Omega_p,\mathscr{A}_p,\mu_p)$ be the infinite product defined above, and let $A \in \mathscr{A}$. Then there is a countable set $M \subseteq \mathbb{R}^2$ such that for every $U \in A$ and every $V \subseteq \mathbb{R}^2$ with $U \cap M = V \cap M$ one has $V \in A$.

Proof sketch. Every event in an uncountable product $\prod_{i\in I} (\Omega_i,\mathscr{A}_i,\mu_i)$ of probability spaces belongs to some countable sub-product; see e.g. Lemma 3.5.2 in [Bog07].

Lemma 3. Let $M \subseteq \mathbb{R}^2$ be countable with $|M| \geq 2$. Then $M$ is disconnected.

Proof. Assume first that the points of $M$ do not all have the same $x$-coordinate. Write $\alpha = \inf\{x \, \mid \, (x,y) \in M\} \in [-\infty,+\infty)$ and $\omega = \sup\{x \, \mid \, (x,y) \in M\} \in (-\infty,+\infty]$. Then $\alpha < \omega$, and for every $\mu \in (\alpha,\omega)$ the set $M$ contains points on either side of the vertical line $x = \mu$. Since $M$ is countable and $(\alpha,\omega)$ is uncountable, we may choose some $\mu \in (\alpha,\omega)$ such that $x \neq \mu$ for all $(x,y) \in M$. But then $M = (M \cap \{x < \mu\}) \cup (M \cap \{x > \mu\})$, so $M$ is disconnected.

Assume now that the points of $M$ all have the same $x$-coordinate. Then, since $|M| \geq 2$, the points of $M$ do not all have the same $y$-coordinate, so an analogous argument shows that $M$ is disconnected. $\quad\Box$
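For a finite point set the separating line in the first case of the proof can be computed directly. This is a sketch under that finiteness assumption; the function name `separating_x` is hypothetical. For a general countable $M$ (for example $\mathbb{Q}^2$) there need not be any gap between consecutive $x$-values, which is why the proof uses the cardinality argument instead of a midpoint.

```python
from fractions import Fraction

def separating_x(points):
    """Return mu with min_x < mu < max_x such that no point lies on x = mu.

    `points` is a finite collection of (x, y) pairs that do not all share
    the same x-coordinate.
    """
    xs = sorted({Fraction(x) for x, _ in points})
    if len(xs) < 2:
        raise ValueError("all points share one x-coordinate; swap the axes")
    # The midpoint of the two smallest distinct x-values lies strictly
    # between them, so no point of the set is on the line x = mu.
    return (xs[0] + xs[1]) / 2

M = [(0, 0), (0, 5), (1, 2), (3, -1)]
mu = separating_x(M)
left = [pt for pt in M if pt[0] < mu]    # M ∩ {x < mu}
right = [pt for pt in M if pt[0] > mu]   # M ∩ {x > mu}
```

Both `left` and `right` are non-empty and together they exhaust `M`, which is the decomposition used in the proof.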

Lemma 4. Let $M \subseteq \mathbb{R}^2$ be countable. Then $\mathbb{R}^2 \setminus M$ is connected.

Proof. It is sufficient to prove that $\mathbb{R}^2 \setminus M$ is path connected, since every path connected space is connected. Let $x,y \in \mathbb{R}^2 \setminus M$ be distinct. Since $M$ is countable, there are uncountably many lines through $x$ (resp. $y$) which contain no points from $M$. Choose a line $\ell_x$ through $x$ and a line $\ell_y$ through $y$ such that $\ell_x$ and $\ell_y$ are not parallel and $\ell_x \cap M = \ell_y \cap M = \varnothing$. Then $\ell_x$ and $\ell_y$ intersect, so we can form a path in $\mathbb{R}^2 \setminus M$ from $x$ to $y$ via $\ell_x$ and $\ell_y$. $\quad\Box$
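The two-line path from this proof can be built explicitly when $M$ is finite. The sketch below assumes integer (or rational) coordinates and searches only integer slopes, which suffices because each point of $M$ rules out at most one slope; `free_slope` and `two_segment_path` are hypothetical helper names, and neither endpoint may belong to `M`.

```python
from fractions import Fraction

def free_slope(pt, M, forbidden=()):
    """Smallest integer slope k >= 0, not in `forbidden`, such that the line
    through `pt` with slope k misses every point of the finite set M."""
    k = 0
    while True:
        on_line = any(my - pt[1] == k * (mx - pt[0]) for mx, my in M)
        if not on_line and k not in forbidden:
            return k
        k += 1

def two_segment_path(x, y, M):
    """Return (x, corner, y), where corner is the intersection of an M-free
    line through x and a non-parallel M-free line through y."""
    a = free_slope(x, M)
    b = free_slope(y, M, forbidden=(a,))  # different slope, so not parallel
    # Solve x[1] + a * (t - x[0]) == y[1] + b * (t - y[0]) for t.
    t = Fraction(y[1] - x[1] + a * x[0] - b * y[0], a - b)
    corner = (t, x[1] + a * (t - x[0]))
    return x, corner, y

M = [(1, 0), (2, 2), (0, 1)]
start, corner, end = two_segment_path((0, 0), (3, 3), M)
```

The path runs from `start` to `corner` along the first line and from `corner` to `end` along the second, so it never meets `M`.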

Proposition 5. Let $(\Omega,\mathscr{A},\mu) = \prod_{p\in\mathbb{R}^2} (\Omega_p,\mathscr{A}_p,\mu_p)$ be the infinite product defined above, and let $A \in \mathscr{A}$ be non-empty. Then $A$ contains both a connected and a disconnected subset of $\mathbb{R}^2$.

Proof. By Lemma 2, we may choose a countable set $M \subseteq \mathbb{R}^2$ such that for every $U \in A$ and every $V \subseteq \mathbb{R}^2$ with $U \cap M = V \cap M$ we have $V \in A$. Since $A$ is non-empty, we may choose some $U_0 \in A$. Choose two distinct points $x_1,x_2 \in \mathbb{R}^2 \setminus M$, and define $V_0,V_1 \subseteq \mathbb{R}^2$ by $V_0 = \{x_1,x_2\} \cup (U_0 \cap M)$ and $V_1 = U_0 \cup (\mathbb{R}^2 \setminus M)$. Then $V_0 \cap M = V_1 \cap M = U_0 \cap M$, so we have $V_0,V_1 \in A$. Furthermore, $V_0$ is disconnected (by Lemma 3) and $V_1$ is connected (by Lemma 4). $\quad\Box$

Corollary 6. Let $(\Omega,\mathscr{A},\mu) = \prod_{p\in\mathbb{R}^2} (\Omega_p,\mathscr{A}_p,\mu_p)$ be the infinite product defined above, and let $A,B \in \mathscr{A}$ such that $$ A \subseteq \{S \ \text{is connected}\} \subseteq B. $$ Then $A = \varnothing$ and $B = \Omega$.

Proof. If $A \in \mathscr{A}$ is non-empty, then $A$ contains a disconnected subset, so $A \not\subseteq \{S \ \text{is connected}\}$. If $B \in \mathscr{A}$ with $B \neq \Omega$, then $\Omega \setminus B$ is a non-empty element of $\mathscr{A}$ and therefore contains a connected subset, so $\{S \ \text{is connected}\} \not\subseteq B$. $\quad\Box$

Let $\mu_*,\mu^* : \mathcal P(\Omega) \to [0,1]$ denote the inner and outer measures associated with $\mu$; that is: \begin{align*} \mu_*(B) &= \sup\{\mu(A) \, \mid \, A \in \mathscr{A}, \ A \subseteq B\};\\[1ex] \mu^*(B) &= \inf\{\mu(A) \, \mid \, A \in \mathscr{A}, \ B \subseteq A\}. \end{align*} It follows from Corollary 6 that $\mu_*(\{S\ \text{is connected}\}) = 0$ and $\mu^*(\{S\ \text{is connected}\}) = 1$. Therefore:

the set $\{S\ \text{is connected}\}$ is not measurable in the infinite product $(\Omega,\mathscr{A},\mu)$ or in its completion $(\Omega,\mathscr{A}^\mu,\mu)$;
by this answer, for every $\xi \in [0,1]$ there exists an extension $(\Omega,\mathscr{A}',\mu')$ of $(\Omega,\mathscr{A}^\mu,\mu)$ in which $\{S\ \text{is connected}\}$ is measurable and has probability $\xi$.
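A finite toy analogue shows how inner measure $0$ and outer measure $1$ can arise. This is only a sketch under loudly stated assumptions: three fair coins, a $\sigma$-algebra that "sees" only coordinate $0$ (standing in for the countable set $M$), and an event that depends on coordinate $1$ (standing in for connectedness). It illustrates the mechanism and is not the construction from the proof.

```python
from fractions import Fraction
from itertools import chain, combinations, product

n = 3
omega = list(product((0, 1), repeat=n))
mu_point = Fraction(1, 2 ** n)  # three independent fair coins

# Sigma-algebra generated by coordinate 0 only: its atoms are
# {s_0 = 0} and {s_0 = 1}, and its members are unions of atoms.
atoms = [frozenset(s for s in omega if s[0] == v) for v in (0, 1)]
sigma = [frozenset(chain.from_iterable(c))
         for r in range(len(atoms) + 1)
         for c in combinations(atoms, r)]

def mu(event):
    return mu_point * len(event)

def inner(B):
    """sup of mu(A) over measurable A contained in B."""
    return max(mu(A) for A in sigma if A <= B)

def outer(B):
    """inf of mu(A) over measurable A containing B."""
    return min(mu(A) for A in sigma if B <= A)

B = frozenset(s for s in omega if s[1] == 1)  # depends on coordinate 1
inner_B, outer_B = inner(B), outer(B)
```

The only measurable set inside `B` is the empty set and the only one containing it is the whole space, so `inner_B` is $0$ and `outer_B` is $1$.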

This shows that the problem is underspecified. In other words, we need to specify more than just the individual point probabilities if we want the probability of $\{S \ \text{is connected}\}$ to be uniquely defined. I suggest you try to find another model for your problem, for instance by requiring some kind of invariance (as in Jackson's answer) instead of individual point probabilities. In general, it seems that uncountable products of probability spaces only allow us to say things about events which depend only on a countable amount of data.
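The remark about countable data can be made concrete with a lazy sampler: membership of a point is decided by a fair coin only when it is first queried. The class `LazyRandomSubset` below is a hypothetical illustration, not a construction of the product measure. Any program using it inspects only finitely many points, so it can never decide a property such as connectedness.

```python
import random

class LazyRandomSubset:
    """Random subset of the plane, sampled one queried point at a time."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._known = {}  # point -> bool, filled in on demand

    def __contains__(self, point):
        # Flip a fair coin the first time a point is queried and
        # remember the outcome so repeated queries stay consistent.
        if point not in self._known:
            self._known[point] = self._rng.random() < 0.5
        return self._known[point]

    def inspected(self):
        """Number of points whose membership has been decided so far."""
        return len(self._known)

S = LazyRandomSubset(seed=1)
first = (0.0, 0.0) in S
second = (0.0, 0.0) in S    # same answer as `first`
other = (1.5, -2.0) in S
count = S.inspected()
```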

Closing remarks:

I had initially suspected that the probability would be $0$ or $1$, due to Kolmogorov's zero-one law (thanks to Trebor's comment to the original question). Indeed, the probability does not depend on any finite portion of the data, so it looks like a tail event. Therefore I actually expected to find that either $\{S\ \text{is connected}\}$ or its complement would be a null set (that is, contained in a measurable set with probability $0$), even if it would not be measurable in its own right. My answer shows that this is not the case. I guess this shows that zero-one laws only apply to tail events, and not to tail non-events.
Note that my answer remains valid (and reaches the same conclusion) if we change the individual point probabilities. We can give each point a different probability of being included (independently of the other points), and these probabilities can be anything. Even if we specify that all points occur with probability $0$, the probability of $S$ being connected can still be arbitrary! The problem remains underspecified, and my solution still constructs a measure space where every point occurs with probability $0$ but the event $\{S\ \text{is connected}\}$ has arbitrary probability. (Very strange!) The individual point probabilities simply do not say anything about events which depend on an uncountable amount of data.
Likewise, every part of my solution is still valid if we replace “connected” by “path connected”.
There used to be a remark here about my intuitive beliefs of what the answer should be, but I removed it because I think it was missing the point. The key takeaway is that an uncountable product of probability spaces is not a good model for this type of problem.
The key takeaway from the preceding remark makes it seem all the more incredible to me that continuous time stochastic processes, such as the Wiener process (i.e. Brownian motion), are well-defined. Maybe the problem can be modelled as a stochastic process somehow? We should probably look into continuum percolation.

References.

[Bog07] V.I. Bogachev, Measure Theory, Volume I, Springer, 2007.

I'll take a bit of a different approach to Mark's answer, using group invariance on the measure space. I think it's obvious that the probability of producing a connected set should be $0$, so I'll argue that there is at least one way of making $\mathcal{P}(\mathbb R^2)$ a probability measure space such that every point independently has an equal probability of being included. Unfortunately this approach doesn't clarify whether each point's probability is 50%.

Consider the group $\mathop{Sym}(\mathbb{R}^2)$ of self-bijections $\mathbb{R}^2 \to \mathbb R^2$. We can phrase the requirement that each point's inclusion is equally likely and independent of all other points in terms of the action of $\mathop{Sym}(\mathbb{R}^2)$ on $\mathcal{P}(\mathbb{R}^2)$ and $\mathcal{P}(\mathcal{P}(\mathbb{R}^2))$. That is, if $\phi$ is a bijection on $\mathbb{R}^2$, then $\phi$ acts on $\mathcal{P}(\mathbb{R}^2)$ via $\phi(A) = \{\phi(x) \mid x \in A\}$. Furthermore, given any $\mathfrak{A} \subset \mathcal{P}(\mathbb{R}^2)$, we can let $\phi(\mathfrak{A}) = \{\phi(A) \mid A \in \mathfrak{A}\}$.

What does it mean for each point to have equal and independent probability of inclusion? It means that if we have a probability measure space $(\mathcal{P}(\mathbb{R}^2), \Sigma, \mu)$, then for all $\mathfrak{A} \in \Sigma$, and any bijection $\phi \in \mathop{Sym}(\mathbb{R}^2)$, we have $\phi(\mathfrak{A}) \in \Sigma$, and $\mu(\phi(\mathfrak{A})) = \mu(\mathfrak{A})$. So the question is, if $\mathfrak{C} = \{A \subset \mathbb{R}^2 \mid A \text{ is connected}\}$, is there a $\mathop{Sym}(\mathbb R^2)$-invariant probability measure on $\mathcal P (\mathbb R^2)$ such that $\mathfrak{C}$ is measurable? If so, what is $\mu(\mathfrak C)$?

For any cardinality $\kappa < 2^{2^{\aleph_0}}$, the co-$\kappa$ probability measure on $\mathcal{P}(\mathbb{R}^2)$ is $\mathop{Sym}(\mathbb{R}^2)$-invariant (in fact it's $\mathop{Sym}(\mathcal P(\mathbb R^2))$-invariant), so such measures certainly exist.

And there are as many disconnected as connected subsets of $\mathbb{R}^2$. That is, $$ |\mathfrak{C}| = |\mathcal{P}(\mathbb{R}^2) \setminus \mathfrak{C}| = |\mathcal P(\mathbb R)| = 2^{2^{\aleph_0}}. $$ Indeed, for each $A \subset \mathbb{R}$ we can construct a connected $C_A \subset \mathbb{R}^2$ and a disconnected $D_A \subset \mathbb R^2$ such that distinct sets $A$ give distinct $C_A$ and distinct $D_A$. Given $A \subset \mathbb{R}$, let $A'$ be the subset of $\mathbb R$ where all nonnegative elements are shifted up by $1$: $$ A' = (A \cap (-\infty, 0)) \cup ((A \cap [0, \infty)) + 1). $$ Then $A \mapsto A'$ is an injective map $\mathcal P(\mathbb R) \to \mathcal P(\mathbb R)$, and $A'$ contains no point of $[0, 1)$. Now let $E_A = \{(x, y) \mid x \in A'\}$. Then $C_A := E_A \cup \{(x, y) \mid y=0\}$ is connected, and $D_A := E_A \cup \{(\frac{1}{2}, \frac{1}{2}), (\frac{1}{2}, -\frac{1}{2})\}$ is disconnected.
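The shift $A \mapsto A'$ can be tried on a finite set of rationals. This is a minimal sketch; the names `shift` and `unshift` are hypothetical, and the existence of the inverse `unshift` on the image is what makes the map injective.

```python
from fractions import Fraction

def shift(A):
    """A' : keep negative elements, move nonnegative elements up by 1."""
    return frozenset(a if a < 0 else a + 1 for a in A)

def unshift(A_prime):
    """Recover A from A' (valid because A' has no element in [0, 1))."""
    return frozenset(a if a < 0 else a - 1 for a in A_prime)

A = frozenset({Fraction(-2), Fraction(-1, 2), Fraction(0),
               Fraction(1, 4), Fraction(3)})
A_prime = shift(A)

# No element of A' lies in [0, 1), so the vertical strip 0 <= x < 1
# contains no line of E_A.
gap_is_empty = all(not (0 <= a < 1) for a in A_prime)
recovered = unshift(A_prime)
```

Since the strip is free of vertical lines, the two points at $x = \tfrac{1}{2}$ in the construction are isolated from the rest of the set.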

So $\mathfrak{C}$ is not co-$\kappa$ for any $\kappa < 2^{2^{\aleph_0}}$. In particular, the co-$2^{\aleph_0}$ probability measure is $\mathop{Sym}(\mathbb{R}^2)$-invariant and finds that $\mu(\mathfrak{C}) = 0$*.

Unfortunately, I doubt that there are any $\mathop{Sym}(\mathbb{R}^2)$-invariant probability measures on $\mathcal{P}(\mathbb{R}^2)$ besides co-$\kappa$ measures. Besides, $\mathop{Sym}(\mathbb R^2)$-invariance is a ridiculously strong requirement; for instance, I believe the Lebesgue completion of a $\mathop{Sym}(X)$-invariant measure on $\mathcal P (X)$ always has $\mathcal P(\mathcal P(X))$ as its $\sigma$-algebra. So I am very sympathetic to the answer "$\mathfrak C$ should not be measurable." But there is at least one way it can be measurable, and in that way its measure is $0$.

Edit: I believe I missed a detail about the co-$\kappa$ measure. I was assuming that for $A \subset \mathcal P (\mathbb R^2)$, if $A$ is not co-$\kappa$, then $\mu(A) = 0$. But this is not a measure at all, as any partition of $\mathcal P (\mathbb R^2)$ into $2$ disjoint and equal-cardinality subsets violates the disjoint-sum axiom of a measure. To fix this, we can only consider sets that are measurable in the co-$\kappa$ topology's Borel sigma algebra. Then $\mathfrak C$ is unmeasurable, and $\mu(\mathfrak C)$ is undefined.
