Probability of majority votes being correct

Given a binomial distribution with $n$ expeiments, where probability of success is $p$. $\text{prob}(x=k) = \binom{n}{k} p^k (1-p)^{n-k}$

I'm trying to show that for odd values of $n\geq 3$, $p > .5$, we always have:

$$ \text{prob}(x > n/2) > p $$

where $\text{prob}(x > n/2) = \sum_{k=\lfloor n/2 \rfloor+ 1}^n \binom{n}{k} p^k (1-p)^{n-k}$

What I tried: I tried to expand $(p + q)^n = 1$ where $q=1-p$, and argue the sum second half of the terms is greater than $p$. But other than the trivial case of $p=.5$, seems like my argument is not really useful.

I appreciate any help to prove or disprove this claim.


Solution 1:

I will slightly adapt notation for convenience. Denote $X_i \in \{-1,1\}$ i.i.d. Bernoulli random variables with $P(X_i=1) = p$ and $P(X_i=-1) = 1-p$, where $0.5<p<1$ and $S_n = \sum_{i=1}^nX_i$. Then we are proving $P(S_{2m+1}\geq 1)>p$ for all $m>0$.

To start with, notice that $S_{2m+1}$ only assumes odd integer values and $S_{2m+1}=S_{2m-1}+X_{2m}+X_{2m+1}$. Therefore, $S_{2m+1}\geq1$ precisely if either $S_{2m-1}>1$, or $S_{2m-1}=-1\land X_{2m}=X_{2m+1}=1$, or $S_{2m-1}=1\land X_{2m}+X_{2m+1}\geq0$. With this insight, we deduce the following recursive expressions \begin{align} P(S_{2m+1}\geq 1) &=P(S_{2m-1} >1) + P(S_{2m-1} = -1)p^2 + P(S_{2m-1} = 1)(1-(1-p)^2) \\ &=P(S_{2m-1} \geq 1) + P(S_{2m-1} = -1)p^2 - P(S_{2m-1} = 1)(1-p)^2, \end{align} where we used $P(S_{2m-1} >1)+P(S_{2m-1} =1)=P(S_{2m-1} \geq1)$. Inserting $P(S_{2m-1} = 1)(1-p) = P(S_{2m-1} = -1)p$ (which can be verified via evaluating the binomial distribution), we obtain $$ P(S_{2m+1}\geq 1) = P(S_{2m-1}\geq 1) + P(S_{2m-1} = 1) (2p-1)(1-p) > P(S_{2m-1} \geq 1). $$ Since $P(S_1 \geq 1) = p$, the claim follows.

Solution 2:

Fix $p>\frac{1}{2}$, we have to show $P\left(X>\frac{n}{2}\right)-p>0$, for all odd $n$ values, s.t., $X \sim B(n,p)$.

Proof by induction on $m \in \mathbb{Z^+}$, where $n=2m+1$ may go like this (can the induction step be completed from here?):

Basis: $m=1$ (i.e., $n=3$), $X \sim B(3,p)$, $P\left(X>\frac{n}{2}\right)-p={3 \choose 2}p^2(1-p)+p^3=3p^2+2p^3-p$. Now, let's observe that $f(p)=3p^2+2p^3$ is increasing in $p$ for $p \geq 0$, Hence, for $p>\frac{1}{2}$, we have, $f(p)>f(1/2)=1 \implies P\left(X>\frac{n}{2}\right)-p = f(p)-p>1-p\geq 0$.

Hypothesis: Let $m=r-1$, i.e., $n=2r-1$, $X \sim B(2r-1,p)$ and $P\left(X>\frac{n}{2}\right)-p = \sum\limits_{k=\lfloor n/2 \rfloor+ 1}^n \binom{n}{k} p^k (1-p)^{n-k}-p = \sum\limits_{k=r}^{2r-1} \binom{2r-1}{k} p^k (1-p)^{2r-1-k}-p>0$

Induction Step: For $m=r$, i.e., $n=2r+1$, we have $X \sim B(2r+1,p)$. Now we need to show $P\left(X>\frac{n}{2}\right)-p = \sum\limits_{k=r+1}^{2r+1} \binom{2r+1}{k} p^k (1-p)^{2r+1-k}-p>0$.

Let $g(r)= \sum\limits_{k=r}^{2r-1} \binom{2r-1}{k} p^k (1-p)^{2r-1-k}$. To prove the induction step, it suffices to show that $g(r+1)>g(r),\;\forall{r}\geq 1$, when we have $p>\frac{1}{2}$, which we can see below numerically with the following R code (it seems to be independent of $p$ as long as we have $p>\frac{1}{2}$):

n <- seq(3,201,2)
plot(n, pbinom(n/2, n, 0.51, lower.tail=FALSE), type='l')
for (p in seq(0.6,1,0.1)) {
  lines(n, pbinom(n/2, n, p, lower.tail=FALSE))
}

enter image description here

Now, $g(r+1)-g(r)=\sum\limits_{k=r+1}^{2r+1} \binom{2r+1}{k} p^k (1-p)^{2r+1-k}-\sum\limits_{k=r}^{2r-1} \binom{2r-1}{k} p^k (1-p)^{2r-1-k}$, it boils down to the inequality that $g(r+1)-g(r)>0$ (that we have to prove), the meaning of this is intuitive though, probability obtained with majority voting should increase as the number of experiments ($r$) increases.