Smallest diameter of a balanced subset of the Hamming cube

Let $\{0,1\}^n$ be the Hamming cube with the Hamming metric. It's a metric space of diameter $n$. Let's call a set $B\subset \{0,1\}^n$ balanced if its center of mass is the center of the cube; that is, the average of all vectors contained in $B$ is $(1/2,\dots,1/2)$. What is the smallest possible diameter of a nonempty balanced subset of $\{0,1\}^n$?


Partial results

Let $d_n$ be the aforementioned smallest diameter, and let $B$ be a balanced set that realizes it. Without loss of generality, $B$ contains the zero vector $(0,\dots,0)$. To be balanced, it must also contain some vector with more $1$s than $0$s. Hence $$ d_n \ge \left\lceil \frac{n+1}{2} \right\rceil \tag{1} $$ The Cartesian product of two balanced sets is balanced, and its diameter is the sum of diameters of its factors. Therefore, the sequence $(d_n)$ is subadditive: $$ d_{m+n} \le d_m+d_n \tag2 $$ (which in particular implies that $d_n/n$ has a limit.)

The values of $d_n$ I know so far: $$ \begin{array}{|c|c|} \hline n & d_n \\ \hline 1 & 1 \\ 2 & 2 \\ 3 & 2 \\ 4 & 3 \\ 5 & 4 \\ 6 & 4 \\ \hline \end{array} $$

Most of the table is obtained from $(1)$-$(2)$ with the help of the example $$ \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix} $$ where the columns form a balanced set of diameter $2$, showing $d_3\le 2$. (By the way, this is a tetrahedron inscribed in a cube.)

For $d_5$, the bounds $(1)$-$(2)$ give $3\le d_5\le 4$. To see that the diameter cannot be $3$, note that WLOG a balanced set of diameter $3$ contains the following columns: $$ \begin{pmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} $$ Since the total number of $0$s and $1$s in the last two rows must be the same, we need a column ending with two $1$s: $$ \begin{pmatrix} 0 & 1 & * \\ 0 & 1 & * \\ 0 & 1 & *\\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix} $$ but no matter how the asterisks are filled, the diameter is greater than $3$.


Solution 1:

(This would probably be better as a comment than an answer but it looks like I can't comment yet.)

I think I can slightly improve on your partial results, with $d_7=4$ and (hence) $d_8=5$. (Since whenever $d_{2k-1} = k$ your bounds $(1)$-$(2)$ give $d_{2k}=k+1$.)

For the first claim, it's enough to find a balanced set with diameter equal to $4$, and it appears that $$\left\{ \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \\ 1 \\ 0 \\ 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 0 \\ 1 \\ 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 1 \\ 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \\ 1 \\ 0 \\ 1 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 1 \\ 1 \\ 0 \end{pmatrix} \right\}$$ is exactly such a set.

Solution 2:

This is the streamlined (incomplete) answer resuming all results in my other answer (and including some results obtained by Michelle). I don't think the general problem will be settled in the short term.

First we give improve the lower bound for $d_n\ge \lceil\frac{n+1}{2}\rceil$ given in (1) in the OP:

$\bf{Proposition\ 1:(III)}$ $d_{4k+1}\ge 2k+2$ for $k>0$.

So we have a lower bound that can be described modulo 4:

$n=4k$, $d_n\ge b_{4k}=2k+1$ for $k>0$.

$n=4k+1$, $d_n\ge b_{4k+1}=2k+2$

$n=4k+2$, $d_n\ge b_{4k+2}=2k+2$

$n=4k+3$, $d_n\ge b_{4k+3}=2k+2$

We obtain the following table for the lower bounds: $$ \begin{array}{|c|c|} \hline n & b_n \\ \hline 1 & 1 \\ 2 & 2 \\ 3 & 2 \\ 4 & 3 \\ 5 & 4 \\ 6 & 4 \\ 7 & 4 \\ 8 & 5 \\ 9 & 6 \\ 10 & 6 \\ 11 & 6 \\ 12 & 7 \\ 13 & 8 \\ 14 & 8 \\ 15 & 8 \\ 16 & 9 \\ 17 & 10 \\ 18 & 10 \\ 19 & 10 \\ 20 & 11 \\ 21 & 12 \\ 22 & 12 \\ 23 & 12 \\ 24 & 13 \\ 25 & 14 \\ 26 & 14 \\ \vdots&\vdots\\ \hline \end{array} $$

$\bf{Proposition\ 2:}$ If for some $j$ we have $d_{4j+3}=b_{4j+3}=2j+2$, then $d_n=b_n$ for $n\in\{4j+1,4j+2,4j+4,4j+5,8j+5,8j+6\}$.

The proof follows directly from the lower bounds and (2) in the OP: $d_{n+m}\le d_n+d_m$.

In view of this proposition, in order to establish $d_n=b_n$ for all $n$, it suffices to prove it for $n=4j+3$ for all $j$.

We will prove the equality only for $4j+3=p$ a prime, $4j+3=p(p+2)$ a product of twin primes or $4j+3=2^t-1$ for some $t$. This proves for example that $d_n=b_n$ for all $n\le 50$ except $n\in\{27,28,39,40\}$.

We will describe a construction of balanced sets of vectors of length $4j+3$ of diameter $2j+2$ out of a cyclic $(v,k,\lambda)$ difference set with parameters $(v,k,\lambda)=(4j+3,2j+1,j)$, thus proving that whenever such a difference set exists, we have $d_{4j+3}=b_{4j+3}=2j+2$. It is known that such cyclic sets exist for the three cases mentioned above (See https://oeis.org/A217332 ). For the other cases of $n=4j+3$ it is known that no such cyclic difference sets exist for $n\le 10000$, but that doesn't imply that there are no balanced sets of diameter $2j+2$, so these cases remain open.

${\bf{Definition}}$ A cyclic $(v,k,\lambda)$ difference set $D$ is a set of $k$ residues modulo $v$ such that for each non-zero residue $s\mod v$ the equation $x-y\equiv s \mod v$ has exactly $\lambda$ solution pairs $(x,y)$, with $x,y\in D$. In particular cyclic Hadamard difference sets have the parameters $v=4j+3$, $k=2j+1$ and $\lambda=j$ for a non-negative integer $j$.

Now define vectors $V_0,\dots,V_{v-1}\in\{0,1\}^v$ by $$ (V_s)_t=\left\{\begin{array}{cc} 0&\text{ if $t-s\in D$}\\ 1&\text{ if $t-s\notin D$} \end{array} \right. $$

${\bf{Claim:}}$ The set $\{\vec{0},V_0,\dots,V_{v-1}\}$ is a balanced set of vectors of diameter $2j+2$.

It is straightforward to check that the set is balanced. Moreover, $$ d(\vec{0},V_s)=\#\{t, \ (V_s)_t=1\}=v-\# D=4j+3-( 2j+1)=2j+2, $$ and $d(V_s,V_i)=d(V_0,V_{|s-i|})$, so it suffices to prove that $d(V_0,V_s)=2j+2$ for all $s\ne 0$.

Now fix $s$ and note that $$ d(V_0,V_s)=\#\{t, \ (V_0)_t\ne (V_s)_t\} $$ $$=\quad\#\{t,\ (V_0)_t=0\text{ and }(V_s)_t=1\}\quad +\quad \#\{t,\ (V_0)_t=1\text{ and }(V_s)_t=0\}. $$ We define $$ A_{10}=\{t,\ (V_0)_t=1\text{ and }(V_s)_t=0\} $$ $$ A_{01}=\{t,\ (V_0)_t=0\text{ and }(V_s)_t=1\} $$ $$ A_{00}=\{t,\ (V_0)_t=0\text{ and }(V_s)_t=0\} $$ $$ A_{11}=\{t,\ (V_0)_t=1\text{ and }(V_s)_t=1\}. $$ Then $$ \# A_{10}+\# A_{00}=\#D=2j+1\quad\text{and}\quad \# A_{01}+\# A_{00}=\#D=2j+1, $$ and so $$ \# A_{10}=\# A_{01}=2j+1-\# A_{00}. $$ But $$ A_{00}=\{t,\ (V_0)_t=0\text{ and }(V_s)_t=0\}=\{t,\ t\in D\text{ and }t-s\in D\}, $$ and so $$ \# A_{00}=\#\{t, \ (t,t-s)\in D^2 \}=\{(x,y)\in D^2,\ x-y=s\} $$

But, by definition, the equation $x-y\equiv s \mod v$ has exactly $\lambda=j$ solution pairs $(x,y)$, with $x,y\in D$, hence $\# A_{00}=j$ and so $$ d(V_0,V_s)=\# A_{10}+\# A_{01}=2(2j+1-\# A_{00})=2j+2, $$ as desired.

We can say something about $\lim_{n\to\infty} \frac {d_n}{n}$. By Fekete's subadditivity lemma and (2) in the OP, the limit $\lim_{n→∞} \frac{d_n}n$ exists. So the fact that there are infinitely many numbers for which the lower bound is attained (e.g., $2^t−1$) implies $\lim_{n→∞} \frac{d_n}n =\frac 12$.

Finally assume the following variant of the Goldbach conjecture is true:

Every number of the form $4k+2$ is the sum of two primes $p,q$ of the form $4k+3$.

We know that $d_p=\frac{p+1}2$ and $d_q=\frac{q+1}2$. Hence, if this conjecture is true, the lower bound is attained at every number of the form $4k+2$ and at every number of the form $4k+1$.