On Covering $\{1,\cdots,n\}$ with Unions of Small Subsets

I've been stuck on the following problem for some time already. This is a problem that arose while I was studying some properties of boolean circuits (this is just for the anecdote, it should not matter here). I hope it's neither an entirely trivial question (as I've been unable to solve it myself) nor a too hard one.

Let $(n,k)$ be two integers, with $k \ll n$ (in the particular case I'm interested in, $n$ is a power of $2$ and $k = \log_2{n}$). Suppose that we are given $n$ subsets of $[n] = \{1, \cdots, n\}$, denoted $(S_1, \cdots, S_n)$, such that:

  • $\bigcup_{i=1}^n S_i = [n]$
  • For $i=1$ to $n$, $|S_i| \leq k$ (where $|\cdot|$ denotes the size of the set)

Is it possible to find $m \in O(n/k)$ subsets $(S'_1, \cdots, S'_m)$ of $[n]$ such that:

  • For $i=1$ to $m$, there is $J \subseteq [n]$ such that $S'_i= \bigcup_{j \in J} S_j$
  • $\bigcup_{i=1}^m S'_i = [n]$
  • For $i=1$ to $m$, $|S'_i| \in O(k)$

In other (intuitive) words, I have "too many subsets" ($n$), each subset being "small" (less than $k$ elements), which entirely cover $[n]$. Can I "regroup" them, using unions, into a smaller number ($m \in O(n/k)$) of subsets so that each new subset remains "not too big" ($O(k)$), but still the new subsets entirely cover $[n]$?

I've failed to get even a good intuition of the result I should expect. Although I've made my best to formulate it as a well-defined question, I'm in fact interested about whatever interesting can be said about it, especially in the setting $k = \log_2{n}$. If, for example, the answer is no and some counter example is found, then I'd be interested in knowing whether there is some natural characterization of the sets $S_i$ for which sets $S'_i$ satisfying the above conditions can be found, or if one can get a yes answer by relaxing some bound (like, $m \in O((n\log\log n)/k)$, or $|S'_i| \in O(k\log k)$). More generally, if the question appears to be non-trivial, I'd be satisfied by an answer providing interesting insights even if it does not fully answer the question. Even relevant references would definitely help.

EDIT: The problem shares many similarities with the set cover problem and the maximum coverage problem. However, these problems are concerned with the optimality of the solution, while I'm only asking for the existence of some (possibly non-optimal) solution that satisfy the given constraints. I'm currently trying to get some intuition of the result by looking at whether uniformly random subsets of $[n]$ of size $k$ would satisfy (with some reasonable probability) the constraints.


My answer is very partial, but I decided that better this than nothing.

While thinking on your question, I was becoming more and more pessimistic, from an assertion “it is a simple combinatorial lemma” to a question “how can we obtain non-trivial advances?”. What means “non-trivial”, if even the relaxed question is actual? But we have a balance between $m$ and $|S_j’|$, that is grouping the family $\mathcal S=\{S_i\}$ into arbitrary groups consisting of $\Theta(k^\alpha)$ subsets we obtain $|S_j’|=O(k^{1+\alpha})$ and $m=\Theta(n/k^{\alpha}) $. So, in particular, we have a question can we improve an upper bound $|S_j’|=O(k^2)$ for $m=n/k$?

I started from the case $k=2$. We can proceed as follows. Let $\mathcal T$ be a maximal mutually disjoint subfamily of the family $\mathcal S$. The maximality of the family $\mathcal T$ implies that for each $S\in \mathcal S\setminus\mathcal T$ we can pick a set $f(S)\in\mathcal T$ such that $S\cap f(S)\ne\varnothing$. Fix an arbitrary enumeration of the family $\mathcal T$ and inductively for each $j\in [|\mathcal T|]$ put $\mathcal S_j=\{T_j\}\cup \{S\in\mathcal S\setminus\mathcal T: f(S)=T_j\}\setminus \bigcup_{j’=1}^{j-1}\mathcal S_{j’}$, $P_j=\bigcup\mathcal S_j$, and $n_j=| P_j |$. Then the sets $P_j$ are mutually disjoint and $[n]=\bigcup P_j$, so $\sum n_j=n$. Let $j$ be an arbitrary index. Since each element of the family $\mathcal S_j$ contains a points of a two-point set $T_j$, we can easily construct a cover $\mathcal C_j$ of the set $P_j$ by at most $n_j/2$ subsets, such that each of them is a union of at most two subsets from the family $\mathcal S_j$ and has size at most $3$. Taking a union of all covers $\mathcal C_j$ we obtain a cover of the set $[n]$ by $m\le\sum n_j/2=n/2$ subsets of size at most $3$. Unfortunately, proceeding by induction by $k$ from this point I can obtain, even in the luckiest case, a bound $|S_j’|\le k^2/2+O(k)$.

So I started to think about a counterexample. But it is hard for me to tell, how it would look and how to check it. I only have the following idea, which may yield (or not) some advance. Let $k\le \log_2 n$. Instead of subsets of $[n]$ I’ll consider the subsets of the group $\Bbb Z_n=\Bbb Z/n\Bbb Z$. Let $A=\{0,1,2,\dots, 2^{k-1}\}$ and for each $i\in\Bbb Z_n$ let $S_i=i+A$.

PS. I shared your question with some my coauthors, who may be interested in it and maybe together we'll be able to obtain some advances.