Why is "the set of all sets" a paradox, in layman's terms?
Let $|S|$ denote the cardinality of $S$, and let $2^S$ denote the power set of $S$. For every set $S$ we have $|S| < |2^S|$, which can be proven with a generalized form of Cantor's diagonal argument.
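For readers who want to see the diagonal argument itself, here is a brief sketch. The names $f$, $D$ and $d$ are introduced only for this illustration. Take any function $f\colon S\to 2^S$ and consider the set

$$D = \{x\in S \mid x\notin f(x)\}.$$

If $D = f(d)$ for some $d\in S$, then

$$d\in D \iff d\notin f(d) \iff d\notin D,$$

which is impossible. So no function maps $S$ onto $2^S$. Since $x\mapsto\{x\}$ is an injection from $S$ into $2^S$, this gives $|S| < |2^S|$.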
Theorem
The set of all sets does not exist.
Proof
Let $S$ be the set of all sets. Then $|S| < |2^S|$. But every element of $2^S$ is a set, hence an element of $S$, so $2^S$ is a subset of $S$. Therefore $|2^S| \leq |S|$, a contradiction. Hence the set of all sets does not exist.
Just by itself the notion of a universal set is not paradoxical.
It becomes paradoxical when you add the assumption that whenever $\varphi(x)$ is a formula, and $A$ is a preexisting set, then $\{x\in A\mid \varphi(x)\}$ is a set as well.
This is known as bounded comprehension, or separation. The full notion of comprehension was shown to be inconsistent by Russell's paradox. But this version is not so strikingly paradoxical. It is part of many of the modern axiomatizations of set theory, which have yet to be shown inconsistent.
We can show that, assuming separation holds, the proof of the Russell paradox really amounts to the following statement: if $A$ is a set, then there is a subset of $A$ which is not an element of $A$.
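To spell this out, here is a short sketch; the name $R$ is chosen only for this illustration. Given a set $A$, separation applied to the formula $x\notin x$ yields the subset

$$R = \{x\in A \mid x\notin x\}.$$

If $R$ were an element of $A$, the definition of $R$ would give

$$R\in R \iff (R\in A \text{ and } R\notin R) \iff R\notin R,$$

which is a contradiction. Hence $R\notin A$, even though $R\subseteq A$.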
In the presence of a universal set this leads to an outright contradiction, because this subset should be an element of the set of all sets, but it cannot be.
But we may choose to restrict the formulas which can be used in this schema of axioms. Namely, we can say "not every formula should define a subset!", and that's okay. Quine defined a set theory called New Foundations, in which we limit these formulas in a way which allows a universal set to exist. This makes it consistent to have the set of all sets, provided we agree to restrict other parts of our set theory.
The problem is that the restrictions given by Quine are much harder to work with naively and intuitively. So we prefer to keep the full bounded comprehension schema, in which case the set of all sets cannot exist for the reasons above.
While we are at it, perhaps it should be mentioned that the Cantor paradox, which rests on the fact that the power set of a universal set must be strictly larger, also fails in Quine's New Foundations for the same reasons. The proof of Cantor's theorem that the power set is strictly larger simply does not go through without using "forbidden" formulas in the process.
Moreover, the Cantor paradox also fails if we do not assume the power set axiom; that is, it might be that not all sets have a power set. So if the universal set does not have a power set, there is no problem in terms of cardinality.
But again, we are taught from the start that these properties should hold for sets, and therefore they seem very natural to us. So the notion of a universal set is paradoxical for us, for that very reason. We are educated with a bias against universal sets. If you were taught that not all sets should have a power set, or that not all sub-collections of a set which are defined by a formula are sets themselves, then neither solution would be problematic. And maybe you'd even find it strange to think of a set theory without a universal set!