Solution 1:

The equivalence between the two conditions is purely formal: it holds in every category of algebraic structures.

If $u$ is a non-generator and $H$ is a maximal subgroup of $G$ (by which we mean of course a maximal proper subgroup), then $\langle H \rangle = H \neq G$, hence $\langle H,u \rangle \neq G$, which implies $H = \langle H,u \rangle$ since $H$ is maximal, and therefore $u \in H$. Hence, $u$ lies in every maximal subgroup.

If $u$ is not a non-generator, choose some $X \subseteq G$ with $\langle X \rangle \neq G$ but $\langle X,u \rangle = G$. By Zorn's Lemma there is a subgroup $H$ which is maximal with the property that it contains $X$, but does not contain $u$. Indeed, $\langle X \rangle$ is such a subgroup (if it contained $u$, we would have $\langle X,u \rangle = \langle X \rangle \neq G$), and if $\mathcal{C}$ is a non-empty chain of such subgroups, then one can easily check that $\bigcup \mathcal{C}$ is a subgroup with this property. Observe that $H$ is a maximal subgroup of $G$: if $K$ is a subgroup containing $H$ properly, we must have $u \in K$ and $X \subseteq K$, hence $K=G$. Hence, $H$ is a maximal subgroup not containing $u$.

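Remark: Not every subgroup of a group can be enlarged to a maximal subgroup. In fact, there are groups (such as $\mathbb{Q}$) with no maximal subgroups at all. This is why the proof applies Zorn's Lemma to the subgroups containing $X$ and avoiding $u$, instead of simply enlarging $\langle X \rangle$ to a maximal subgroup. It is somewhat clumsy, but it works.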
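The equivalence proved above can be checked by brute force in a small case. The following is only a sketch, assuming the additive cyclic group $\mathbb{Z}/n$ as the test group; the function names (`generated`, `frattini`, `non_generators`) are made up for this illustration. It computes the intersection of the maximal subgroups and, independently, the set of non-generators straight from the definition.

```python
from functools import lru_cache
from itertools import combinations


@lru_cache(maxsize=None)
def generated(n, gens):
    """Subgroup of Z/n generated by the frozenset `gens`."""
    elems = {0}
    frontier = [0]
    while frontier:
        a = frontier.pop()
        for g in gens:
            b = (a + g) % n
            if b not in elems:
                elems.add(b)
                frontier.append(b)
    return frozenset(elems)


def maximal_subgroups(n):
    full = frozenset(range(n))
    # Every subgroup of a cyclic group is cyclic, so single generators suffice.
    proper = {generated(n, frozenset([g])) for g in range(n)} - {full}
    return [H for H in proper if not any(H < K for K in proper)]


def frattini(n):
    """Intersection of all maximal subgroups of Z/n."""
    phi = frozenset(range(n))
    for M in maximal_subgroups(n):
        phi &= M
    return phi


def non_generators(n):
    """Elements u such that <X, u> = G always forces <X> = G."""
    full = frozenset(range(n))
    subsets = [frozenset(c) for r in range(n + 1)
               for c in combinations(range(n), r)]
    return frozenset(
        u for u in range(n)
        if all(generated(n, X) == full
               for X in subsets if generated(n, X | {u}) == full)
    )


if __name__ == "__main__":
    print(sorted(frattini(12)), sorted(non_generators(12)))
```

For $n = 12$ the maximal subgroups are those of index $2$ and $3$, and both computations return $\{0, 6\}$. The search runs over all $2^n$ subsets, so it is only meant for very small $n$.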

More generally, if $G$ is any algebraic structure, then the intersection of all maximal proper substructures of $G$ is called the radical of $G$, and by the proof above it coincides with the set of all non-generators of $G$. If $G$ is a group, we get the Frattini subgroup. If $G$ is a left module over a ring $R$, we get its radical, which in the particular case of $G=R$ is known as the Jacobson radical. So the Frattini subgroup is really just a special case of a more general construction, some of whose other special cases one might already be familiar with.

Probably the best way to get familiar with the Frattini subgroup is to learn some of its nice properties. It is always a characteristic subgroup. If $G$ is a finite group, then $\Phi(G)$ is nilpotent. If $G$ is a finite $p$-group, then $\Phi(G)$ is the smallest normal subgroup whose quotient is elementary abelian. In this situation, Burnside's Basis Theorem states that a subset generates $G$ if and only if its image generates the $\mathbb{F}_p$-vector space $G/\Phi(G)$, which reduces the former condition to linear algebra.
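To see Burnside's Basis Theorem at work, here is a small brute-force sketch. The choice of $G = \mathbb{Z}/4 \times \mathbb{Z}/2$ (a group of order $8$) and all helper names are assumptions made for this illustration only. The code computes $\Phi(G)$ as the intersection of the maximal subgroups, identifies $G/\Phi(G)$ with $\mathbb{F}_2^2$ via the map $(a,b) \mapsto (a \bmod 2, b)$, and compares "generates $G$" with "image spans $\mathbb{F}_2^2$" for every subset of $G$.

```python
from itertools import combinations, product


def closure(gens, add, zero):
    """Subgroup generated by `gens` in a finite abelian group."""
    elems = {zero}
    frontier = [zero]
    while frontier:
        a = frontier.pop()
        for g in gens:
            b = add(a, g)
            if b not in elems:
                elems.add(b)
                frontier.append(b)
    return frozenset(elems)


def add_G(a, b):  # addition in Z/4 x Z/2
    return ((a[0] + b[0]) % 4, (a[1] + b[1]) % 2)


def add_V(a, b):  # addition in the vector space F_2^2
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2)


def project(g):
    """Candidate quotient map G -> F_2^2; its kernel is checked below."""
    return (g[0] % 2, g[1])


ZERO = (0, 0)
G = frozenset(product(range(4), range(2)))
V = frozenset(product(range(2), range(2)))

subsets = [c for r in range(len(G) + 1) for c in combinations(sorted(G), r)]
subgroups = {closure(X, add_G, ZERO) for X in subsets}
proper = [H for H in subgroups if H != G]
maximal = [H for H in proper if not any(H < K for K in proper)]

# Frattini subgroup: intersection of all maximal subgroups.
phi = frozenset.intersection(*maximal)
kernel = frozenset(g for g in G if project(g) == ZERO)

# Burnside: X generates G iff the image of X spans G/Phi(G).
burnside_holds = all(
    (closure(X, add_G, ZERO) == G)
    == (closure([project(g) for g in X], add_V, ZERO) == V)
    for X in subsets
)

if __name__ == "__main__":
    print(sorted(phi), kernel == phi, burnside_holds)
```

Here $\Phi(G) = \{(0,0), (2,0)\}$, which is exactly the kernel of the projection, so testing whether a subset generates $G$ reduces to a rank computation over $\mathbb{F}_2$.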

Solution 2:

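Take the cyclic group $\,G:=\langle\,x\,\rangle\,$ of order $\,p^2\,$, $\,p\,$ a prime. Then $\,\Phi(G)=\langle\,x^p\,\rangle\,$ (this is easy to check: $\,\langle\,x^p\,\rangle\,$ is the unique subgroup of order $\,p\,$, hence the unique maximal subgroup of $\,G\,$, and the quotient $\,G/\Phi(G)\,$ is a vector space over $\,\Bbb F_p=\Bbb Z/p\Bbb Z\,$ of dimension $\,1\,$).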

We know that the generators of $\,G\,$ are the elements $\,x^i\,$ with $\,(i,p)=1\iff p\nmid i\,$, and thus the elements of the form $\,x^{kp}\,,\,k\in\Bbb Z\,$, are exactly the ones that cannot generate $\,G\,$, i.e. the non-generators.
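The count above is easy to confirm numerically. This is a minimal sketch, assuming we write the cyclic group of order $p^2$ additively as $\mathbb{Z}/p^2$, so that $x^i$ corresponds to the residue $i$; the function names are invented for the illustration.

```python
from math import gcd


def element_order(i, n):
    """Order of i in the additive group Z/n (gcd(0, n) == n gives order 1)."""
    return n // gcd(i, n)


def split_cyclic_p_squared(p):
    """Return (generators, all other elements) of Z/p^2."""
    n = p * p
    generators = {i for i in range(n) if element_order(i, n) == n}
    rest = set(range(n)) - generators
    return generators, rest


def subgroup_generated_by_p(p):
    """The subgroup <p> of Z/p^2, i.e. the image of <x^p>."""
    n = p * p
    return {(k * p) % n for k in range(p)}


if __name__ == "__main__":
    for p in (2, 3, 5, 7):
        gens, rest = split_cyclic_p_squared(p)
        print(p, rest == subgroup_generated_by_p(p))
```

For each prime tried, the elements that fail to generate the group are precisely the multiples of $p$, which form the subgroup $\langle x^p \rangle$.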

Finally, if you know the proof of the relation between the Frattini subgroup and the set of non-generators, you can see there why an element belonging to all the maximal subgroups of $\,G\,$ has to be a non-generator. Otherwise it would, together with some subset, generate the whole group while being impossible to drop from that generating set, and from this one can construct a maximal subgroup that does not contain the element (Zorn's Lemma is needed in the general, non-finite case).

Solution 3:

What helped me become friendly with the Frattini subgroup, in the context of groups of prime-power order, was a related problem:

Let $p$ be a prime, and let $G$ be a $p$-group, with $|G|=p^{n}$. Show that $|Aut(G)| \mid \prod_{k=0}^{n-1}{(p^{n}-p^{k})}$.

This result is sharp in the sense that, if $G$ is elementary abelian, $Aut(G)$ will be a general linear group whose order is exactly that product. So the Frattini subgroup is a way of getting a handle on $p$-groups of an arbitrary nature using the way elementary abelian $p$-groups work.

As has been mentioned, $\dim_{\mathbb{Z}/(p)} G/\Phi(G)$ is the smallest number of elements needed to generate $G$. This is one measure of the complexity of $G$. If it is $1$, $G$ is cyclic. If it is low, obtaining a presentation for $G$ should not be too difficult. If it is high, $G$ needs many generators, there can be many ways for them to satisfy their relations, and $G$ can be hard to describe. (Of course, when this is $n$, $G$ is elementary abelian and that's not hard to describe.)

Solving the problem I alluded to also brings out something else, which is the order of a Sylow $p$-subgroup of $Aut(G)$. When the minimum size of a generating set of $G$ is $1$, this is $p^{n-1}$. (In fact, this group will be trivial or cyclic, except when $p=2$ and $n \geq 3$.) If $G$ needs $r$ elements to be generated, then the order of a Sylow $p$-subgroup of $Aut(G)$ will divide (or equal? I don't know enough about $p$-groups to answer this question) $p^{(n-1)+ \ldots + (n-r)} = p^{nr - \frac{r^{2}+r}{2}}$.
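The arithmetic in these statements can be spot-checked for the cyclic case. The sketch below assumes $G = \mathbb{Z}/p^n$, whose automorphisms correspond to the units modulo $p^n$; the function names are made up for the illustration. It checks that $|Aut(G)|$ divides the product, that its $p$-part is $p^{n-1}$, and that the closed form of the exponent agrees with the sum.

```python
from math import gcd, prod


def bound(p, n):
    """The product (p^n - p^0)(p^n - p^1)...(p^n - p^(n-1))."""
    return prod(p**n - p**k for k in range(n))


def aut_order_cyclic(p, n):
    """|Aut(Z/p^n)|, counted as the number of units modulo p^n."""
    m = p**n
    return sum(1 for a in range(1, m) if gcd(a, m) == 1)


def p_valuation(m, p):
    """Largest v such that p^v divides the positive integer m."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v


def sylow_exponent(n, r):
    """The exponent (n-1) + (n-2) + ... + (n-r)."""
    return sum(n - i for i in range(1, r + 1))


if __name__ == "__main__":
    for p, n in [(2, 3), (3, 2), (5, 2)]:
        a = aut_order_cyclic(p, n)
        print(p, n, bound(p, n) % a == 0, p_valuation(a, p) == n - 1)
```

For example, with $p = 2$ and $n = 3$ the product is $7 \cdot 6 \cdot 4 = 168$, and $|Aut(\mathbb{Z}/8)| = 4$ divides it.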

One more remark about being the group of non-generators: a subgroup $H < G$ satisfies $H \leq \Phi(G)$ if and only if the only subgroup $K \leq G$ such that $HK = G$ (so $K$ straddles all the cosets of $H$, with some unspecified multiplicity) is $K = G$ itself. To see this, try choosing $K$ to be a maximal subgroup of $G$ and remember that since $G$ is a $p$-group, any maximal subgroup of $G$ is normal.

Solution 4:

(1) In my opinion, the natural intuition for the Frattini subgroup comes not through non-generators, but through the corresponding concept in ring theory: the Jacobson radical, which is the intersection of all the maximal (left) ideals. I would then say:

Definition: The Frattini subgroup is the intersection of all the maximal subgroups.

This definition is very natural to understand, as well as to explain to others or to introduce in a lecture for beginners. It also immediately shows that the Frattini subgroup is normal, and even characteristic.

(2) Of course, when $X$ is a generating set for $G$, then every $g\in G$ is a product of finitely many elements of $X$ and their inverses (by definition of a generating set), so this also holds for $g\in\Phi(G)$. (I didn't see what exactly the problem with these lines in the question is.)

(3) The equivalent definitions can be found in many texts on group theory, and I did not find any intuition for this equivalence (in books or on my own). The definition via non-generators has an advantage in proving that the Frattini subgroup is nilpotent (for finite groups); the argument in that proof is exactly the Frattini argument.

So the definition through maximal subgroups is the most natural one, although each equivalent definition certainly has its own advantages.