exercise in Isaacs's book on Character Theory
By the hint, as explained in Jack Schmidt's answer, every conjugacy class has size at most $|G'| \le f$. Consider the permutation representation given by the conjugation action of $G$ on the conjugacy class $g^G$. It is the sum of the trivial representation and a representation of dimension $|g^G| - 1 < f$. Thus, by assumption, it is a sum of 1-dimensional representations, and therefore the image of $G$ in it is abelian. This means that $x y g y^{-1} x^{-1} = y x g x^{-1} y^{-1}$ for all $x, y$ in $G$. Rearranging, we get $x^{-1} y^{-1} x y g = g x^{-1} y^{-1} x y$, and thus every commutator is central.
The reason this has to do with induced representations is that the conjugation action on $g^G$ is the representation induced from the trivial rep of the centralizer of g.
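As a sanity check on the last remark, here is a small brute-force sketch in Python. It compares the number of fixed points of the conjugation action on $g^G$ with the usual formula for the character induced from the trivial character of the centralizer. The choice of $S_4$ and of a transposition for $g$ is only an illustrative assumption, and the helper names are made up for this example.

```python
from fractions import Fraction
from itertools import permutations

def compose(p, q):
    # Product p*q as maps: apply q first, then p.
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

# Illustrative choice (not from the exercise): G = S_4, g = a transposition.
G = list(permutations(range(4)))
g = (1, 0, 2, 3)

centralizer = {c for c in G if compose(c, g) == compose(g, c)}
cls = {compose(inverse(t), compose(g, t)) for t in G}  # the class g^G

def fixed_points(h):
    # Permutation character of the conjugation action on g^G, evaluated at h.
    return sum(1 for y in cls if compose(inverse(h), compose(y, h)) == y)

def induced_trivial(h):
    # (1_C)^G(h) = (1/|C|) * #{t in G : t h t^{-1} in C}, with C = C_G(g).
    hits = sum(1 for t in G if compose(t, compose(h, inverse(t))) in centralizer)
    return Fraction(hits, len(centralizer))

agree = all(fixed_points(h) == induced_trivial(h) for h in G)
print(len(cls), len(centralizer), agree)
```

The two functions agree because $y = g^t$ is fixed by $h$ exactly when $t h t^{-1}$ lies in $C_G(g)$, and each $y$ arises from $|C_G(g)|$ choices of $t$.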
Since no one has answered, I'll write down what I've had time to do:
The hint is proved as you say, but the injection is not just the identity, so I thought I'd mention it: If $x$ and $y$ are conjugate in $G$, then $y = x^g$ for some $g$ in $G$, and so we get an element $x^{-1}y = [x, g]$ in $G'$. For a fixed $x$, we can recover $y$ from $c = x^{-1}y$ as $y = xc$, and so the map from $x^G$ to $G'$ given by $y \mapsto x^{-1}y$ is injective, and $|x^G| \le |G'|$. One does not generally get equality, since $G'$ consists of more than just commutators.
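The injection can be checked by brute force on a small example. The following Python sketch uses $S_4$ purely as an assumed test case (its derived subgroup is $A_4$), builds $G'$ as the subgroup generated by all commutators, and verifies for each $x$ that $y \mapsto x^{-1}y$ maps $x^G$ injectively into $G'$. The helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    # Product p*q as maps: apply q first, then p.
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def conj(x, g):
    # x^g = g^{-1} x g
    return compose(inverse(g), compose(x, g))

def generated_subgroup(gens, identity):
    # In a finite group, closing under right multiplication by the
    # generators already yields the generated subgroup.
    elems = {identity}
    frontier = [identity]
    while frontier:
        a = frontier.pop()
        for s in gens:
            b = compose(a, s)
            if b not in elems:
                elems.add(b)
                frontier.append(b)
    return elems

# Illustrative choice (not from the exercise): G = S_4.
G = list(permutations(range(4)))
identity = tuple(range(4))

# [x, g] = x^{-1} g^{-1} x g = x^{-1} x^g
commutators = {compose(inverse(x), conj(x, g)) for x in G for g in G}
derived = generated_subgroup(commutators, identity)

def injects_into_derived(x):
    cls = {conj(x, g) for g in G}
    images = {compose(inverse(x), y) for y in cls}
    # Injective iff no two class elements collide; image must lie in G'.
    return len(images) == len(cls) and images <= derived

all_ok = all(injects_into_derived(x) for x in G)
max_class = max(len({conj(x, g) for g in G}) for x in G)
print(len(derived), max_class, all_ok)
```

In this example the largest class has 8 elements while $|G'| = 12$, so the inequality is strict, in line with the remark that equality is not to be expected in general.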
A few stronger claims are not true: $S_3 \times S_3$ has a normal subgroup $A_3 \times 1$ whose size is less than the degree of a centerless character. The extra-special groups of order 32 have minimal non-linear character degree of 4, but have many non-central normal subgroups of order 4 (their derived subgroup and center have order 2). Thus it is important that the normal subgroup is contained in the derived subgroup.
The following seems to lead nowhere:
If |G′| ≤ f, then each conjugacy class has size at most f as well.
We want to show that if c in G′ and χ in Irr(G), then |χ(c)| = χ(1) (where χ(1) ≥ f whenever χ is nonlinear).
Column orthogonality gives, for $c \neq 1$ in $G'$: $$0 = \sum_{\chi \in Irr(G)} \chi(c)\chi(1) = [G:G'] + \sum_{\chi \in Irr(G) - Irr(G/G')}\chi(c)\chi(1)$$ Presumably now we take absolute values to finish, except I don't see anything useful. From column orthogonality, we also have, for every $g$ in $G$: $$\frac{|G|}{f} \leq |C_G(g)| = \sum_{\chi} |\chi(g)|^2 = [G:G'] + \sum_{\chi \in Irr(G)-Irr(G/G')} |\chi(g)|^2$$
A posteriori, we know that for every irreducible χ and g in G, |χ(g)| in { 0, χ(1) }, so we should get a lot of vanishing, but all of the inequalities I derived pointed the wrong way.
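To see these identities on a concrete group, here is a small Python sketch using the character table of the dihedral group of order 8, hard-coded as an assumed example (it satisfies $|G'| = 2 = f$). It checks both column orthogonality relations and the vanishing pattern $|\chi(g)| \in \{0, \chi(1)\}$. All values are real here, so no complex conjugation is needed.

```python
# Character table of the dihedral group of order 8 (illustrative example).
# Columns: classes of 1, r^2, r, s, sr; G' = {1, r^2} is also the center.
class_sizes = [1, 1, 2, 2, 2]
table = [
    [1,  1,  1,  1,  1],
    [1,  1,  1, -1, -1],
    [1,  1, -1,  1, -1],
    [1,  1, -1, -1,  1],
    [2, -2,  0,  0,  0],
]

order = sum(class_sizes)                         # |G| = 8
linear = [chi for chi in table if chi[0] == 1]
nonlinear = [chi for chi in table if chi[0] > 1]
f = min(chi[0] for chi in nonlinear)             # smallest nonlinear degree
index_derived = len(linear)                      # [G:G'] = number of linear characters

# First relation, at c = r^2 (column 1), a non-identity element of G'.
first_sum = index_derived + sum(chi[1] * chi[0] for chi in nonlinear)

# Second relation: sum of |chi(g)|^2 equals |C_G(g)| = |G| / |class of g|.
centralizer_orders = [order // size for size in class_sizes]
second_sums = [
    index_derived + sum(chi[j] ** 2 for chi in nonlinear)
    for j in range(len(class_sizes))
]

# A posteriori vanishing: every value has absolute value 0 or chi(1).
vanishing = all(abs(v) in (0, chi[0]) for chi in table for v in chi)

print(first_sum, second_sums == centralizer_orders, vanishing)
```

In this example the bound $|G|/f \le |C_G(g)|$ is attained on the three non-central classes, where the nonlinear character vanishes.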
Inducing from the derived subgroup seems like a bad idea, since it is so small. Inducing from a centralizer might be reasonable, since its index is small, but I didn't see anything that actually helped.