Why can I choose to work in a strict monoidal category without loss of generality?
As Zhen Lin noted, the coherence theorem is essentially equivalent to the following statement: for every path $\langle f_1,\dots,f_n\rangle$ of 1-cells and any two parenthesized compositions of that path (e.g. $\big((f_1f_2)f_3\big)\cdots f_n$ and $f_1\big((f_2f_3)f_4\cdots\big)$), there is a unique isomorphism between them built up from the associator and unit isomorphisms. Formally there are, of course, several such isomorphisms, since the associators and unit isomorphisms can be composed in different ways, but the theorem states that they all coincide.
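To illustrate with the smallest interesting case, take $n=4$. The notation here is my own assumption, not taken from the sources above: $\alpha\colon (fg)h\to f(gh)$ denotes the associator, and juxtaposition with $1$ denotes whiskering by an identity 2-cell. There are two formally different isomorphisms from the fully left-bracketed composite to the fully right-bracketed one:

$$\big((f_1f_2)f_3\big)f_4 \xrightarrow{\;\alpha\;} (f_1f_2)(f_3f_4) \xrightarrow{\;\alpha\;} f_1\big(f_2(f_3f_4)\big)$$

$$\big((f_1f_2)f_3\big)f_4 \xrightarrow{\;\alpha\,1\;} \big(f_1(f_2f_3)\big)f_4 \xrightarrow{\;\alpha\;} f_1\big((f_2f_3)f_4\big) \xrightarrow{\;1\,\alpha\;} f_1\big(f_2(f_3f_4)\big)$$

The pentagon axiom says exactly that these two composites are equal; the coherence theorem extends this to every $n$ and every pair of bracketings.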
Tom Leinster has another approach, the so-called unbiased monoidal categories (or unbiased bicategories). Put simply, it takes as basic operations not only binary products but an $n$-fold product (without parentheses) for every path of length $n$. The coherence axiom then requires that, for any path of 1-cells, if we insert two disjoint pairs of parentheses, it doesn't matter which pair is evaluated first, i.e. the resulting squares commute. For example, $f_1f_2f_3f_4\to (f_1f_2)f_3f_4 \to (f_1f_2)(f_3f_4)$ and $f_1f_2f_3f_4\to f_1f_2(f_3f_4) \to (f_1f_2)(f_3f_4)$ must coincide. (Note that the middle terms are 3-fold compositions.)
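Drawn as a diagram, the example with four 1-cells is the following square. The arrows are left unlabeled on purpose, since the description above does not name the comparison maps; this is only a sketch of the shape of the axiom:

$$\begin{array}{ccc}
f_1f_2f_3f_4 & \longrightarrow & (f_1f_2)f_3f_4 \\
\big\downarrow & & \big\downarrow \\
f_1f_2(f_3f_4) & \longrightarrow & (f_1f_2)(f_3f_4)
\end{array}$$

The top-right and bottom-left corners are the 3-fold compositions, and the axiom asks that the two ways around the square agree.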
Note also that allowing paths of length $0$ (one at each vertex) just defines the unit 1-cell(s) and provides the unit coherence isomorphisms.
From this, the coherence theorem follows easily by a simple induction. (However, to prove that this yields the 'same kind' of monoidal category (or bicategory) concept, we need to use the original coherence theorem.)