Joint density of order statistics
Solution 1:
By hypothesis, the $Y_i$ are independent and identically distributed with density function $f$. Since independent random variables each admitting a density are jointly continuous, this implies that
$$ \forall 1 \leq i < j \leq n: \quad P(Y_i = Y_j) = 0. $$
This implies for the order statistics that
$$ Y_{(1)} < \ldots < Y_{(n)}$$
almost surely. Consequently, $(Y_{(1)}, \ldots, Y_{(n)}) \in \mathbb{R}^n_{\Delta}$ a.s., where $$ \mathbb{R}^n_{\Delta} := \{(y_1, \ldots, y_n) \in \mathbb{R}^n \mid y_1 < \ldots < y_n\}. $$
Consequently, it suffices to compute the joint distribution on a generator of the Borel $\sigma$-algebra of $\mathbb{R}^n_{\Delta}$, which is given by sets of the form
$$ ]t_0, t_1] \times \ldots \times ]t_{n-1}, t_n], \qquad t_0 < t_1 < \ldots < t_n. $$
We denote by $\mathfrak{S}_n$ the group of all permutations of $\{1, \ldots, n\}$ and calculate
\begin{align*} P(Y_{(1)} \in ]t_0, t_1], \ldots, Y_{(n)} \in ]t_{n-1}, t_n]) &= P\Big(\dot \bigcup_{\pi \in \mathfrak{S}_n}{ \{Y_{\pi(1)} \in ]t_0, t_1], \ldots, Y_{\pi(n)} \in ]t_{n-1}, t_n] \} }\Big) \\ &= \sum_{\pi \in \mathfrak{S}_n}{ P(\{Y_{\pi(1)} \in ]t_0, t_1], \ldots, Y_{\pi(n)} \in ]t_{n-1}, t_n] \}) } \\ &= \sum_{\pi \in \mathfrak{S}_n}{ P(Y_{\pi(1)} \in ]t_0, t_1]) \cdots P(Y_{\pi(n)} \in ]t_{n-1}, t_n]) } \\ &= \sum_{\pi \in \mathfrak{S}_n}{\prod_{i=1}^{n}\int_{]t_{i-1},t_i]}{f(x_i)\,dx_i}} \\ &= n! \prod_{i=1}^{n}\int_{]t_{i-1},t_i]}{f(x_i)\,dx_i} \\ &= \int_{]t_0, t_1] \times \ldots \times ]t_{n-1},t_n]}{n! \prod_{i=1}^{n}{f(x_i)}\,d(x_1, \ldots, x_n)}. \end{align*} For the second equality it is crucial that these events are disjoint for the various $\pi \in \mathfrak{S}_n$ (the intervals are pairwise disjoint, so each outcome can occur for at most one permutation), and for the third that the $Y_i$ are independent. Altogether this implies that the joint density of the order statistics is given by
$$f_{(Y_{(1)}, \ldots, Y_{(n)})}(y_1, \ldots, y_n) := \begin{cases} n! \prod_{i=1}^{n}{f(y_i)}, & y_1 < \ldots < y_n, \\ 0, & \text{otherwise}. \end{cases}$$
Solution 2:
It may help to see a simple example of small size. Suppose $n = 2$. Then our order statistics consist of the smaller of the two observations, and the larger. So if, say, we observed $Y_1 = 3, Y_2 = 5$, then $Y_{(1)} = Y_1 = 3$ and $Y_{(2)} = Y_2 = 5$; nothing has changed. But if the first observation was $5$ and the second was $3$, we would have to interchange them to write $Y_{(1)} = 3, Y_{(2)} = 5$ as we did before. Notice then that for a given realization of the ordered pair, there are in general $2! = 2$ ways to have observed the original variables so as to obtain the ordered result (we can ignore the case where the observations are equal, because for continuous random variables such an outcome has probability zero).
Therefore, in general, for any given realization of order statistics of a sample of size $n$, there are $n!$ possible ways that we could have observed a sample that, upon ordering, would yield such a realization. That's basically why the joint density is scaled the way it is.