Consider a positive integer $n$ and a set of positive real numbers $\mathbf p=(p_x)$ such that $\sum\limits_xp_x=1$. The multinomial distribution with parameters $n$ and $\mathbf p$ is the distribution $f_\mathbf p$ on the set of nonnegative integers $\mathbf n=(n_x)$ such that $\sum\limits_xn_x=n$, defined by
$$ f_\mathbf p(\mathbf n)=n!\cdot\prod_x\frac{p_x^{n_x}}{n_x!}. $$
For a fixed observation $\mathbf n$, the likelihood is $L(\mathbf p)=f_\mathbf p(\mathbf n)$, subject to the constraint $C(\mathbf p)=1$, where $C(\mathbf p)=\sum\limits_xp_x$. To maximize $L$, one requires that the gradient of $L$ and the gradient of $C$ be collinear, that is, that there exists $\lambda$ such that, for every $x$,
$$ \frac{\partial}{\partial p_x}L(\mathbf p)=\lambda\frac{\partial}{\partial p_x}C(\mathbf p). $$
In the present case, this reads
$$ \frac{n_x}{p_x}L(\mathbf p)=\lambda, $$
that is, $p_x$ should be proportional to $n_x$. Since $\sum\limits_xp_x=1$, one finally gets $\hat p_x=\dfrac{n_x}n$ for every $x$.
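
For those who want a numerical sanity check of this result, here is a minimal sketch (assuming NumPy and SciPy are available; the counts $(3, 7, 10)$ are hypothetical): it maximizes the multinomial log-likelihood over the probability simplex and compares the result with $n_x/n$.

```python
import numpy as np
from scipy.optimize import minimize

counts = np.array([3.0, 7.0, 10.0])   # hypothetical observed counts n_x
n = counts.sum()

def neg_log_likelihood(p):
    # Constant terms (log n! and log n_x!) are dropped; they do not affect the argmax.
    return -np.sum(counts * np.log(p))

simplex = {"type": "eq", "fun": lambda p: p.sum() - 1.0}   # sum_x p_x = 1
bounds = [(1e-9, 1.0)] * len(counts)
p0 = np.full(len(counts), 1.0 / len(counts))               # start at the uniform distribution

result = minimize(neg_log_likelihood, p0, bounds=bounds, constraints=[simplex])
print(result.x)       # numerical maximizer, approximately [0.15, 0.35, 0.5]
print(counts / n)     # closed-form MLE n_x / n = [0.15, 0.35, 0.5]
```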


Suppose the observation consists of counts $x_1,\dots,x_m$ for the $m$ categories, with $\sum_{i=1}^m x_i = n$. We want to show that the maximum-likelihood estimates are the observed relative frequencies,

$$\begin{align} \hat p_1 = \hat P(X_1) &= \frac{x_1}{n} \\ &\;\;\vdots \\ \hat p_m = \hat P(X_m) &= \frac{x_m}{n}. \end{align}$$

The likelihood, which is the joint probability of the observed counts, is (see https://en.wikipedia.org/wiki/Multinomial_theorem)

$$\begin{align} L(\mathbf{p}) &= {{n}\choose{x_1, ..., x_m}}\prod_{i=1}^m p_i^{x_i} \\ &= n! \prod_{i=1}^m \frac{p_i^{x_i}}{x_i!} \end{align}$$
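
As a quick numerical cross-check (assuming SciPy; the counts and probabilities below are hypothetical), the explicit product form agrees with `scipy.stats.multinomial.pmf`:

```python
import numpy as np
from math import factorial
from scipy.stats import multinomial

x = [2, 3, 5]            # hypothetical observed counts
p = [0.2, 0.3, 0.5]      # hypothetical category probabilities
n = sum(x)

# n! * prod_i p_i^{x_i} / x_i!  (the second form above)
explicit = factorial(n) * np.prod([pi**xi / factorial(xi) for pi, xi in zip(p, x)])
print(explicit)                        # 0.08505
print(multinomial.pmf(x, n=n, p=p))    # same value from SciPy
```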

The log-likelihood is then

$$\begin{align} l(\mathbf{p}) = \log L(\mathbf{p}) &= \log \bigg( n! \prod_{i=1}^m \frac{p_i^{x_i}}{x_i!} \bigg)\\ &= \log n! + \log \prod_{i=1}^m \frac{p_i^{x_i}}{x_i!} \\ &= \log n! + \sum_{i=1}^m \log \frac{p_i^{x_i}}{x_i!} \\ &= \log n! + \sum_{i=1}^m x_i \log p_i - \sum_{i=1}^m \log x_i! \end{align}$$
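
This expansion can likewise be checked numerically against `scipy.stats.multinomial.logpmf`, using `gammaln(k + 1)` for $\log k!$ (hypothetical values again):

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multinomial

x = np.array([2, 3, 5])        # hypothetical observed counts
p = np.array([0.2, 0.3, 0.5])  # hypothetical category probabilities
n = x.sum()

# gammaln(k + 1) = log k!, so this is  log n! + sum_i x_i log p_i - sum_i log x_i!
log_lik = gammaln(n + 1) + np.sum(x * np.log(p)) - np.sum(gammaln(x + 1))
print(log_lik)
print(multinomial.logpmf(x, n=n, p=p))   # agrees with the expansion above
```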

Imposing the constraint $\sum_{i=1}^m p_i = 1$ with a Lagrange multiplier $\lambda$ gives

$$\begin{align} l'(\mathbf{p},\lambda) &= l(\mathbf{p}) + \lambda\bigg(1 - \sum_{i=1}^m p_i\bigg) \end{align}$$

To find $\arg\max_\mathbf{p} l'(\mathbf{p},\lambda)$, set the partial derivative with respect to each $p_i$ to zero:

$$\begin{align} \frac{\partial}{\partial p_i} l'(\mathbf{p},\lambda) = \frac{\partial}{\partial p_i} l(\mathbf{p}) + \frac{\partial}{\partial p_i} \lambda\bigg(1 - \sum_{j=1}^m p_j\bigg) &= 0\\ \frac{\partial}{\partial p_i} \sum_{j=1}^m x_j \log p_j - \lambda \frac{\partial}{\partial p_i} \sum_{j=1}^m p_j &= 0 \\ \frac{x_i}{p_i}- \lambda &= 0 \\ p_i &= \frac{x_i}{\lambda} \\ \end{align}$$

(The terms $\log n!$ and $\sum_{j=1}^m \log x_j!$ do not depend on $\mathbf{p}$, so their derivatives vanish in the second line.)

Thus, $$\begin{align} p_i &= \frac{x_i}{n} \end{align}$$

because

$$\begin{align} p_i &= \frac{x_i}{\lambda} \\ \sum_{i=1}^m p_i &= \sum_{i=1}^m \frac{x_i}{\lambda} \\ 1 &= \frac{1}{\lambda} \sum_{i=1}^m x_i \\ \lambda &= n \end{align}$$
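
If you want to confirm the stationarity numerically, a small sketch with central finite differences (NumPy only, hypothetical counts) shows that every partial derivative of $l'$ vanishes at $\hat p_i = x_i/n$, $\lambda = n$:

```python
import numpy as np

x = np.array([3.0, 7.0, 10.0])   # hypothetical observed counts
n = x.sum()
p_hat, lam = x / n, n            # candidate maximizer and multiplier

def lagrangian(p):
    # l'(p, lambda) with the terms that do not depend on p (log n!, log x_i!) omitted
    return np.sum(x * np.log(p)) + lam * (1.0 - p.sum())

eps = 1e-6
for i in range(len(x)):
    e = np.zeros_like(p_hat)
    e[i] = eps
    grad_i = (lagrangian(p_hat + e) - lagrangian(p_hat - e)) / (2 * eps)
    print(round(grad_i, 6))      # each partial derivative is ~0 at (p_hat, lambda = n)
```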

Finally, the probability distribution that maximizes the likelihood of the observed data is

$$\begin{align} \mathbf{p} = \bigg( \frac{x_1}{n}, ..., \frac{x_m}{n} \bigg) \end{align}$$
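
As a final sanity check (NumPy only, hypothetical counts), the empirical frequencies attain a log-likelihood at least as large as that of randomly drawn probability vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([3.0, 7.0, 10.0])   # hypothetical observed counts
n = x.sum()
p_hat = x / n                    # empirical frequencies

def log_lik(p):
    # Terms constant in p are omitted; they cancel in comparisons.
    return np.sum(x * np.log(p))

random_ps = rng.dirichlet(np.ones(len(x)), size=10_000)       # random points on the simplex
print(log_lik(p_hat) >= max(log_lik(p) for p in random_ps))   # True
```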