Derivation of the method of Lagrange multipliers?
I've always used the method of Lagrange multipliers with blind confidence that it will give the correct results when optimizing problems with constraints. But I would like to know if anyone can provide or recommend a derivation of the method at physics undergraduate level that can highlight its limitations, if any.
Lagrange multipliers are used to obtain the maximum of a function $f(\mathbf{x})$ on a surface $\{ \mathbf{x}\in\mathbb{R}^n\mid g(\mathbf{x}) = 0\}$ (I use "surface", but whether it is a 2-dimensional, 1-dimensional, or whatever-dimensional object will depend on the $g$ and the $\mathbb{R}^n$ we are dealing with).
The gradient of $f$, $\nabla f$, points in the direction of greatest increase of $f$. If we want to find the largest value of $f$ on the surface $g = 0$, then at that point the direction of greatest increase must be orthogonal to the surface; otherwise, moving along the surface would "capture" some of that increase, and $f$ would not achieve its maximum on the surface at that point (this is akin to the fact that in one-variable calculus the derivative should be $0$ at a maximum; otherwise moving a bit in one direction will increase the value of the function).
The gradient of $g$ is itself perpendicular to the level surface $g = 0$. So in order for $\nabla f$ to be perpendicular to the surface, it must be parallel to $\nabla g$; that is, $\nabla f$ must be a scalar multiple of $\nabla g$. This amounts to finding a solution to the system \begin{align*} \nabla f(\mathbf{x}) &= \lambda \nabla g(\mathbf{x})\\ g(\mathbf{x}) &= 0 \end{align*} for both $\mathbf{x}$ and $\lambda$.
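As a small worked illustration (the particular $f$ and $g$ here are my own choice, not part of the question), take $f(x,y) = x + y$ on the unit circle $g(x,y) = x^2 + y^2 - 1 = 0$. The system becomes $$\begin{aligned} 1 &= 2\lambda x\\ 1 &= 2\lambda y\\ x^2 + y^2 &= 1 \end{aligned}$$ The first two equations give $x = y = \frac{1}{2\lambda}$, and substituting into the constraint gives $\frac{1}{2\lambda^2} = 1$, so $\lambda = \pm\frac{1}{\sqrt{2}}$ and $$(x,y) = \pm\left(\tfrac{1}{\sqrt{2}},\tfrac{1}{\sqrt{2}}\right), \qquad f = \pm\sqrt{2}.$$ The plus sign is the maximum of $f$ on the circle and the minus sign is the minimum; the system itself only produces the candidates and does not say which is which.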
Added. Such a point is not guaranteed to be a maximum or a minimum; it could also be a saddle point, or nothing at all, much as in the one-variable case points where $f'(x)=0$ are not guaranteed to be extrema of the function. Another obvious limitation is that if $g$ is not differentiable (so the surface does not have a well-defined gradient) then you cannot even set up the system.
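A related caveat, sketched on an example of my own choosing: the geometric argument needs $\nabla g \neq 0$ at the point in question. Take $f(x,y) = x$ on the cusp curve $g(x,y) = y^2 - x^3 = 0$. On this curve $x = y^{2/3} \geq 0$, so the minimum of $f$ is $0$, attained at the origin. But there $$\nabla f = (1, 0), \qquad \nabla g = (-3x^2,\, 2y)\big|_{(0,0)} = (0, 0),$$ so $\nabla f = \lambda \nabla g$ has no solution for any $\lambda$. In fact the full system $$\begin{aligned} 1 &= -3\lambda x^2\\ 0 &= 2\lambda y\\ y^2 &= x^3 \end{aligned}$$ has no solutions at all: the second equation forces $\lambda = 0$ or $y = 0$, and either choice contradicts the first equation. Points where $\nabla g$ vanishes therefore have to be checked separately.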
An algebraic way of looking at this is as follows:
From an algebraic viewpoint, we know how to find the extrema of an unconstrained function of many variables. To find the extrema of $f(x_1,x_2,\ldots,x_n)$, we set the gradient to zero and look at the definiteness of the Hessian.
We would like to extend this idea to finding the extrema of a function subject to some constraints. Say the problem is: $$\begin{aligned} \text{Minimize } & f(x_1,x_2,\ldots,x_n)\\ \text{subject to: } & g_k(x_1,x_2,\ldots,x_n) = 0\\ \text{where } & k \in \{1,2,\ldots,m\} \end{aligned}$$
If we find the extrema of $f$ just by setting the gradient of $f$ to zero, these extrema need not satisfy the constraints.
Hence, we would like to include the constraints in the previous idea. One way to do it is as follows. Define a new function: $$F(\vec{x},\vec{\lambda}) = f(\vec{x}) - \lambda_1 g_1(\vec{x}) - \lambda_2 g_2(\vec{x}) - \cdots - \lambda_m g_m(\vec{x})$$ where $\vec{x} = \left[ x_1,x_2,\ldots,x_n \right], \vec{\lambda} = \left[\lambda_1,\lambda_2,\ldots,\lambda_m \right]$
Note that when the constraints are enforced, we have $F(\vec{x},\vec{\lambda}) = f(\vec{x})$, since $g_j(\vec{x}) = 0$ for every $j$.
Let us find the stationary points of $F(\vec{x},\vec{\lambda})$. This is done by setting $\frac{\partial F}{\partial x_i} = 0$ and $\frac{\partial F}{\partial \lambda_j} = 0$ where $i \in \{1,2,\ldots,n\}$ and $j \in \{1,2,\ldots,m\}$
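Setting $\frac{\partial F}{\partial x_i} = 0$ gives us $$\vec{\nabla}f = \vec{\nabla}g \cdot \vec{\lambda}$$ where $\vec{\nabla}g = \left[\vec{\nabla} g_1(\vec{x}),\vec{\nabla} g_2(\vec{x}),\ldots,\vec{\nabla} g_m(\vec{x}) \right]$; written out, this is $\vec{\nabla} f = \lambda_1 \vec{\nabla} g_1 + \lambda_2 \vec{\nabla} g_2 + \cdots + \lambda_m \vec{\nabla} g_m$.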
Setting $\frac{\partial F}{\partial \lambda_j} = 0$ gives us $$g_j(\vec{x}) = 0$$ where $j \in \{1,2,\ldots,m\}$
Hence, at any stationary point of $F$ the constraints are automatically enforced. This means that the stationary points of $F$ are the candidates for extrema of $f$ with the constraints enforced.
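The steps above can be sketched numerically. The following is a minimal sketch on a toy problem of my own choosing, $f = x^2 + y^2$ with the single constraint $g = x + y - 1 = 0$; the function names `grad_F`, `solve_linear` and `stationary_point` are hypothetical helpers, not from any library. Because $F$ is quadratic here, setting its partial derivatives to zero is just a $3\times 3$ linear system in $(x, y, \lambda)$.

```python
# Toy problem (an assumption for illustration): f = x^2 + y^2, g = x + y - 1.
# F(x, y, lam) = f - lam * g, matching the sign convention used in the text.

def grad_F(x, y, lam):
    """Partial derivatives of F = x^2 + y^2 - lam * (x + y - 1)."""
    dF_dx = 2.0 * x - lam        # df/dx - lam * dg/dx
    dF_dy = 2.0 * y - lam        # df/dy - lam * dg/dy
    dF_dlam = -(x + y - 1.0)     # equals -g, so it vanishes exactly on the constraint
    return [dF_dx, dF_dy, dF_dlam]


def solve_linear(a, b):
    """Solve a small dense system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def stationary_point():
    """Solve grad_F = 0. grad_F is linear here: grad_F(z) = grad_F(0) + H z."""
    hessian = [
        [2.0, 0.0, -1.0],
        [0.0, 2.0, -1.0],
        [-1.0, -1.0, 0.0],
    ]
    rhs = [-v for v in grad_F(0.0, 0.0, 0.0)]
    return solve_linear(hessian, rhs)


x, y, lam = stationary_point()
print(x, y, lam)
```

For this toy problem the stationary point should come out at $x = y = \tfrac12$ with $\lambda = 1$, and the constraint $x + y = 1$ holds there without having been imposed separately, which is the point of the construction.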
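To decide whether the point we obtain by solving the system is a minimum, a maximum, or a saddle point of the constrained problem, it is not enough to look at the definiteness of the full Hessian of $F$ in $(\vec{x},\vec{\lambda})$: since $F$ is linear in $\vec{\lambda}$, that Hessian has a zero $\vec{\lambda}\vec{\lambda}$ block and is indefinite whenever some $\vec{\nabla} g_k \neq 0$. Instead, look at the Hessian of $F$ with respect to $\vec{x}$ alone, restricted to directions $\vec{v}$ tangent to the constraints, i.e. with $\vec{\nabla} g_k \cdot \vec{v} = 0$ for all $k$ (equivalently, use the bordered Hessian test).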
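A short worked check, on a toy problem of my own choosing: minimize $f = x^2 + y^2$ subject to $g = x + y - 1 = 0$, so $F = x^2 + y^2 - \lambda(x + y - 1)$ with stationary point $x = y = \tfrac12$, $\lambda = 1$. The Hessian of $F$ in all three variables $(x, y, \lambda)$ is $$H = \begin{bmatrix} 2 & 0 & -1\\ 0 & 2 & -1\\ -1 & -1 & 0 \end{bmatrix}, \qquad \det H = -4,$$ which is indefinite (it has a zero diagonal entry with nonzero entries in the same row), so its definiteness alone does not classify the point. One way to proceed is to take only the $x,y$ block, $\nabla^2_{\vec{x}} F = 2I$, and test it on directions tangent to the constraint. Here $\vec{\nabla} g = (1,1)$, so tangent directions are multiples of $\vec{v} = (1,-1)$, and $$\vec{v}^{\,T} \left(\nabla^2_{\vec{x}} F\right) \vec{v} = 2\cdot 1 + 2\cdot 1 = 4 > 0,$$ so the stationary point is a constrained minimum, with $f = \tfrac12$.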