Intuition behind variational principle
Hofer–Zehnder, in Section 1.5, proves that every Hamiltonian vector field on a strictly convex, compact, regular energy surface carries a periodic orbit. I have understood the proof. What I am wondering about is how they get the variational principles. The proof starts by finding a function $H$ that has the given convex hypersurface as an energy surface, is positively homogeneous of degree two, has positive definite Hessian, and is of class $\mathcal{C}^2$. Then the proof normalizes the sought period to $2\pi$, and then it goes:
For example, the variational principle:
$$\min\int_0^{2\pi}H\big(x(t)\big)\,dt\quad{\rm under}\quad\frac{1}{2}\int_0^{2\pi}\langle Jx,\dot x\rangle=1.$$
[…] has the above equations as Euler equations.
Firstly, what are the Euler equations of a variational principle? And secondly, how did they get this particular principle?
Update
Let me make my question more precise. There are 3-4 points I have trouble with.
- Why do they consider this principle, with the strange constraint that $\int_0^{2\pi}\langle Jx,\dot x\rangle\,dt = 2$ (equivalently, $\frac12\int_0^{2\pi}\langle Jx,\dot x\rangle\,dt = 1$)? The constraint set isn't even a vector space! It's perhaps not even an affine space; it has no obvious structure. Why should this constraint be useful? And why that functional?
- How do I get the Euler equations for this principle, and what are they? If I try to set directional derivatives to $0$, I get stuck at once, because how am I supposed to perform the variation?
- The book then goes on and says this functional attains neither its infimum nor its supremum. How can one prove these claims?
- The book then Legendre transforms $H$ into $G$, gives properties of $G$, and introduces the analogous principle for $G$:
$$\min\int_0^{2\pi}G(\dot z)\,dt\quad{\rm under}\quad\frac{1}{2}\int_0^{2\pi}\langle Jz,\dot z\rangle=1.$$
Again, why should I consider this particular principle, with this particular constraint? I understood the rest of the proof, so I know this is the right principle to consider, but how does one come up with it? Is there any reason that would make one think of this principle as a plausible choice?
Let's not consider a closed path for now but some smooth curve $\gamma:[0,1]\to M$, and consider the functional $$S[\gamma]:=\int_0^1 \Big(\sum_ip_i(\gamma(t))\, \mathrm d q_i(\gamma(t))-H(\gamma(t))\, \mathrm d t\Big)$$ where I'm working in a chart with coordinates $(q_1,\dots,q_n,p_1,\dots,p_n)$. An old problem in classical mechanics (finding trajectories with fixed endpoints) was formulated by requiring that this functional be stationary at the true physical trajectory.

You can read more about how the calculus of variations works; here I will obtain the first variation of this functional a bit heuristically. To get the variation, we compute $S[\gamma + \epsilon \eta]-S[\gamma]=\epsilon\, \delta S[\gamma]+O(\epsilon^2)$, where $\eta:[0,1]\to M$ is a variation with $\eta(0)=\eta(1)=0$. In practice, we can get the variation by operating with $\delta$ as if it were an ordinary differential, so for instance $\delta(a b)=a\,\delta b + b\,\delta a$, $\delta(f(x,y))=f_x\, \delta x + f_y\, \delta y$, etc. The Hamiltonian depends on the curve through $H(q_1(\gamma(t)),\dots,q_n(\gamma(t)),p_1(\gamma(t)),\dots,p_n(\gamma(t)))$, so in varying this we get
$$\delta S = \int_0^1 \sum_i\left(\delta p_i\, \mathrm d q_i + p_i\,\mathrm d\, \delta q_i\right)-\sum_i\left(\frac{\partial H}{\partial p_i}\delta p_i + \frac{\partial H}{\partial q_i}\delta q_i\right)\mathrm d t$$ and now we note that $p_i\,\mathrm d\, \delta q_i=\mathrm d(p_i\,\delta q_i)-\delta q_i\,\mathrm d p_i$. This means that we may integrate by parts: $$\delta S = \sum_i\int_0^1 \left(\mathrm d q_i -\frac{\partial H}{\partial p_i}\mathrm d t \right) \delta p_i -\left(\mathrm d p_i +\frac{\partial H}{\partial q_i} \mathrm d t \right) \delta q_i + \sum_i p_i\,\delta q_i\Big|_{0}^1.$$ The last term is usually called a boundary (or surface) term, and it vanishes because the variation vanishes at the endpoints, since the endpoints are fixed.
Requiring $\delta S = 0$ gives $$\frac{\mathrm d q_i}{\mathrm d t}=\frac{\partial H}{\partial p_i},\qquad\frac{\mathrm d p_i}{\mathrm d t}=-\frac{\partial H}{\partial q_i}$$ along $\gamma$. This is the same as requiring that $\gamma$ be an integral curve of the Hamiltonian vector field $X_H$ given by $\mathrm dH=i_{X_H}\omega$ (modulo a minus sign I always forget). Historically, this was the motivation for symplectic geometry, with the symplectic manifold being the cotangent bundle $T^*Q$ of the $n$-dimensional configuration space $Q$, which is nothing more than the manifold where the (generalized) coordinates live. Alternatively we may write $$\dot x = J\nabla H(x).$$
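To see the $2\pi$-normalization from your question in the simplest possible case, here is a small numerical sketch (my own illustration, not from the book): take the positively $2$-homogeneous Hamiltonian $H(x)=\tfrac12|x|^2$ on $\mathbb R^2$, so that $\dot x = J\nabla H(x) = Jx$ and $x(t)=e^{tJ}x_0$ closes up exactly after time $2\pi$.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Standard symplectic matrix on R^2 (rotation by 90 degrees).
J = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def grad_H(x):
    # H(x) = |x|^2 / 2 is positively homogeneous of degree 2, so grad H(x) = x.
    return x

def hamiltonian_field(t, x):
    # Hamilton's equations in the form used above: x' = J grad H(x).
    return J @ grad_H(x)

x0 = np.array([1.0, 0.0])
sol = solve_ivp(hamiltonian_field, (0.0, 2 * np.pi), x0, rtol=1e-10, atol=1e-12)

# The orbit should be 2*pi-periodic and should conserve H.
print("return error:", np.linalg.norm(sol.y[:, -1] - x0))
print("energy drift:", 0.5 * sol.y[:, -1] @ sol.y[:, -1] - 0.5 * x0 @ x0)
```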
These are properly called Hamilton's equations, although they may be viewed as Euler–Lagrange equations, which arise from the same procedure applied to a functional of the form $\int L(q,\dot q;t)\,\mathrm dt$. The book is probably referring to Hamilton's equations as Euler equations because that is exactly what they are for this particular functional.
Now, for your particular problem, you have periodic motion. I remember that you asked in this question why over a closed path we get $$\int_{\gamma} \lambda:=\int_{\gamma} \sum_i p_i\,\mathrm d q_i = \frac12\int_0^{2\pi}\langle-J\dot x,x\rangle\, dt.$$ The constraint is basically a normalization of this. And since $J$ is antisymmetric, $\langle -J\dot x,x\rangle=\langle Jx,\dot x\rangle$, so this really is equivalent to your constraint. To obtain the correct equations of motion (as they are called) from variational calculus when constraints are present, we use Lagrange multipliers. Consider the problem of making $\int F(\dot x,x,t)\, \mathrm dt$ stationary, where $F$ is some integrand, subject to the constraint $\int M(\dot x,x,t)\, \mathrm dt= m$. This could say, for instance, that the total mass is $m$. The correct functional to which we may apply the Euler–Lagrange equations is
$$\int\left(F(\dot x,x,t) + \lambda\, M(\dot x,x,t)\right)\mathrm dt,$$ where $\lambda$ is a constant Lagrange multiplier (for an integral constraint of this isoperimetric type the multiplier is a number, not a function).
So, by saying minimize $\int H(x(t))\,\mathrm dt$ along with the constraint, I'm really saying to make the so-called "action functional" stationary, which is the one I started this answer with ($\lambda$ drops out in the end; see the sketch below). All of this should answer your questions 1 and 2.
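Here is a short sketch of that calculation for the constrained principle itself, so you can see what its Euler equations look like (my own computation; the sign conventions may differ slightly from Hofer–Zehnder's). For a $2\pi$-periodic variation $\delta x$, integration by parts (no boundary terms, by periodicity) and the antisymmetry of $J$ give
$$\delta\int_0^{2\pi} H(x)\,dt=\int_0^{2\pi}\langle\nabla H(x),\delta x\rangle\,dt,\qquad \delta\,\frac12\int_0^{2\pi}\langle Jx,\dot x\rangle\,dt=-\int_0^{2\pi}\langle J\dot x,\delta x\rangle\,dt.$$
Requiring $\delta\int H=\lambda\,\delta\big(\tfrac12\int\langle Jx,\dot x\rangle\big)$ for all such $\delta x$ yields $\nabla H(x)=-\lambda J\dot x$, i.e. $\lambda\dot x=J\nabla H(x)$: a solution of Hamilton's equations up to a constant rescaling of time. The degree-$2$ homogeneity of $H$ then lets one rescale time and the loop itself to obtain a genuine periodic solution of $\dot x=J\nabla H(x)$ on the prescribed energy surface; this is how the multiplier disappears.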
Whether $S[\gamma]$ attains an infimum or a supremum isn't important here; only whether it is stationary is. I don't know how I would show whether it attains a minimum or a maximum. That would require the second variation, or something like non-negativity of the Weierstrass excess (or simply $E$) function $E(x,y,z,w)=F(x,y,w)-F(x,y,z)-(w-z)F_{y'}(x,y,z)$.
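For what it's worth, here is the simplest example of how the $E$-function is used (my own illustration, with $z$ and $w$ standing for two slope values): for $F(x,y,y')=y'^2$ one finds
$$E(x,y,z,w)=w^2-z^2-(w-z)\cdot 2z=(w-z)^2\ge 0,$$
so extremals of $\int \dot x^2\,dt$ satisfy the Weierstrass condition for a minimum.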
Let there be given a curve $\vec{p}(t)$ and a real valued function $L$ with the following arguments: this curve, the time derivative of the curve and the time $t$ itself. Now consider the following integral:
$$
W\left(\vec{p},\dot{\vec{p}}\right) = \int_{t_1}^{t_2} L\left(\vec{p},\dot{\vec{p}},t\right) dt
$$
The integral $W$ is dependent upon the chosen curve $\vec{p}(t)$ ; it is a so-called functional of $\vec{p}$ and $\dot{\vec{p}}$.
We wish to determine for which curves $\vec{p}(t)$ the functional $W(\vec{p},\dot{\vec{p}})$ has an extreme value. But it is not possible to simply take a derivative and set it to zero. So we must do something else.
Assume that the extreme value is obtained for the curve $\vec{q}(t)$ and let
$\vec{p}(t) = \vec{q}(t) + \epsilon \vec{f}(t)$ , with the same end-points
$\vec{p}(t_1) = \vec{q}(t_1)$ and $\vec{p}(t_2) = \vec{q}(t_2)$, hence
$\vec{f}(t_1) = \vec{f}(t_2) = \vec{0}$. Otherwise the curve $\vec{f}(t)$ is arbitrary, and the parameter $\epsilon$ does not depend on $t$. In this way our functional $W$ has become an ordinary function of $\epsilon$:
$$
W(\epsilon) =
\int_{t_1}^{t_2} L\left(\vec{q} + \epsilon \vec{f},\dot{\vec{q}} + \epsilon \dot{\vec{f}},t\right) dt
$$
An ordinary real-valued function, which can be differentiated in the ordinary way to find its extremes:
$$
\left.\frac{dW}{d\epsilon}\right|_{\epsilon=0} = 0 \qquad \Longleftrightarrow \qquad
\int_{t_1}^{t_2} \left.\frac{dL\left(\vec{q} + \epsilon \vec{f},\dot{\vec{q}} + \epsilon \dot{\vec{f}},t\right)}
{d\epsilon}\right|_{\epsilon = 0} dt = 0
$$
Let the (vector) components of $\,\vec{p}\,$ be $\,p_k$ . Apply the chain rule:
$$
\frac{dL}{d\epsilon} = \sum_k \frac{\partial L}{\partial p_k} \frac{d p_k}{d\epsilon}
+ \sum_k \frac{\partial L}{\partial \dot{p}_k} \frac{d \dot{p}_k}{d\epsilon}
+ \frac{\partial L}{\partial t} \frac{d t}{d\epsilon} =
\sum_k \frac{\partial L}{\partial p_k} f_k + \sum_k \frac{\partial L}{\partial \dot{p}_k} \dot{f}_k + 0
$$
The term containing $\dot{f}_k$ can be rewritten with the help of:
$$
\frac{d}{dt} \left\{ \frac{\partial L}{\partial \dot{p}_k} f_k\right\} =
\frac{\partial L}{\partial \dot{p}_k} \dot{f}_k +
\frac{d}{dt} \left(\frac{\partial L}{\partial \dot{p}_k}\right) f_k
$$
It follows that:
$$
\frac{dW}{d\epsilon} = \sum_k \int_{t_1}^{t_2} \left[\frac{\partial L}{\partial p_k} f_k
- \frac{d}{dt} \left(\frac{\partial L}{\partial \dot{p}_k}\right) f_k
+ \frac{d}{dt} \left\{ \frac{\partial L}{\partial \dot{p}_k} f_k\right\} \right] dt = \\
\sum_k \int_{t_1}^{t_2} \left[\frac{\partial L}{\partial p_k}
- \frac{d}{dt} \left(\frac{\partial L}{\partial \dot{p}_k}\right) \right] f_k\, dt +
\sum_k \left[ \frac{\partial L}{\partial \dot{p}_k} f_k \right]_{t_1}^{t_2}
$$
But we know that $f_k(t_1) = f_k(t_2) = 0$, so the last term is zero. Furthermore we know that for $\epsilon = 0$ we have $p_k = q_k$, and the functions $f_k$ are completely arbitrary. Therefore it is concluded that the integral can vanish for all such $f_k$ only if the following necessary (though in general not sufficient) conditions are fulfilled for all $k$:
$$
\frac{\partial L}{\partial q_k} - \frac{d}{dt} \left(\frac{\partial L}{\partial \dot{q}_k}\right) = 0
$$
These are the well-known Euler–Lagrange equations. It is noted that special notations like $\delta W$ for the "variation" of the integral are not needed: the variation has been reduced to ordinary differentiation with respect to $\epsilon$, which makes the (IMHO confusing) $\delta$ practice completely redundant.
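As a quick sanity check (my own example), take the standard mechanical Lagrangian $L(\vec{q},\dot{\vec{q}},t)=\tfrac12 m|\dot{\vec{q}}|^2-V(\vec{q})$. Then
$$
\frac{\partial L}{\partial q_k} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_k}\right)
= -\frac{\partial V}{\partial q_k} - m\ddot{q}_k = 0
\qquad\Longleftrightarrow\qquad
m\ddot{q}_k = -\frac{\partial V}{\partial q_k},
$$
which is Newton's second law, as expected.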