Energy in the heat equation.
Nearly a year ago, I gave in the comments a non-answer that would justify calling the quantity $$ \int u(x,t)^2~dx = E(t) $$ the energy of the rod. Namely, one can think of minimizing the energy functional as satisfying a conservation law, and thus minimizing the energy leads you to solutions of the associated partial differential equations.
One year out, I think I have a more concrete answer based on dimensional analysis. Let us examine the equation $$ \partial_t u = \alpha\partial_{xx}u $$ for its dimensions. $u = u(x,t)$ represents temperature at the spacetime coordinate $(x,t)$, so it has units of temperature $T$. $\alpha$ here is the thermal diffusivity, which has units of length squared over time, $L^2/\tau$. Finally, not in the equation but lurking in the background is the kinetic energy, which is related to temperature by the equation $$ E_k = \frac{3}{2}kT, $$ where $k$ is the Boltzmann constant and has units of energy over temperature, $E/T$.
Let me use $[unit]$ to represent the dimensions of any units in any expression. So for example, the units in the heat equation check out: $\partial_t u$ has units of temperature over time, while $\partial_{xx} u$ has units of temperature over length squared, which gives us $$ [T\tau^{-1}] = [L^2\tau^{-1}][TL^{-2}]. $$ Now, in our scenario the total length of the rod is fixed, so without loss of generality we may treat it as a constant. Dimensionally, this is saying we will take $[L] = [1]$, so it drops out of any of our equations. This is all fine, as we see in the heat equation the units of length cancel each other anyway.
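The bookkeeping above can be mechanized. Here is a minimal sketch, not a real units library: a dimension is stored as a dictionary of exponents, and the helper names `times` and `power` as well as the keys `"T"`, `"L"`, `"tau"` are hypothetical, chosen only for illustration.

```python
def times(*dims):
    """Multiply dimensions by adding exponents; drop any zero exponents."""
    total = {}
    for d in dims:
        for base, exponent in d.items():
            total[base] = total.get(base, 0) + exponent
    return {b: e for b, e in total.items() if e != 0}


def power(d, n):
    """Raise a dimension to the n-th power by scaling its exponents."""
    return {b: e * n for b, e in d.items() if e * n != 0}


# Base dimensions, named as in the text (assumed labels).
TEMP = {"T": 1}
LENGTH = {"L": 1}
TIME = {"tau": 1}

du_dt = times(TEMP, power(TIME, -1))              # [T tau^-1]
alpha = times(power(LENGTH, 2), power(TIME, -1))  # [L^2 tau^-1]
d2u_dx2 = times(TEMP, power(LENGTH, -2))          # [T L^-2]

# Both sides of the heat equation should carry the same dimensions.
heat_ok = du_dt == times(alpha, d2u_dx2)
print(heat_ok)
```

The length exponents cancel inside `times(alpha, d2u_dx2)`, which is the cancellation noted above.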
So let us turn to the energy functional. In terms of units, $$ \int u(x,t)^2~dx = \int [T^2]~d[L] = [T^2L] = [T^2]. $$ So the energy functional has units $[T^2]$, which isn't where we want to be (we want units of energy). However, the primary thing we do with the energy functional is minimize it. Because the square root is increasing on nonnegative numbers, minimizing the energy functional $E(t)$ is equivalent to minimizing its square root, $\sqrt{E(t)}$, which has units of $[T]$.
Still not there. But aha! Energy and temperature, $[E]$ and $[T]$, are related by a proportionality law! That is, the equation $$ E_k = \frac{3}{2}kT $$ tells us precisely how to convert from temperature to energy, and the conversion preserves order. So if we can minimize the square root of the "energy functional" $\sqrt{E(t)}$, which has units of $[T]$, then we automatically know how to minimize the actual energy, which according to our dimensional analysis must be something like $$ \sqrt{\int \frac{9}{4}k^2 u(x,t)^2~dx } = \frac{3k}{2}\sqrt{E(t)}, $$ where the left-hand side is actually integrating the physical notion of energy squared. And now, the units do indeed check out: these are quantities with units of energy. There are surely other ways to define a natural notion of a physical energy functional using an integral, but the units of this definition work and it has nice mathematical properties (pretty much exactly those of $\sqrt{E(t)}$, which we know has an excellent mathematical theory). This revolves around the observation that since $k$ is a constant, minimizing the "energy" functional defined using units of temperature is equivalent to minimizing what you would expect to be the "physical energy" functional.
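The claim that $E(t)$, $\sqrt{E(t)}$, and $\frac{3k}{2}\sqrt{E(t)}$ are minimized together can be checked numerically. The following is a rough sketch under assumptions that are not in the discussion above: a rod of length 1 held at temperature 0 at both ends, the initial profile $\sin(\pi x)$, $\alpha = 1$, and a basic explicit finite-difference scheme. The function names are hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def simulate_energy(alpha=1.0, n=50, steps=200):
    """Step u_t = alpha * u_xx on [0, 1] with u = 0 at both ends.

    Returns the Riemann-sum approximation of E(t) = int u^2 dx at each step.
    """
    dx = 1.0 / n
    dt = 0.4 * dx * dx / alpha  # explicit scheme needs alpha*dt/dx^2 <= 1/2
    r = alpha * dt / (dx * dx)
    u = [math.sin(math.pi * i * dx) for i in range(n + 1)]
    u[0] = u[n] = 0.0  # enforce the assumed boundary condition exactly
    energies = []
    for _ in range(steps):
        energies.append(sum(v * v for v in u) * dx)
        new = u[:]
        for i in range(1, n):
            new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        u = new
    return energies


def ranking(values):
    """Indices of the time steps, sorted from smallest value to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])


E = simulate_energy()
sqrt_E = [math.sqrt(e) for e in E]                  # units of temperature
physical = [1.5 * K_BOLTZMANN * s for s in sqrt_E]  # units of energy

# An increasing rescaling should not change which time steps rank lowest.
same_order = ranking(E) == ranking(sqrt_E) == ranking(physical)
print(same_order)
```

All three lists order the time steps identically, so whichever one you minimize, you land on the same step.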
Morally, the fact that $k$ is a constant (despite having units of $[ET^{-1}]$) gives us a temperature-energy equivalence law, which tells us that for dimensional analysis purposes $[E]$ and $[T]$ are indistinguishable. This is what the physicist in me needs to say to satisfy himself. The mathematician in me chooses units for temperature and energy so that $k = 2/3$, and then I just have the simple equivalence $E_k = T$, and now I happily exchange temperature and energy at will. And presumably this is what mathematicians of the past have done.
Treating $L$ like a constant also needs some justification, but I think if we think of $L$ as very small and argue that computing the energy of large homogeneous bodies consists of computing the energy on small pieces and summing, then we can justify that assumption as well.