Geodesic distance from point to manifold
Intuitively, if the minimizing geodesic does not hit orthogonally, then a geodesics-with-corner is shorter, contradicting minimality. To see this, consider the following picture:
The submanifold $N$ is on the left, $c$ is the geodesic which connects $p_0$ (which is too far away to be in view) to $q_0\in N$. Of course, in real life the geodesics isn't "straight" nor is $N$ (whatever that even means for an abstract Riemannian manifold), but if you zoom in enough, everything looks closer and closer to Euclidean.
The point is that by breaking your geodesic at the dotted line, you make a broken geodesic between $p_0$ and $q_1$ which is shorter than $c$. But a broken geodesic is never minimizing, so the actual distance from $p_0$ to $q_1$ is shorter than this broken geodesic, so is, in particular, shorter than $c$. It follows that $q_1$ is strictly closer to $p_0$ than $q_0$ is, contradicting minimality of $q_0$.
The thing which makes this all rigorous is the first variation formula for energy (top of page 195 in my book).
Edit I think I have another approach that avoids this issue we've been discussing in the comments. As before, $\gamma$ is a minimizing geodesic between $p_0$ and $q_0$, where $q_0\in N$ is a point minimizing the distance between $N$ and $p_0$.
Choose a radius $r$ such that $B(2r,q_0)$ is a totally normal neighborhood of $q_0$. Pick any point $p\in B(r,q_0)$ which also lies on $\gamma$. Let $t$ be the smallest radius for which the distance sphere $S_t$ around $p$ intersects $N$. I now claim two things.
1. $q_0$ is in $S_t\cap N$ and
2. For any $q\in S_t\cap N$, we have $T_q N\subseteq T_q S_t$ and so, in particular, a vector normal to $S_t$ at $q$ is necessarily normal to $N$ at $q$.
Believing these two claims for a moment, by the Gauss lemmma (pg 69-70), we know that given any $q\in S_t\cap N$, the unique minimizing geodesic connecting $p$ to $q$ hits $S_t$ orthogonally, and hence hits $T_q N$ orthogonally. Further, if $\alpha$ is the unique minimizing geodesic from $p$ to $q_0$, then $\alpha$ is a portion of $\gamma$ (by uniqueness), showing $\gamma$ hits $N$ orthogonally.
We now prove claim 1.
If $d(p,q_0)< t$, then $S_{d(p,q_0)}$ hits $N$ with a smaller radius, contradicting minimality. If $d(p,q_0)>t$, then the broken geodesic beginning as $\gamma$ from $p_0$ to $p$, then from $p$ to a point of $N\cap S_t$ has length smaller than $\gamma$, contradicting the choice of $q_0$. So, we have $d(p,q_0) = t$, so $q_0 \in S_t\cap N$.
We now prove claim 2.
Assume it's false and choose $v\in T_{q_1}N$ which is not in $T_{q_1}S_t$. By replacing $v$ by $-v$, we may assume that $v$ points "inside" $S_t$. (More specifically, the ball of radius $t$ around $p$ is a manifold with boundary $S_t$, so we have a notion of inside and outside). Then by the Gauss lemma, the geodesic in $N$ in the direction of $v$ gets closer to $p$, contradicting the fact that $S_t$ is the smallest sphere intersecting $N$.
I think i have a answer too. Please verify if it is correct.
Let $\gamma: [0,a]\rightarrow M$ be the geodesic joining the points $p_0$ and $q_0$. Let $v\in T_{q_0}N$ and $\phi: (-\epsilon,\epsilon)\rightarrow N$ be a differentiable curve such that $\phi(0)=q_0$ and $\phi'(0)=v$ (note that the image of $\phi$ is contained in N).
Consider a variation $f:(-\epsilon,\epsilon)\times [0,a]\rightarrow M$ such that $f(s,0)=p_0$ and $f(s,a)=\phi(s)$. Let $E$ be the energy associated with $f$. By the first variation of energy formula, we have that $$\frac{E'(0)}{2}=\langle v,\gamma'(a)\rangle$$
Now, because $\phi$ is contained in $N$, we can conclude that $E'(0)=0$. This is true because $\gamma$ is the curve that minimizes the distance between $p_0$ and $N$ which implies that the curves $f(s,t)$ for fixed $s$ have more energy/length than $\gamma$.