Why do we treat position and velocity as independent variables in the Lagrangian?

The simplest reason we can do that is the following.

Given a function $f(x)$, suppose we can write it as $f(x,y)$ with $y = y(x)$. We can then apply the identity $$ df = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy $$

The derivation of this identity never assumes that $x$ and $y$ are independent. The *only* problem that can arise is that we might assign $y$ a value that the relation $y = y(x)$ does not allow.

For example, if $y = x^2$, the derivative $\frac{df}{dx}$ computed directly agrees with the one computed from the identity, but $f(2,3)$ is not a valid evaluation, since $x = 2$ forces $y = 2^2 = 4$.
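As a quick check, take the illustrative choice $f(x,y) = xy$ (my own example, not part of the original argument) with $y = x^2$. Directly, $f = x^3$ and $\frac{df}{dx} = 3x^2$. Using the identity with $dy = 2x\,dx$:

$$ df = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy = y\, dx + x \cdot 2x\, dx = (x^2 + 2x^2)\, dx = 3x^2\, dx $$

The partial derivatives were taken as if $x$ and $y$ were unrelated, and the relation $y = x^2$ was only imposed afterwards. Both routes agree.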

With the Lagrangian, we always impose $\frac{dx}{dt} = v$ at the end, so we are assured that we will never run into such a problem.
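To see this in a concrete case, consider the familiar form $L(x,v) = \tfrac{1}{2} m v^2 - V(x)$, assumed here purely for illustration. Treating $x$ and $v$ as separate slots gives

$$ \frac{\partial L}{\partial v} = m v, \qquad \frac{\partial L}{\partial x} = -V'(x) $$

Only now do we substitute $v = \frac{dx}{dt}$ into the Euler-Lagrange equation $\frac{d}{dt}\frac{\partial L}{\partial v} - \frac{\partial L}{\partial x} = 0$, which yields

$$ m \frac{d^2 x}{dt^2} = -V'(x) $$

At no point was $L$ evaluated at a pair $(x, v)$ that violates $v = \frac{dx}{dt}$ along the actual path.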


As this would be too long for a comment, let me try to answer.

"As such, before invoking any variational principles, we are able to treat position, $q(t)$ and velocity, $\dot q(t)$ as independent variables."

No, I don't think so, because $\dot q(t)$ is the time derivative of $q$, which is the only degree of freedom here. Mathematically, you can't vary a function and its derivative independently. The notation $L(q,\dot q)$ should be understood as a function of two independent arguments that is evaluated at points which are not independent. It follows that $$\frac{d}{dt}(\delta q) = \delta \dot q.$$
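To make the role of that relation explicit, here is a sketch of the standard variation, assuming $\delta q$ vanishes at the endpoints $t_1$ and $t_2$ (labels introduced here for illustration):

$$ \delta S = \int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q}\,\delta q + \frac{\partial L}{\partial \dot q}\,\delta \dot q \right) dt = \int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial \dot q} \right) \delta q \, dt + \left[ \frac{\partial L}{\partial \dot q}\,\delta q \right]_{t_1}^{t_2} $$

The partial derivatives treat the two arguments of $L$ as independent slots, while the integration by parts in the second step uses $\delta \dot q = \frac{d}{dt}(\delta q)$, which holds only because $\dot q$ is tied to $q$. With the boundary term vanishing, $\delta S = 0$ for arbitrary $\delta q$ gives $\frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial \dot q} = 0$.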

Of course you are allowed to consider the mathematical problem with $q\rightarrow q_1$, $\dot q\rightarrow q_2$ as two independent functions, but then the physics behind it changes (2 degrees of freedom instead of 1).
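As a rough illustration of how much changes, take the assumed form $L(q_1,q_2) = \tfrac{1}{2} m q_2^2 - V(q_1)$ with $q_1(t)$ and $q_2(t)$ varied independently. Since no time derivatives appear, the stationarity conditions reduce to

$$ \frac{\partial L}{\partial q_1} = -V'(q_1) = 0, \qquad \frac{\partial L}{\partial q_2} = m q_2 = 0 $$

These are algebraic conditions rather than a differential equation of motion, so this variational problem no longer describes the dynamics of the original system.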

Edit:

The action functional is $S=S[q]$. There is no need to specify what it does to the function $q$ (i.e. derivatives, ...), but it is a functional, not a function. The Lagrangian, by contrast, is a function: it maps "numbers to a number", not a function to a number. It is convenient to list the derivatives as explicit arguments because you can then use standard analysis to make the expansion. You could write $S=\int L[q(t)]\, dt$, but then $L$ is a differential operator, which makes the notation less explicit: the form of $\delta L$ depends strongly on the particular $L$, whereas for $L=L(q,\dot q)$ the equations of motion have the same universal form regardless of $L$.

Of course one could think of many other functionals, for example nonlocal ones of the form

$$ S=\int L[q(t),q(t')]dt dt' $$

but the physics described by such a model is very different, notably the notion of causality: what happens at $t<t'$ is sensitive to what will happen at $t'$!