How can $y$ and $y'$ be independent in variational calculus?
We are given a function $f$ of three variables: $$(u,v,w)\mapsto f(u,v,w)\ .$$ When functions $$x\mapsto u:=\phi(x), \quad x\mapsto v:=\psi(x),\quad x\mapsto w:=\chi(x)\tag{1}$$ are supplied, a pullback $$\Phi(x)=f\bigl(\phi(x),\psi(x),\chi(x)\bigr)$$ results that can be integrated over $x$ from $a$ to $b$: $$F:=\int_a^b\Phi(x)\>dx\ .$$ The value of the quantity $F$ depends on the functions $\phi$, $\psi$, $\chi$ in $(1)$; therefore $F$ is called a functional, and one really ought to write $F(\phi,\psi,\chi)$ instead of just $F$.
Now in the case of variational calculus the three functions $\phi$, $\psi$, $\chi$ are $x\mapsto x$, $x\mapsto y(x)$, $x\mapsto y'(x)$ for a single function $y:\>[a,b]\to{\mathbb R}$. Therefore the functional $F$ in question depends only on this $y(\cdot)$, and we may write $$F(y):=\int_a^b f\bigl(x,y(x),y'(x)\bigr)\>dx\ .\tag{2}$$ When arguing about this functional $F$ we look at increments $F(y+\epsilon u)-F(y)$ where $\epsilon u$ is a small variation of $y$. We then have to differentiate $$F(y+\epsilon u)=\int_a^b f\bigl(x,y(x)+\epsilon u(x),y'(x)+\epsilon u'(x)\bigr)\>dx$$ with respect to $\epsilon$, and by the chain rule this involves computing partial derivatives of $f$ with respect to the second and third variables. It is pure laziness that these partial derivatives are denoted by ${\partial f\over\partial y}$ and ${\partial f\over \partial y'}$ instead of $f_{.2}$ and $f_{.3}$.
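To spell out the chain-rule step, the derivative at $\epsilon=0$ is $$\frac{d}{d\epsilon}F(y+\epsilon u)\biggr|_{\epsilon=0}=\int_a^b\Bigl[f_{.2}\bigl(x,y(x),y'(x)\bigr)\,u(x)+f_{.3}\bigl(x,y(x),y'(x)\bigr)\,u'(x)\Bigr]\>dx\ ,$$ where $f_{.2}$ and $f_{.3}$ are evaluated pointwise along the curve; no notion of "$y'$ depending on $y$" ever enters this computation.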
As far as I understand, and I may be wrong, I think your confusion is just about a notational technicality.
While $y'$ is determined by $y$, from the point of view of the function $f$ it only takes in a vector of real numbers and maps it to a real number. Therefore, having $x$ and $y$ as inputs isn't enough: some problems require $y'$ in the integral. For example, take a simple problem: say we want to find the shortest curve connecting two points, say at $x=0$ and $x=a$. Obviously the answer is a straight line, but let's do this as a variational calculus problem.
We have $$ds=\sqrt{dx^2+dy^2}=dx\sqrt{1+y'^2}$$ Therefore: $$F=\int ds=\int_0^a(1+y'^2)^{1/2}dx$$
As you can see, $y'$ appears in the integral. "But wait!" I hear you exclaim, "$y'$ is just a function of $y$, so we can just say $f(y,x)=(1+y'^2)^{1/2}$ and everything is fine!"
The real point here is that $f$ does not take the whole function $y(x)$ as an input; if it did, $f$ would "know" what $y'$ was, and you could write $f(x,y)$ instead, as you said. But this is not the case: $f$ takes a vector in $\mathbb{R}^3$ and gives out a scalar in $\mathbb{R}$. So for rigorous notation, you must say $f(y,y',x)$ and not $f(y,x)$.
The fact that $f$ doesn't "know" the whole function $y(x)$ and can only take in a vector in $\mathbb{R}^3$ might sound problematic, but it isn't. The standard technique for dealing with a problem in this form is to take the directional derivative of $F$ in an arbitrary direction $z$, call it $DF[z]$; the solution is then a $y(x)$ that makes $DF[z]=0$ for all possible $z$.
$$DF[z]=\left.\frac{d}{d\epsilon}\right|_{\epsilon=0}\int_0^a f(y+\epsilon z,\, y'+\epsilon z',\, x)\, dx\tag{1}$$
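Carrying the $d/d\epsilon$ under the integral sign and setting $\epsilon=0$, then integrating the $z'$ term by parts (assuming, as usual, that the variation $z$ vanishes at the endpoints so the boundary term drops), this becomes $$DF[z]=\int_0^a\left(\frac{\partial f}{\partial y}\,z+\frac{\partial f}{\partial y'}\,z'\right)dx=\int_0^a\left(\frac{\partial f}{\partial y}-\frac{d}{dx}\frac{\partial f}{\partial y'}\right)z\,dx\ ,$$ and demanding that this vanish for every such $z$ is exactly what produces the Euler-Lagrange equation.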
You can think of dealing with a calculus of variations problem in the following inelegant heuristic way (a numerical sketch follows the list):
- Go through all possible functions $y(x)$
- See which ones lead to $DF[z]=0$ for any and every $z$. These are the solutions to the problem
- (Optional) Apply boundary conditions to get a unique solution
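Here is a minimal numerical sketch of that heuristic for the arc-length functional above. Everything concrete in it is an assumption chosen for illustration: endpoints $y(0)=0$ and $y(1)=1$, a grid of 1001 points, and finite differences standing in for both $y'$ and the directional derivative.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1001)

def F(y):
    """Arc length of the curve y(x): integral of sqrt(1 + y'^2) dx."""
    yp = np.gradient(y, x)                 # finite-difference approximation of y'
    return np.trapz(np.sqrt(1.0 + yp**2), x)

def DF(y, z, eps=1e-6):
    """Directional derivative of F at y in direction z, by central differences."""
    return (F(y + eps * z) - F(y - eps * z)) / (2.0 * eps)

# Variations must keep the endpoints fixed, so z vanishes at x = 0 and x = 1.
z = np.sin(np.pi * x)

straight = x                               # candidate: the straight line y(x) = x
curved = x + 0.3 * np.sin(np.pi * x)       # candidate: a perturbed competitor

print(DF(straight, z))  # ~0 (up to discretization error): stationary
print(DF(curved, z))    # clearly nonzero: not stationary
```

Of course, in practice one does not literally search over all $y$; the point of the sketch is only that $DF[z]$ singles out the straight line once $f$ is allowed to see $y'$.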
Therefore, $f$ doesn't need to "know" the whole function $y$; we explicitly give it that information when we do the work of computing $f(y+\epsilon z, y'+\epsilon z', x)$. If we did as you suggested and used $f(y,x)$, then for any particular input to $f$ we would only have the value of $y$. Can you work out $y'(x)$ if you know the value of $y$ at the single point $x$ and nowhere else? No, of course not.
In the example above, at every point $x$ between $0$ and $a$, $f$ takes the value of $y'$ at that point (unfortunately this is a degenerate example: $f$ depends only on $y'$ and not on $y$ or $x$) and pops out a scalar; integrating this gives the length of $y(x)$. This integral is the functional $F$, and all you need to do is find $DF[z]$ and apply the three steps above.
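For this example the recipe can be finished by hand. Setting $DF[z]=0$ for all $z$ yields the Euler-Lagrange equation, and since $f=(1+y'^2)^{1/2}$ doesn't depend on $y$, $$\frac{d}{dx}\frac{\partial f}{\partial y'}=\frac{d}{dx}\frac{y'}{\sqrt{1+y'^2}}=0\quad\Longrightarrow\quad y'=\text{const}\ ,$$ so the stationary curves are exactly the straight lines, as promised.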
As for your other question, the fact that $\frac{\partial y'}{\partial y}=0$ (and vice versa) is something like a notational convention in variational calculus. Of course $y'$ isn't independent of $y$, but in the context of the function $f:\mathbb{R}^3\rightarrow\mathbb{R}$, whose partial derivatives we take when working out $d/d\epsilon$, we want to find out how a small change $\epsilon z$ in the function $y$ makes a difference to $F$, and obviously this also makes a small change $\epsilon z'$ in $y'$. Now look at $(1)$, specifically $f(y+\epsilon z, y'+\epsilon z', x)$. We deal with this via a multivariable Taylor series, since $\epsilon$ is small. When we apply the Taylor series to $f$, $y$ and $y'$ are just separate inputs to the function $f$: they are just real numbers, not functions. So in this context, $y$ and $y'$ are independent.
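Written out to first order, with $\frac{\partial f}{\partial y}$ and $\frac{\partial f}{\partial y'}$ meaning the partial derivatives with respect to the first and second slots of $f$, this Taylor expansion reads $$f(y+\epsilon z,\, y'+\epsilon z',\, x)=f(y,y',x)+\epsilon z\,\frac{\partial f}{\partial y}+\epsilon z'\,\frac{\partial f}{\partial y'}+O(\epsilon^2)\ .$$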
The point is, and I really want to emphasise this: we just want to know how changes to the first and second inputs ($y$ and $y'$) of $f$ cause a change in $f$. So we don't care that $y$ and $y'$ are dependent. We see how $f$ is perturbed by the perturbation of $y$ through the first and second inputs, and from this we can work out that a stationary point occurs exactly when the resulting first-order change in $F$ vanishes, i.e. $DF[z]=0$.
I hope this is clear. I fully understand and empathise with your concern about the notation, but I hope I have made it clear why $f(y,x)$ is definitely not the correct notation for some problems. If not, just ask for clarification.
First of all, $F$ is a functional while $f$ is not: $f$ depends only on finite-dimensional vectors. So when you write
$$F[y]=\int_0^1 f(x,y(x),y'(x)) dx$$
you are defining $F$ to be a functional of a rather particular form. For instance, it is impossible to choose $f$ such that $F[y]=y(0)$.
Now you are correct that you cannot alter $y'$ without also somehow altering $y$, and you can't do very much to $y$ without altering $y'$. However, it is possible to change $y'$ in a "large" fashion while only altering $y$ in a "small" fashion, and functionals of the form above can "detect" when you have done this. (Precisely speaking, the map $y \mapsto y'$ is not bounded with respect to the uniform norm.)
For instance, consider the sequence $y_n(x)=\frac{1}{n \pi} \sin(n \pi x)$, $y_0(x)=0$. $y_n$ converges uniformly to $y_0$. Yet $y'_0=0$ while $y'_n=\cos(n \pi x)$: $y'_0$ and $y'_n$ are quite far apart. So the fact that $f$ can depend explicitly on $y'$ means that we can have functionals like
$$F[y]=\int_0^1 y'(x)^2 dx$$
which can see the difference between $y_0$ and $y_n$ for $n>0$ (you will find that $F[y_0]=0$ while $F[y_n]=\frac{1}{2}$ for $n>0$).
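Indeed, the computation takes one line: $$F[y_n]=\int_0^1 \cos^2(n \pi x)\, dx=\int_0^1 \frac{1+\cos(2 n \pi x)}{2}\, dx=\frac{1}{2}\qquad (n \ge 1),$$ while $F[y_0]=\int_0^1 0\, dx=0$.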
More rigorously: if $F$ has the form we started with and $y_n \to y$ uniformly, it does not follow that $F[y_n] \to F[y]$. That convergence would be forced if $f$ were a continuous function depending only on $x$ and $y(x)$.