How to explain this quirk of the chain rule?
Let's use a different notation: for a function of two variables $f$, denote by $\partial_1f$ and $\partial_2f$ the first-order partial derivatives of $f$ with respect to the first and second variable respectively, namely: \begin{align*} \partial_1f(x,y)&=\lim_{h\to0}\frac{f(x+h,y)-f(x,y)}h,& \partial_2f(x,y)&=\lim_{h\to0}\frac{f(x,y+h)-f(x,y)}h. \end{align*} Now, from the chain rule, $$\frac{\mathrm{d}}{\mathrm{d}y}\Bigl(f\bigl(y,\phi(y,x)\bigr)\Bigr) =\partial_1f\bigl(y,\phi(y,x)\bigr)+\partial_1\phi(y,x)\,\partial_2f\bigl(y,\phi(y,x)\bigr),$$ where $$\frac{\mathrm{d}}{\mathrm{d}y}\Bigl(f\bigl(y,\phi(y,x)\bigr)\Bigr)=\lim_{h\to0}\frac{f\bigl(y+h,\phi(y+h,x)\bigr)-f\bigl(y,\phi(y,x)\bigr)}{h}.$$
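If you want to convince yourself numerically, here is a minimal sketch that compares the two sides of the chain rule using central differences. The concrete `f` and `phi`, the step `H`, and the helper names `d1`, `d2`, `lhs`, `rhs` are my own choices for illustration, not anything from the question.

```python
import math

# Hypothetical concrete functions, chosen only for illustration.
def f(u, v):
    return u * u * v + math.sin(v)

def phi(a, b):
    return a * b + b * b

H = 1e-6  # finite-difference step (an assumption; tune if needed)

def d1(fn):
    """Approximate the partial derivative with respect to the FIRST slot."""
    return lambda a, b: (fn(a + H, b) - fn(a - H, b)) / (2 * H)

def d2(fn):
    """Approximate the partial derivative with respect to the SECOND slot."""
    return lambda a, b: (fn(a, b + H) - fn(a, b - H)) / (2 * H)

def lhs(y, x):
    """d/dy of the expression f(y, phi(y, x)), with x held fixed."""
    F = lambda t: f(t, phi(t, x))
    return (F(y + H) - F(y - H)) / (2 * H)

def rhs(y, x):
    """Chain rule: d1 f + (d1 phi) * (d2 f), both partials of f taken at (y, phi(y, x))."""
    p = phi(y, x)
    return d1(f)(y, p) + d1(phi)(y, x) * d2(f)(y, p)

y, x = 0.7, 1.3
print(abs(lhs(y, x) - rhs(y, x)) < 1e-6)
```

Note that `d1` and `d2` take a function and return a function, with no variable names involved: that is exactly the role $\partial_1$ and $\partial_2$ play above.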
In fact, I always try to be careful about what I write and what I really mean. First, I'm careful never to say "the function $f(x)$", but "the function $f$" (unless $f$ is a function whose codomain is a set of functions). At best, $f(x)$ is an expression that depends on $x$.
Then I use symbols like $\partial_1$, $\partial_2$, etc. for functions, and things like $\dfrac{\mathrm{d}}{\mathrm{d}x}$ or $\dfrac{\partial}{\partial x}$ for expressions (though, in fact, it's slightly more complicated).
(Hence, I hate it when people say something like "[something] is a function of $x$". Heck, what does it mean to be a function of $x$? You're a function or you're not; you can't be a function of $x$. At best, you're an expression that depends on $x$.)
Then there's something I like to do: take a function $f$ of two variables and define the function $g$ by $$g(y,x)=f(x,y).$$
Then I like to ask this question: with your notation, what sense do you give to $$\frac{\partial g}{\partial x}?$$ or to any other variation on the theme: $$\frac{\partial g(x,y)}{\partial x},\ \frac{\partial g}{\partial x}(x,y),\ \ldots$$
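To make the question concrete, here is a small worked example; the choice $f(x,y)=x^2y$ is mine, purely for illustration. Then $g(y,x)=x^2y$, that is, $g(a,b)=b^2a$, so
$$\partial_1g(a,b)=b^2,\qquad\partial_2g(a,b)=2ab.$$
If $\dfrac{\partial g}{\partial x}$ means "differentiate $g$ with respect to its first slot" (the slot one usually calls $x$), then evaluating at $(x,y)$ gives
$$\partial_1g(x,y)=y^2.$$
If it means "differentiate the expression $g(y,x)$ with respect to the letter $x$", then
$$\frac{\partial}{\partial x}\bigl(g(y,x)\bigr)=\frac{\partial}{\partial x}\bigl(x^2y\bigr)=2xy=\partial_2g(y,x).$$
The two readings give different answers, whereas $\partial_1g$ and $\partial_2g$ leave no room for doubt.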
In my opinion, Leibniz notation for partial derivatives is terrible: I avoid it whenever possible, except for one particular usage from differential geometry. (The ambiguity you cite in your question isn't the only problem with it!)
My favorite notation is a variation of $f'$ used for the derivative of a univariate function $f$: the functions $f_1$ and $f_2$ are the functions one would normally write as
$$ f_1(x,y) = \frac{\partial}{\partial x} f(x,y) $$ $$ f_2(x,y) = \frac{\partial}{\partial y} f(x,y) $$
so I would write
$$ \frac{\partial}{\partial y} f(y, \phi(y,x)) = f_1(y, \phi(y,x)) + f_2(y, \phi(y,x)) \phi_1(y,x) $$
Typically, I'm interested in both partials rather than just one, and I would use differentials instead of partial derivatives to organize the calculation of all of them at once:
$$ \mathrm{d} f(y, \phi(y,x)) = f_1(y, \phi(y,x))\, \mathrm{d}y + f_2(y, \phi(y,x)) \,\mathrm{d}\phi(y,x) = \ldots $$
and when I'm really interested in only one partial, I do the same thing, except that I work in the setting where I've set $\mathrm{d}x=0$ (assuming the partial I really mean is the one where $x$ is held constant).
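Spelled out as a sketch, with $\phi_1$ and $\phi_2$ denoting the partials of $\phi$ in the same subscript notation: since
$$ \mathrm{d}\phi(y,x) = \phi_1(y,x)\,\mathrm{d}y + \phi_2(y,x)\,\mathrm{d}x, $$
substituting into the differential of $f(y, \phi(y,x))$ and collecting terms gives
$$ \mathrm{d} f(y, \phi(y,x)) = \bigl[ f_1(y, \phi(y,x)) + f_2(y, \phi(y,x))\, \phi_1(y,x) \bigr]\,\mathrm{d}y + f_2(y, \phi(y,x))\, \phi_2(y,x)\,\mathrm{d}x $$
The coefficient of $\mathrm{d}y$ is the partial with $x$ held constant (what remains after setting $\mathrm{d}x=0$), and the coefficient of $\mathrm{d}x$ is the other partial, obtained in the same pass.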
In the differential geometry setting, in my opinion there is no ambiguity:
$$ \frac{\partial}{\partial x^i} f(x^i, g(y^i, x^i)) $$
has only one reasonable meaning: applying the tangent vector $\partial/\partial x^i$ to the scalar field $f(x^i, g(y^i, x^i))$ in the $i$-th coordinate direction. In my opinion, you wouldn't use that notation when you wanted the derivative of $f$ with respect to its first argument.
(Although this example has two sets of independent variables, which again makes me dislike using Leibniz notation for it.)