Why is the Power Set Operation Inherently Vague?
Solution 1:
I think one way of getting a grip on the vagueness is to explore several different candidate power set operations and see where they fall short and where they behave much as we would expect a power set to behave. A straightforward example is to take $\mathbb{N}$ as the domain of discourse and look at the set of all finite subsets of a set $S$ (which I'll call $\mathcal{P}_f(S)$). This clearly doesn't satisfy all of the ZF axioms, but it does a remarkably good imitation of a power set (and, e.g., the set of all finite and cofinite subsets does an even better one). Once you feel you have a handle on that and how it 'fits into' the rest of the axioms, you can consider the set of all constructible subsets of $S$ (for your favorite definition of constructible) and try to figure out where the problems slot in. In short, a lot of the vagueness of power set comes down to the notion of what constitutes a set in the first place, and particularly how we can 'build' sets (so it has deep connections to the axioms of specification/replacement/comprehension and to Russell's paradox).
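To make one of the shortfalls of $\mathcal{P}_f$ concrete: when $S = \mathbb{N}$, the finite subsets can be listed one after another, so $\mathcal{P}_f(\mathbb{N})$ is countable, whereas Cantor's theorem says the full power set cannot be. Here is a minimal sketch of such a listing, reading the binary digits of $n$ as the membership bits of the $n$-th finite subset. The function names `decode`, `encode` and `finite_subsets` are my own, chosen just for illustration.

```python
from itertools import count, islice


def decode(n):
    """Finite subset of N whose members are the positions of the 1-bits of n."""
    return frozenset(i for i in range(n.bit_length()) if (n >> i) & 1)


def encode(subset):
    """Inverse of decode: map a finite subset of N back to a natural number."""
    return sum(1 << i for i in subset)


def finite_subsets():
    """Enumerate every finite subset of N exactly once, in order of its code."""
    for n in count():
        yield decode(n)


# The first eight finite subsets in this enumeration.
first = list(islice(finite_subsets(), 8))
for s in first:
    print(sorted(s))
```

Since `encode` and `decode` are mutually inverse, every finite subset shows up at exactly one position in the list. No analogous listing can exist for all subsets of $\mathbb{N}$, which is one precise sense in which $\mathcal{P}_f$ is only an imitation.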
Solution 2:
This is really a long comment that wouldn't fit in the box:
I think of the real reason as (3):
Our intuitive concept of set is definitely "transitive" in the following sense: if something is a set, then every collection of elements of that thing should be a set too. In ZF, however, that is not always true, because, as you said, we only have the separation axiom for first-order formulas. If we had a general separation axiom, for example, we wouldn't need the axiom of choice:
If $\mathcal{F}$ is a family of nonempty sets and we could extract subsets any way we want, we could collect a subset of $\bigcup\mathcal{F}$ containing one element of each element of $\mathcal{F}$.
One instance of (1) happening is the problem of defining a choice function in ZF: the power set and separation axioms are not powerful enough to determine whether we can form a choice function (that is, single out a particular subset of $\mathcal{F} \times \bigcup\mathcal{F}$).