Definition of an Ordered Pair
Solution 1:
You may be interested in reading Kuratowski's "Set Theory".
Here's what I remember from it:
- First one defines pairs $\langle a,b\rangle = \{\{a\},\{a,b\}\}$. With this definition one can define $A\times B$ as the set of all pairs $\langle a,b\rangle$ with $a\in A$ and $b\in B$. However, that's not a good way to proceed, because of the problems you note.
- Using these pairs to define functions, one defines $\prod_{i\in I} X_i$ as the set of functions $f\colon I \to \bigcup_{i\in I} X_i$ such that $f(i)\in X_i$ for every $i\in I$. Here $\{X_i \mid i\in I\}$ is a collection of sets (in other words, a function $I\to \mathcal P(\bigcup_{i\in I} X_i)$).
- In particular $A^2$ is the set of functions $2\to A$, where $2 = \{\varnothing, \{\varnothing\}\}$, and $A\times B$ is the set of functions $f\colon 2 \to A\cup B$ where $f(\varnothing)\in A$ and $f(\{\varnothing\})\in B$.
- Now we forget about that first definition of $A\times B$, and proceed with the latter. (Even though we use the former definition to state the latter!) The practical advantage is that $A\times B\times C$ is now well-defined, just like any other product, no matter how large the index set $I$.
- It is still not true that $(A\times B)\times C = A\times (B\times C)$, and in fact both are still different from $A\times B\times C$. However, there are bijections between these three sets that are so obvious that for all practical purposes one may consider them to be equal.
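To make the last two points concrete, here is a minimal Python sketch of products as sets of functions on an index set. It is only an illustration under some assumptions: Python tuples `(index, value)` stand in for the set-theoretic pairs, the integers `0, 1, 2` stand in for the indices, and the names `family_product` and `flatten_left` are made up for this example.

```python
from itertools import product as cartesian

def family_product(family):
    """Product of an indexed family, given as a dict index -> set.

    Each element of the result is a function on the index set, encoded
    as a frozenset of (index, value) tuples (a stand-in for real pairs).
    """
    indices = list(family)
    return {
        frozenset(zip(indices, values))
        for values in cartesian(*(family[i] for i in indices))
    }

A, B, C = {1, 2}, {"x"}, {"p", "q"}

# A x B x C as functions on the three-element index set {0, 1, 2}.
flat = family_product({0: A, 1: B, 2: C})

# (A x B) x C and A x (B x C) as iterated two-fold products.
left = family_product({0: family_product({0: A, 1: B}), 1: C})
right = family_product({0: A, 1: family_product({0: B, 1: C})})

def flatten_left(f):
    """The obvious bijection (A x B) x C -> A x B x C."""
    outer = dict(f)          # outer[0] is an element of A x B, outer[1] is in C
    inner = dict(outer[0])   # inner[0] is in A, inner[1] is in B
    return frozenset({(0, inner[0]), (1, inner[1]), (2, outer[1])})

# The three sets are pairwise different as sets...
print(left != right, left != flat, right != flat)
# ...but flattening maps one exactly onto the other.
print({flatten_left(f) for f in left} == flat)
```

The three products have the same number of elements but no elements in common, which is exactly the sense in which they are "different but obviously in bijection".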
Solution 2:
We want to define as much as possible with as little as possible.
That way we only need to say what sets are, and from sets we define ordered pairs, and so on.
The usual way is to define an ordered pair $\langle a,b\rangle = \{\{a\},\{a,b\}\}$. This is just because it's easy to work with.
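If it helps, here is a small Python sketch of this definition using `frozenset`, showing that both coordinates can be recovered from the set alone. The helper names `pair`, `first` and `second` are hypothetical, chosen just for this illustration, and the coordinates are assumed to be hashable Python values.

```python
def pair(a, b):
    """Kuratowski pair <a, b> = {{a}, {a, b}} built from frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def first(p):
    """The first coordinate is the one element common to every member of p."""
    (a,) = frozenset.intersection(*p)
    return a

def second(p):
    """The second coordinate is the other element of the union of p,
    or the first coordinate again when the pair has the form <a, a>."""
    a = first(p)
    rest = frozenset.union(*p) - {a}
    if not rest:
        return a
    (b,) = rest
    return b

p = pair(1, 2)
print(first(p), second(p))
print(pair(1, 2) == pair(2, 1))   # order matters, unlike for {1, 2}
```

Note the degenerate case: `pair(3, 3)` collapses to `{{3}}`, which is why `second` has to handle an empty remainder.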
Alternatively, you can define an ordered pair as a function whose domain is the power set of the power set of the empty set, $\mathcal P(\mathcal P(\emptyset)) = \{\emptyset,\{\emptyset\}\}$: the first element is the image of $\emptyset$ and the second is the image of $\{\emptyset\}$. (Yes, functions are usually defined as collections of ordered pairs. I'm talking about the existence of a formula with two free variables.)
Again, these are just conventions, and we work with what we find comfortable and as clear as possible.
As for the second issue, we only define pairs, but there is a natural identification between $\langle a,\langle b,c\rangle\rangle$ and $\langle a,b,c\rangle$ and of course $\langle\langle a,b\rangle, c\rangle$. So once again we only define as little as possible and somewhat abuse our own notation because we know that the formal backbone exists and is strong.
And lastly, as I said before, we want to define the most with as little as possible. In the world of set theory it's nice to have only sets. So we define $0=\emptyset$, and inductively we can define the natural numbers in terms of sets, so $n=\{0,\ldots,n-1\}$.
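As a rough illustration of that inductive definition, the following Python sketch builds $n$ from the empty set by repeatedly taking the successor $k+1 = k\cup\{k\}$. The function name `von_neumann` is just a label chosen for this example.

```python
def von_neumann(n):
    """Return the set n = {0, 1, ..., n-1}, starting from 0 = empty set."""
    current = frozenset()               # 0 is the empty set
    for _ in range(n):
        current = current | {current}   # successor: k + 1 = k U {k}
    return current

three = von_neumann(3)
print(len(three))                       # n has exactly n elements
print(von_neumann(1) in three)          # smaller numbers are elements of n
print(von_neumann(2) < three)           # ...and also proper subsets of n
```

In particular `von_neumann(2)` is $\{\emptyset,\{\emptyset\}\}$, the same two-element set used above as the domain of a pair-as-function.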