Motivation behind the definition of Banach-Mazur Distance
Solution 1:
I will try to give a more geometric explanation. First note that you can always rescale $T$ so that it has norm one. Hence, the Banach-Mazur distance can be rewritten as:
$$ d(X,Y)=\inf\{||T^{-1}||: T\in GL(X,Y), ||T||=1\} $$
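To spell out why this normalization loses nothing, here is a short sketch. It assumes the definition in the question is the usual one, $d(X,Y)=\inf\{||T||\,||T^{-1}||: T\in GL(X,Y)\}$. Given any $T\in GL(X,Y)$, put $S=T/||T||$ (the name $S$ is only introduced for this remark). Then $S^{-1}=||T||\,T^{-1}$, so
$$ ||S||=1, \qquad ||S^{-1}||=||T||\,||T^{-1}|| $$
and every value of the product $||T||\,||T^{-1}||$ is already attained as $||S^{-1}||$ for some $S$ of norm one.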
Geometrically, $||T||=1$ means that $T(B_X)\subseteq B_Y$ and no enlargement of $T(B_X)$ will still fit inside $B_Y$ ($B_X$ and $B_Y$ represent the unit balls of the two spaces). On the other hand, we have that
$$ T^{-1}(B_Y)\subseteq||T^{-1}||B_X $$
or equivalently
$$ B_Y\subseteq||T^{-1}||T(B_X) $$
Therefore:
$$ T(B_X)\subseteq B_Y\subseteq||T^{-1}||T(B_X) $$
Thus, geometrically, $||T^{-1}||$ represents the smallest factor by which you must enlarge $T(B_X)$ so that it contains $B_Y$. The Banach-Mazur distance is the infimum of such enlargement factors, taken over all linear isomorphisms $T$ with $||T||=1$, that is, those that send $B_X$ inside $B_Y$ as tightly as possible.
For perhaps a better intuitive understanding, take $X$ to be Euclidean, so that $B_X$ is the Euclidean unit ball. Then for any isomorphism $T$, $T(B_X)$ is an ellipsoid. For the Banach-Mazur distance, you are looking for the "best fit" ellipsoid. That is, you are looking for the ellipsoid that fits inside $B_Y$ (touching the boundary), such that the enlargement required for this ellipsoid to contain $B_Y$ is as small as possible.
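The ellipsoid picture can be checked numerically in a small case. The sketch below assumes $X=\mathbb{R}^2$ with the Euclidean norm and $Y=\mathbb{R}^2$ with the max norm, so that $B_Y$ is a square; this choice of spaces and the function names are illustrative and not taken from the question. It computes $||T||\,||T^{-1}||$ exactly for a given matrix $T$, using the fact that the norm of a map from the Euclidean norm to the max norm is the largest Euclidean row norm, and that a convex function on the square attains its maximum at a vertex.

```python
import itertools
import numpy as np

def norm_2_to_inf(A):
    """Operator norm of A as a map (R^n, l_2) -> (R^n, l_inf).

    sup over the Euclidean ball of max_i |a_i . x| is the largest
    Euclidean norm of a row a_i.
    """
    return float(np.max(np.linalg.norm(A, axis=1)))

def norm_inf_to_2(B):
    """Operator norm of B as a map (R^n, l_inf) -> (R^n, l_2).

    y -> ||B y||_2 is convex, so its maximum over the cube is attained
    at one of the 2^n sign vectors (only practical for small n).
    """
    n = B.shape[1]
    return max(
        float(np.linalg.norm(B @ np.array(signs)))
        for signs in itertools.product([-1.0, 1.0], repeat=n)
    )

def distortion(T):
    """||T|| * ||T^{-1}|| for T viewed as a map from l_2 to l_inf."""
    return norm_2_to_inf(T) * norm_inf_to_2(np.linalg.inv(T))

# The identity maps the unit disk to the disk inscribed in the square.
identity_distortion = distortion(np.eye(2))

# A stretched ellipse fits the square worse than the inscribed disk.
stretched_distortion = distortion(np.diag([1.0, 3.0]))

print(identity_distortion, stretched_distortion)
```

For the identity the result is $\sqrt{2}$: the inscribed disk has to be enlarged by $\sqrt{2}$ to reach the corners of the square. The stretched map gives $\sqrt{10}$, a noticeably worse fit.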
Yes, there is a similar notion for infinite dimensional Banach spaces, with the convention that when $X$ and $Y$ are not isomorphic, $d(X,Y)=\infty$.
Solution 2:
For an invertible linear map $T: X \rightarrow Y$, the quantity $\|T\| \|T^{-1}\|$ is called the condition number of the operator. It bounds the relative error in the output $Tx \in Y$ when an error is introduced in the argument $x \in X$. See here for details. The errors are measured in the respective norms of $X$ and $Y$, so the quantity $\|T\| \|T^{-1}\|$ measures how stable, in norm, the passage from $X$ to $Y$ via the isomorphism $T$ is. In numerical analysis especially, a small condition number is desirable: it means that a small roundoff error introduced by your computer leads only to a small error in the final result.
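A minimal numerical sketch of this error bound, assuming both spaces are $\mathbb{R}^2$ with the Euclidean norm (the matrix, the vectors, and the helper names are made up for illustration). It compares the relative error of the output with the relative error of the input and checks that the ratio never exceeds $\|T\| \|T^{-1}\|$.

```python
import numpy as np

def condition_number(T):
    """||T|| * ||T^{-1}|| with both norms taken as spectral (2-) norms."""
    return float(np.linalg.norm(T, 2) * np.linalg.norm(np.linalg.inv(T), 2))

def relative_error_amplification(T, x, dx):
    """Ratio of the relative output error to the relative input error."""
    rel_in = np.linalg.norm(dx) / np.linalg.norm(x)
    rel_out = np.linalg.norm(T @ dx) / np.linalg.norm(T @ x)
    return float(rel_out / rel_in)

# An ill-conditioned diagonal map: one direction is shrunk by 1000.
T = np.array([[1.0, 0.0],
              [0.0, 1e-3]])

x = np.array([0.0, 1.0])    # input lying in the strongly shrunk direction
dx = np.array([1e-6, 0.0])  # small error in the direction T leaves alone

kappa = condition_number(T)
amplification = relative_error_amplification(T, x, dx)

print(kappa, amplification)
```

With this particular choice of `x` and `dx` the amplification is essentially equal to the condition number, so the bound is attained in the worst case. For the spectral norm, `np.linalg.cond(T)` returns the same quantity.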
In a more theoretical context, by taking the infimum over all isomorphisms $T: X \rightarrow Y$, we are looking for the optimal constants relating the norms of the two spaces; in other words, how close to an isometry can we get? In your situation, for example, take $\mathbb{C}^n$ with the $\ell_p$ and $\ell_q$ norms and an isomorphism $T$ normalized so that $\|T\| = 1$. We can find optimal estimates like $\|x\|_p \le C\|Tx\|_q$, but for $p \neq q$ the constant $C$ can in general never be exactly 1.
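As a hedged illustration of such an estimate, take $T$ to be the identity map $I$ on $\mathbb{C}^n$ and assume $1 \le p < q \le \infty$. This is one particular choice of $T$, not necessarily the optimal one. The standard comparison of the $\ell_p$ norms gives
$$ \|x\|_q \le \|x\|_p \le n^{1/p-1/q}\|x\|_q $$
with equality on the left for a coordinate vector and on the right for the all-ones vector. So $\|I\| = 1$ as a map from $\ell_p^n$ to $\ell_q^n$, while $\|I^{-1}\| = n^{1/p-1/q}$, and therefore
$$ d(\ell_p^n,\ell_q^n) \le n^{1/p-1/q} $$
Whether another isomorphism does better than the identity depends on $p$ and $q$.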
As for your last question, I don't think there's any restriction on taking either finite- or infinite-dimensional normed linear spaces, though I don't claim to be an expert on such matters. Of course, if the spaces are infinite-dimensional, you have to ensure that they really are isomorphic.