Why is the determinant the volume of a parallelepiped in any dimension?

If the column vectors are linearly dependent, both the determinant and the volume are zero, so assume linear independence. The determinant is unchanged when a multiple of one column is added to another. Geometrically this is a shear of the parallelepiped, which does not change its volume. By a finite sequence of such operations you can bring the matrix to diagonal form, where the relation between the determinant (the product of the diagonal entries) and the volume of a "rectangle" (the product of the side lengths) is apparent. The determinant carries a sign, so the volume is its absolute value.
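The reduction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `det_by_column_ops` is made up for this example, exact arithmetic is done with `fractions.Fraction`, and a column swap (which flips the sign of the determinant but leaves the volume alone) is used as a convenience when a pivot entry is zero.

```python
from fractions import Fraction

def det_by_column_ops(columns):
    """Determinant of the matrix whose columns are given, found by shearing
    the columns until the matrix is diagonal (hypothetical helper)."""
    n = len(columns)
    cols = [[Fraction(x) for x in col] for col in columns]
    sign = 1
    for i in range(n):
        # Look for a column (from i onward) with a nonzero entry in row i.
        pivot = next((j for j in range(i, n) if cols[j][i] != 0), None)
        if pivot is None:
            # The remaining columns are linearly dependent: zero volume.
            return Fraction(0)
        if pivot != i:
            # A swap is not a shear; it flips the sign of the determinant.
            cols[i], cols[pivot] = cols[pivot], cols[i]
            sign = -sign
        for j in range(n):
            if j != i and cols[j][i] != 0:
                # Shear: subtract a multiple of column i from column j.
                factor = cols[j][i] / cols[i][i]
                cols[j] = [a - factor * b for a, b in zip(cols[j], cols[i])]
    # The matrix is now diagonal: multiply the diagonal entries.
    result = Fraction(sign)
    for i in range(n):
        result *= cols[i][i]
    return result

volume = abs(det_by_column_ops([[2, 0, 0], [1, 3, 0], [4, 5, 6]]))
```

Taking `abs` at the end gives the unsigned volume; the sign only records orientation.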

Here is the same argument as Muphrid's, perhaps written in an elementary way.

Apply Gram-Schmidt orthogonalization to $\{v_{1},\ldots,v_{n}\}$, so that \begin{eqnarray*} v_{1} & = & v_{1}\\ v_{2} & = & c_{12}v_{1}+v_{2}^{\perp}\\ v_{3} & = & c_{13}v_{1}+c_{23}v_{2}+v_{3}^{\perp}\\ & \vdots \end{eqnarray*} where $v_{2}^{\perp}$ is orthogonal to $v_{1}$, $v_{3}^{\perp}$ is orthogonal to $\operatorname{span}\left\{ v_{1},v_{2}\right\} $, and so on.

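Since the determinant is multilinear and antisymmetric, every term of the expansion in which a column is a combination of the earlier columns vanishes, so \begin{eqnarray*} \det\left(v_{1},v_{2},v_{3},\ldots,v_{n}\right) & = & \det\left(v_{1},c_{12}v_{1}+v_{2}^{\perp},c_{13}v_{1}+c_{23}v_{2}+v_{3}^{\perp},\ldots\right)\\ & = & \det\left(v_{1},v_{2}^{\perp},v_{3}^{\perp},\ldots,v_{n}^{\perp}\right)\\ & = & \mbox{signed volume}\left(v_{1},\ldots,v_{n}\right) \end{eqnarray*} The last step holds because $v_{1},v_{2}^{\perp},\ldots,v_{n}^{\perp}$ are mutually orthogonal, so the absolute value of their determinant is the product of their lengths, which is the volume of the parallelepiped (base times height, applied repeatedly).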
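As a numerical check of this argument, here is a minimal sketch that multiplies the lengths of the orthogonal parts produced by Gram-Schmidt. The function name `gram_schmidt_volume` and the tolerance `1e-12` for treating a vector as zero are assumptions made for this example, and floating-point arithmetic means the result is only approximate.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt_volume(vectors, tol=1e-12):
    """Unsigned volume of the parallelepiped spanned by `vectors`, computed as
    the product of the lengths of their Gram-Schmidt orthogonal parts."""
    ortho = []      # orthogonal parts found so far
    volume = 1.0
    for v in vectors:
        w = [float(x) for x in v]
        for u in ortho:
            # Remove the component of w along the earlier orthogonal part u.
            coeff = dot(w, u) / dot(u, u)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        height = math.sqrt(dot(w, w))
        if height < tol:
            # v lies (numerically) in the span of the earlier vectors.
            return 0.0
        ortho.append(w)
        volume *= height   # each new vector contributes its "height"
    return volume

vol = gram_schmidt_volume([[2, 0, 0], [1, 3, 0], [4, 5, 6]])
```

For these three vectors the orthogonal parts have lengths 2, 3 and 6, so `vol` should agree with the absolute value of the determinant, 36, up to rounding.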

In 2d, you calculate the area of a parallelogram spanned by two vectors using the cross product. In 3d, you calculate the volume of a parallelepiped using the triple scalar product. Both of these can be written in terms of a determinant, but it's probably not clear to you what the proper generalization is to higher dimensions.

That generalization is called the wedge product. Given $n$ vectors $v_1, v_2, \ldots, v_n$, the wedge product $v_1 \wedge v_2 \wedge \ldots \wedge v_n$ is called an $n$-vector, and it has as its magnitude the $n$-volume of that $n$-parallelepiped.

What is the relationship between the wedge product and the determinant? Quite simple, actually. There is a natural generalization of linear maps to work on $k$-vectors. Given a linear map $\underline T$ (which can be represented as a matrix), the action of that map on a $k$-vector is defined as

$$\underline T(v_1 \wedge v_2 \wedge \ldots \wedge v_k) \equiv \underline T(v_1) \wedge \underline T(v_2) \wedge \ldots \wedge \underline T(v_k)$$

When talking about $n$-vectors in an $n$-dimensional space, it's important to realize that the "vector space" of these $n$-vectors is in fact one-dimensional. That is, if you think about volume, there is only one such unit volume in a given space, and all other volumes are just scalar multiples of it. Hence, when we talk about the action of a linear map on an $n$-vector, we can see that

$$\underline T(v_1 \wedge v_2 \wedge \ldots \wedge v_n) = \alpha [v_1 \wedge v_2 \wedge \ldots \wedge v_n]$$

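for some scalar $\alpha$ that depends only on $\underline T$, not on the vectors $v_1, \ldots, v_n$. That scalar is $\det \underline T$. In fact, this is a coordinate-independent definition of the determinant!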
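A small two-dimensional check of this scaling property, as a sketch only. The helper names `wedge2` and `apply_map` are invented for this example, and the map is stored as its two column vectors (the images of the basis vectors).

```python
def wedge2(a, b):
    """Coefficient of e1 ^ e2 in a ^ b, i.e. the signed area spanned by a and b."""
    return a[0] * b[1] - a[1] * b[0]

def apply_map(T, v):
    """Apply the linear map whose columns are T[0] and T[1] to the vector v."""
    return [v[0] * T[0][i] + v[1] * T[1][i] for i in range(2)]

T = [[2, 1], [0, 3]]           # columns T(e1) = (2, 1), T(e2) = (0, 3)
v1, v2 = [1, 1], [-1, 2]       # an arbitrary parallelogram

alpha = wedge2(T[0], T[1])     # det T, read off from the basis vectors
area_before = wedge2(v1, v2)
area_after = wedge2(apply_map(T, v1), apply_map(T, v2))
```

Here `area_after` equals `alpha * area_before`, and the same factor `alpha` appears for any other choice of `v1` and `v2`.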

When you build a matrix out of $n$ vectors $f_1, f_2, \ldots, f_n$ as the matrix's columns, what you're really doing is the following: you're saying that, if you have a basis $e_1, e_2, \ldots, e_n$, then you're defining a map $\underline T$ such that $\underline T(e_1) = f_1$, $\underline T(e_2) = f_2$, and so on. So when you input $e_1 \wedge e_2 \wedge \ldots \wedge e_n$, you get

$$\underline T(e_1 \wedge e_2 \wedge \ldots \wedge e_n) = (\det \underline T) e_1 \wedge e_2 \wedge \ldots \wedge e_n= f_1 \wedge f_2 \wedge \ldots \wedge f_n$$

This is how you can use a matrix determinant to calculate volumes: it's just an easy way of constructing something that automatically computes the wedge product.
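To make the link to the familiar formula concrete: expanding $f_1 \wedge f_2 \wedge \ldots \wedge f_n$ in the basis, every term that repeats a basis vector vanishes, and each surviving term is a permutation of $e_1 \wedge \ldots \wedge e_n$ with the corresponding sign. A minimal sketch of that expansion follows. The function names are made up for this example, and the sum over all permutations is only practical for small $n$.

```python
from itertools import permutations

def permutation_sign(perm):
    """+1 for an even permutation, -1 for an odd one (counted by inversions)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def wedge_coefficient(vectors):
    """Coefficient of e1 ^ ... ^ en in f1 ^ ... ^ fn for the given vectors f_k."""
    n = len(vectors)
    total = 0
    for perm in permutations(range(n)):
        # Pick component perm[k] of the k-th vector; reordering the basis
        # vectors back to e1 ^ ... ^ en costs the sign of the permutation.
        term = permutation_sign(perm)
        for k, vec in enumerate(vectors):
            term *= vec[perm[k]]
        total += term
    return total

coeff = wedge_coefficient([[2, 0, 0], [1, 3, 0], [4, 5, 6]])
```

The value of `coeff` is the determinant of the matrix with columns $f_1, f_2, f_3$, and its absolute value is the volume.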

Edit: how one can see that the wedge product accurately gives the volume of a parallelepiped. Any vector can be broken down into perpendicular and parallel parts with respect to another vector, to a plane, and so on (or to any $k$-vector). As such, if I have two vectors $a$ and $b$, then the wedge product $a \wedge b = a \wedge b_\perp$, because the part of $b$ parallel to $a$ wedges to zero with $a$; here $b_\perp$ is effectively the height of the parallelogram. Similarly, if I construct a parallelepiped with a third vector $c$, then the wedge product $a \wedge b \wedge c = (a \wedge b_\perp) \wedge c_\perp$, where $c_\perp$ lies entirely normal to $a \wedge b_\perp$. We can do this recursively for any $k$-vector, replacing the original vectors with mutually orthogonal ones, for which the volume is simply the product of the lengths.
