Gradient of a Vector-Valued Function
$\def\R{{\bf R}}$It's partly an issue of naming. The gradient is most often defined for scalar fields, but the same idea exists for vector fields, where it is called the Jacobian.
Taking the gradient of a vector-valued function is a perfectly sensible thing to do. You just don't usually call it the gradient.
A neat way to think about the gradient is as a higher-order function (i.e. a function whose arguments or return values are functions). Specifically, the gradient operator takes a differentiable function from a vector space $U$ to a vector space $V$, and returns another function which, when evaluated at a point in $U$, gives a linear map from $U$ to $V$.
We can look at an example to get intuition. Consider the scalar field $f:\R^2\to\R$ given by
$$f(x,y) = x^2+y^2$$
The gradient $g=\nabla f$ is the function on $\R^2$ given by
$$g(x,y) = \left(2x, 2y\right)$$
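As a quick numerical sanity check of this formula, the sketch below approximates the partial derivatives of $f(x,y)=x^2+y^2$ by central differences and compares them with $(2x,2y)$. The helper name `partials`, the step size `h=1e-5` and the test point $(1.5,-2)$ are illustrative choices, not part of the original argument.

```python
def f(x: float, y: float) -> float:
    """The scalar field f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def partials(func, x: float, y: float, h: float = 1e-5) -> tuple[float, float]:
    """Approximate (df/dx, df/dy) at (x, y) with central differences (hypothetical helper)."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return dfdx, dfdy


# Compare the numerical estimate with the exact gradient g(x, y) = (2x, 2y) at (1.5, -2.0).
approx = partials(f, 1.5, -2.0)
exact = (2 * 1.5, 2 * -2.0)
print(approx, exact)
```

For a quadratic like this one, central differences agree with the exact partials up to floating-point rounding.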
We can interpret $(2x,2y)$ as an element of the space of linear maps from $\R^2$ to $\R$, which I will denote $L(\R^2,\R)$: it is the map sending a vector $(h,k)$ to $2xh+2yk$.
Therefore $g=\nabla f$ is a function that takes an element of $\R^2$ and returns an element of $L(\R^2,\R)$. Schematically,
$$g: \R^2 \to L(\R^2 ,\R)$$
This means that $\nabla$ should be interpreted as a higher-order function
$$\nabla : (\R^2 \to \R) \to (\R^2 \to L(\R^2, \R))$$
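The "type" of $\nabla$ can be mirrored with Python type hints. In the sketch below, `nabla` (a hypothetical name) takes a function $\R^2\to\R$ and returns a function that, at a point, returns a linear map $\R^2\to\R$, represented as a callable rather than as a pair of numbers. Approximating the partial derivatives by central differences is an assumption made for illustration only.

```python
from typing import Callable

Vec2 = tuple[float, float]
LinearMap = Callable[[Vec2], float]  # stands in for an element of L(R^2, R)


def nabla(f: Callable[[float, float], float],
          h: float = 1e-5) -> Callable[[float, float], LinearMap]:
    """Higher-order gradient: (R^2 -> R) -> (R^2 -> L(R^2, R))."""
    def g(x: float, y: float) -> LinearMap:
        # Partial derivatives at (x, y), approximated by central differences.
        a = (f(x + h, y) - f(x - h, y)) / (2 * h)
        b = (f(x, y + h) - f(x, y - h)) / (2 * h)

        # The pair (a, b) acts on a direction v = (v1, v2) as a*v1 + b*v2.
        def linear_map(v: Vec2) -> float:
            return a * v[0] + b * v[1]

        return linear_map

    return g


def f(x: float, y: float) -> float:
    return x**2 + y**2


g = nabla(f)       # g : R^2 -> L(R^2, R)
L = g(1.0, 2.0)    # L approximates the linear map (v1, v2) -> 2*v1 + 4*v2
print(L((1.0, 0.0)), L((0.0, 1.0)), L((3.0, -1.0)))
```

Returning a callable instead of a tuple keeps the distinction between "a vector" and "a linear map acting on vectors" visible in the types.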
There's nothing special about $\R^2$ and $\R$ here. The construction works for any finite-dimensional real vector spaces $U$ and $V$ (more generally, normed spaces) and differentiable functions between them, giving
$$\nabla : (U\to V) \to (U \to L(U,V))$$
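The same shape works with $U=\R^n$ and $V=\R^m$. In this sketch (again with a hypothetical `nabla` and vectors represented as Python lists), the linear map at ${\bf p}$ sends a direction ${\bf v}$ to the directional derivative $Df({\bf p})\,{\bf v}$, approximated by a central difference with step `t`; the example function $f(x,y)=(xy,\ x+y,\ x^2)$ is an illustrative choice.

```python
from typing import Callable

Vector = list[float]


def nabla(f: Callable[[Vector], Vector],
          t: float = 1e-6) -> Callable[[Vector], Callable[[Vector], Vector]]:
    """(U -> V) -> (U -> L(U, V)) with U = R^n, V = R^m.

    The linear map at p sends a direction v to Df(p)v, approximated here
    by a central difference along v with step t.
    """
    def at(p: Vector) -> Callable[[Vector], Vector]:
        def linear_map(v: Vector) -> Vector:
            plus = f([pi + t * vi for pi, vi in zip(p, v)])
            minus = f([pi - t * vi for pi, vi in zip(p, v)])
            return [(a - b) / (2 * t) for a, b in zip(plus, minus)]

        return linear_map

    return at


# A vector-valued example f : R^2 -> R^3, f(x, y) = (x*y, x + y, x^2).
def f(p: Vector) -> Vector:
    x, y = p
    return [x * y, x + y, x**2]


Df_p = nabla(f)([1.0, 2.0])   # the linear map Df(1, 2) : R^2 -> R^3
print(Df_p([1.0, 1.0]))       # approximately [3.0, 2.0, 2.0]
```

Here the output of the linear map lives in $\R^3$, not $\R$, which is exactly why the derivative of a vector-valued function is not itself "a vector".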
A good reference for this way of thinking about the gradient is Spivak's book *Calculus on Manifolds*.
Sure enough, a vector-valued function ${\bf f}$ can have a derivative, but this derivative does not have the "type" of a vector unless the domain or the range of ${\bf f}$ is one-dimensional. The general setup is the following. Given a function
$${\bf f}:\quad{\mathbb R}^n\to{\mathbb R}^m,\qquad {\bf x}\mapsto {\bf y}={\bf f}({\bf x})$$
and a point ${\bf p}$ in the domain of ${\bf f}$, the derivative of ${\bf f}$ at ${\bf p}$ is a linear map $d{\bf f}({\bf p})=:L$ that maps the tangent space $T_{\bf p}$ to the tangent space $T_{\bf q}$, where ${\bf q}:={\bf f}({\bf p})$. The matrix of $L$ with respect to the standard bases is the Jacobian of ${\bf f}$ at ${\bf p}$ and is given by
$$\bigl[L\bigr]=\left[{\partial y_i\over\partial x_k}\right]_{1\leq i\leq m,\ 1\leq k\leq n}\ .$$

If $m=1$, i.e., if ${\bf f}$ is in fact a scalar function, then the matrix $\bigl[L\bigr]$ has just one row (of length $n$):
$$\bigl[L\bigr]=\bigl[{\partial f\over\partial x_1} \ {\partial f\over\partial x_2}\ \ldots\ {\partial f\over\partial x_n}\bigr]_{\bf p}\ .$$
The $n$ entries of this one-row matrix can be viewed as the coordinates of a vector, which is then called the gradient of $f$ at ${\bf p}$. Likewise, if $n=1$ the matrix has a single column, which is the velocity vector ${\bf f}'({\bf p})$ of the curve ${\bf f}$.
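To make the matrix $\bigl[L\bigr]=\bigl[\partial y_i/\partial x_k\bigr]$ concrete, here is a minimal sketch that approximates it column by column with central differences. The function name `jacobian`, the step `h`, and the two example functions are assumptions chosen for illustration; the second example shows the $m=1$ case, where the result is a single row whose entries form the gradient.

```python
from typing import Callable

Vector = list[float]
Matrix = list[list[float]]


def jacobian(f: Callable[[Vector], Vector], p: Vector, h: float = 1e-6) -> Matrix:
    """Approximate the m x n matrix [dy_i/dx_k] of f : R^n -> R^m at p.

    Column k is a central difference in the k-th coordinate direction.
    """
    n = len(p)
    columns = []
    for k in range(n):
        e = [h if j == k else 0.0 for j in range(n)]  # step along the k-th axis
        plus = f([pj + ej for pj, ej in zip(p, e)])
        minus = f([pj - ej for pj, ej in zip(p, e)])
        columns.append([(a - b) / (2 * h) for a, b in zip(plus, minus)])
    # Transpose the list of columns into a list of rows (row i = component y_i).
    return [list(row) for row in zip(*columns)]


# m = 2: f(x, y) = (x^2 + y^2, x*y) has a 2 x 2 Jacobian, [[2x, 2y], [y, x]].
J = jacobian(lambda p: [p[0]**2 + p[1]**2, p[0] * p[1]], [1.0, 2.0])

# m = 1: a scalar function gives one row, whose entries are the gradient (2x, 2y, 2z).
row = jacobian(lambda p: [p[0]**2 + p[1]**2 + p[2]**2], [1.0, 2.0, 3.0])
print(J)
print(row)
```

Building the matrix one column per input coordinate matches the definition: column $k$ is the image under $L$ of the $k$-th standard basis vector.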