What is a "linear function" in the context of multivariable calculus?
Solution 1:
A linear function in this context is a map $f: \mathbb{R}^n \to \mathbb{R}^m$ such that the following conditions hold:
- $f(x+y)=f(x)+f(y)$ for every $x,y \in \mathbb{R}^n$
- $f(\lambda x)=\lambda f(x)$ for every $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$.
It can be shown that every such function has the form $f(x)=Ax$ where $A \in \mathbb{R}^{m \times n}$ is an $m \times n$ matrix. If $f$ has the form $f(x)=Ax + b$ for some $b\in \mathbb{R}^m$, then it is called an affine linear function.
This generalises the notion of a "linear" function $f: \mathbb{R} \to \mathbb{R}$ of the form $f(x)=ax+b$, where $a,b$ are real numbers, which is probably what you had in mind (strictly speaking, such a function is affine). An affine linear map is a linear map if and only if $b=0$. Note that your example is a special affine linear map from $\mathbb{R}^n$ to $\mathbb{R}^n$ (the dimensions have to match).
An example of a linear function from $\mathbb{R}^3$ to $\mathbb{R}^3$ would be $$f(x,y,z) = \begin{pmatrix} 1 & 2 & 7\\ 5& 3 & 7\\ 3& 8& 2 \end{pmatrix} \begin{pmatrix} x\\ y\\ z \end{pmatrix}.$$
Your example in the case of $\mathbb{R}^3$ is of the form
$$f(x,y,z) = \begin{pmatrix} a& 0 & 0\\ 0& a & 0\\ 0& 0& a \end{pmatrix} \begin{pmatrix} x\\ y\\ z \end{pmatrix} + \begin{pmatrix} b_x\\ b_y\\ b_z \end{pmatrix},$$ for some $a \in \mathbb{R}$ and $(b_x, b_y, b_z) \in \mathbb{R}^3$.
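The two defining conditions can be spot-checked numerically. The following is a minimal sketch in plain Python, using the matrix from the example above; the helper names (`matvec`, `is_linear_on`) and the sample values for $a$, $b$, $x$, $y$ and $\lambda$ are assumptions chosen only for illustration. Passing the check on one sample does not prove linearity, but failing it does disprove it.

```python
# Hypothetical helpers for illustration; vectors are plain lists of floats.

def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def add(x, y):
    """Componentwise sum of two vectors."""
    return [xi + yi for xi, yi in zip(x, y)]

def scale(lam, x):
    """Multiply vector x by the scalar lam."""
    return [lam * xi for xi in x]

def is_linear_on(f, x, y, lam, tol=1e-9):
    """Check additivity and homogeneity of f on one sample (x, y, lam)."""
    additive = all(abs(u - v) <= tol
                   for u, v in zip(f(add(x, y)), add(f(x), f(y))))
    homogeneous = all(abs(u - v) <= tol
                      for u, v in zip(f(scale(lam, x)), scale(lam, f(x))))
    return additive and homogeneous

A = [[1, 2, 7], [5, 3, 7], [3, 8, 2]]  # matrix from the example above
a, b = 2.0, [1.0, -1.0, 0.5]           # assumed sample values, b != 0

def linear_map(x):
    return matvec(A, x)

def affine_map(x):
    return add(scale(a, x), b)

x, y, lam = [1.0, 2.0, 3.0], [-4.0, 0.5, 2.0], 3.0
print(is_linear_on(linear_map, x, y, lam))  # True
print(is_linear_on(affine_map, x, y, lam))  # False
```

The affine map fails because $f(x+y)$ contains the offset $b$ once, while $f(x)+f(y)$ contains it twice.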
If $f: \mathbb{R}^m \to \mathbb{R}^n$ is differentiable at a point $x_0 \in \mathbb{R}^m$, we want to approximate it by an affine linear map; that is, locally around $x_0$ we have $$f(x) \approx A(x-x_0) + f(x_0),$$ where $A \in \mathbb{R}^{n \times m}$ is the Jacobian matrix of $f$ at $x_0$. The offset $f(x_0)$ ensures that the approximation takes the value $f(x_0)$ at the point $x_0$, and the matrix $A$ describes how the function changes linearly around $x_0$. The idea is that linear maps are really easy to handle using the tools of linear algebra.
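To see the approximation $f(x) \approx A(x-x_0) + f(x_0)$ at work, here is a rough sketch in plain Python. The sample map `f`, the points `x0` and `x`, and the step size `h` are assumptions made up for this illustration, and the matrix $A$ is estimated by central differences rather than computed symbolically.

```python
import math

def f(p):
    """Hypothetical sample map R^2 -> R^2, chosen only for illustration."""
    x, y = p
    return [x * x * y, math.sin(x) + y]

def jacobian(f, x0, h=1e-6):
    """Approximate the matrix A of partial derivatives of f at x0
    using central differences with step h."""
    n_out = len(f(x0))
    cols = []
    for j in range(len(x0)):
        plus = list(x0)
        minus = list(x0)
        plus[j] += h
        minus[j] -= h
        fp, fm = f(plus), f(minus)
        cols.append([(u - v) / (2 * h) for u, v in zip(fp, fm)])
    # cols[j] holds the j-th column; transpose so that A[i][j] = d f_i / d x_j.
    return [[cols[j][i] for j in range(len(x0))] for i in range(n_out)]

def affine_approx(f, x0, x):
    """Evaluate A (x - x0) + f(x0) with A estimated at x0."""
    A = jacobian(f, x0)
    dx = [xi - x0i for xi, x0i in zip(x, x0)]
    return [fi + sum(a * d for a, d in zip(row, dx))
            for fi, row in zip(f(x0), A)]

x0 = [1.0, 2.0]
x = [1.01, 1.98]  # a point close to x0
exact = f(x)
approx = affine_approx(f, x0, x)
error = max(abs(e - a) for e, a in zip(exact, approx))
print(exact, approx, error)
```

The error shrinks roughly quadratically as `x` moves closer to `x0`, which is what "best affine approximation" is meant to capture.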
Solution 2:
This is in general not the form of a linear function. A function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ is linear if the following two equalities hold for all $\alpha\in\mathbb{R}$ and $x, y\in \mathbb{R}^n$:
$i)$ $f(x + y) = f(x) + f(y)$
$ii)$ $f(\alpha x) = \alpha f(x)$.
It turns out that all such functions are of the form $f(x) = Ax$ for some matrix $A\in\mathbb{R}^{m\times n}$ (that is, a matrix with $m$ rows, $n$ columns).
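A sketch of why this holds: write $x = x_1 e_1 + \dots + x_n e_n$, where $e_1, \dots, e_n$ denote the standard basis vectors of $\mathbb{R}^n$ (notation introduced here). Applying $i)$ and $ii)$ repeatedly gives
$$f(x) = f\Big(\sum_{j=1}^n x_j e_j\Big) = \sum_{j=1}^n x_j f(e_j) = Ax,$$
where $A$ is the matrix whose $j$-th column is the vector $f(e_j) \in \mathbb{R}^m$.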
One key difference with your proposed form is that linear functions always go through the origin, that is $f(0) = 0$, where $0$ is the zero vector (rather than the scalar). This is not the case if $b\neq 0$ in your proposed form. For $f: \mathbb{R}^2 \rightarrow \mathbb{R}$ you should think of a plane through the origin as the graph, rather than a line.
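The property $f(0) = 0$ follows directly from $ii)$ with $\alpha = 0$: for any $x \in \mathbb{R}^n$,
$$f(0) = f(0 \cdot x) = 0 \cdot f(x) = 0.$$
By contrast, the proposed form $f(x) = ax + b$ gives $f(0) = b$, which is the zero vector only when $b = 0$.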
Solution 3:
I just want to check that linear functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ are defined as functions of the form $f(x)=ax+b$ where $a$ is a scalar and $b$ is a vector?
No. In fact, a linear function is one with the properties that $f(x+y) = f(x) + f(y)$ and $f(ax) = af(x)$ for any $x, y$ in whatever vector space it's defined on and any $a$ in the scalar field of that vector space. For maps from $\mathbb{R}^n$ to $\mathbb{R}^m$, these are precisely the functions of the form $f(x) = Ax$ for some matrix $A$.
Also, it seems like functions of the form above just enlarge/shrink and shift. Is this correct?
No, because of the above. For an example involving a circle, take $n = 2$, $m = 2$ and $A = \begin{pmatrix} 2&0\\ 0&1 \end{pmatrix}$. This turns the unit circle into an ellipse. More generally, note that $n$ and $m$ do not have to be the same. For example, there's the linear map \begin{align*}f&: \mathbb{R}^3\to\mathbb{R}\\&:\begin{pmatrix} x\\ y\\ z \end{pmatrix}\mapsto x+y+z,\end{align*} which collapses everything down to a diagonal line (but not in the most "natural" way).
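The circle-to-ellipse claim can be checked with a few sample points. This is a small sketch in plain Python; the helper name `apply_matrix` and the choice of 12 sample points are assumptions for illustration. A point $(u, v)$ lies on the ellipse with semi-axes 2 and 1 exactly when $(u/2)^2 + v^2 = 1$.

```python
import math

A = [[2.0, 0.0], [0.0, 1.0]]  # the matrix from the example above

def apply_matrix(A, p):
    """Multiply matrix A (a list of rows) by the point p."""
    return [sum(a * pi for a, pi in zip(row, p)) for row in A]

# Sample points on the unit circle (the number of samples is arbitrary).
circle = [[math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12)]
          for k in range(12)]
image = [apply_matrix(A, p) for p in circle]

# Each image point (u, v) should satisfy (u/2)^2 + v^2 = 1 up to rounding.
residuals = [abs((u / 2) ** 2 + v ** 2 - 1.0) for u, v in image]
print(max(residuals) < 1e-12)
```

So the map stretches the horizontal direction by a factor of 2 and leaves the vertical direction alone, which is more than a uniform enlargement followed by a shift.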
Solution 4:
In general, the derivative is the best local linear approximation to a function at a point. A function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ that is differentiable at $x=x_0$ is locally approximated there by a linear map (a vector space homomorphism) $Df_{x_0} \in {\cal L}(\mathbb{R}^n, \mathbb{R}^m)$, and it is in this sense that you must understand "linear".
In the direction $v \in \mathbb{R}^n$, the directional derivative is simply $Df_{x_0}(v)$ because the derivative contains all information about all local rates of change in all directions.
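As a small worked example (the function and the numbers are chosen here purely for illustration), take $f(x,y) = x^2 y$, $x_0 = (1,2)$ and $v = (3,-1)$. Then $Df_{x_0} = \begin{pmatrix} 2xy & x^2 \end{pmatrix}\big|_{(1,2)} = \begin{pmatrix} 4 & 1 \end{pmatrix}$, so
$$Df_{x_0}(v) = 4 \cdot 3 + 1 \cdot (-1) = 11.$$
This agrees with differentiating along the line through $x_0$ in the direction $v$: since $f(1+3t,\, 2-t) = 2 + 11t + 12t^2 - 9t^3$, we get $\frac{d}{dt} f(x_0 + tv)\big|_{t=0} = 11$.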
Basically what happens is that you attach a copy of $\mathbb{R}^{n+m}$ to the point $(x_0, f(x_0))$ on the graph, and you approximate the curvy graph of $f$ by the flat (linear) graph of $Df_{x_0}$. This is called the tangent space to the graph of $f$ at $x=x_0$. If you balance a piece of cardboard on a beach ball, you have a good model for this. The origin is where the cardboard touches the ball, which is why you don't get an additive constant.
If you draw a line on your piece of cardboard through the point where it touches, you get a model for the directional derivative in the direction of that line. Rotate the line within the cardboard tangent plane around that point, and you get different directional derivatives.