Linear regression for minimizing the maximum of the absolute residuals

We know that simple linear regression does the following:

Suppose there are $n$ data points $(x_i,y_i)$, where $i=1,2,\dots,n$. The goal is to find the equation of the straight line

$y=\alpha+\beta x$

which provides the best fit for the data points. Here "best" is understood in the least-squares sense: the line minimizes the sum of squared residuals of the linear regression model. In other words, the numbers $\alpha$ and $\beta$ solve the following minimization problem:

Find $\underset{{\alpha,\beta}}{\arg\min}\;Q(\alpha,\beta)$, where $Q(\alpha,\beta)=\sum\limits_{i=1}^n(y_i-\alpha-\beta x_i)^2$
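
(For reference, this least-squares problem has the well-known closed-form solution

$$\hat\beta=\frac{\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^n(x_i-\bar x)^2},\qquad \hat\alpha=\bar y-\hat\beta\,\bar x,$$

where $\bar x$ and $\bar y$ denote the sample means.)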

My question is: if I instead want to minimize the following function, how do I find $\alpha$ and $\beta$:

$\underset{{\alpha,\beta}}{\arg\min}\;P(\alpha,\beta)$, where $P(\alpha,\beta)=\max\limits_{1\leq i\leq n} |y_i-\alpha-\beta x_i|$


You're asking about doing linear regression with the $L_{\infty}$ norm or, equivalently, the Chebyshev approximation criterion, rather than the usual $L_2$ norm that minimizes the sum of squared residuals.

There isn't a nice formula that will give you $\alpha$ and $\beta$. Instead, the standard approach is to obtain $\alpha$ and $\beta$ as the solution to a linear programming problem. The formulation is

$$\text{Minimize } r$$

subject to $$r - (y_i - \alpha - \beta x_i ) \geq 0, \text{ for each } i,$$ $$r + (y_i - \alpha - \beta x_i ) \geq 0, \text{ for each } i.$$ The variables are $r$ (the maximum residual), $\alpha$, and $\beta$, and the $(x_i, y_i)$ are the data values that become parameters in the LP formulation.

Here's an example, although the author assumes a model with $\alpha = 0$.
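
For concreteness, here is a minimal sketch of setting up and solving this LP with SciPy's `linprog`; the toy data, the variable ordering $z=(\alpha,\beta,r)$, and the choice of the HiGHS solver are illustrative assumptions, not part of the answer above.

```python
# A minimal sketch of the LP above with SciPy's linprog; the toy data and the
# variable ordering z = (alpha, beta, r) are arbitrary choices for illustration.
import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # hypothetical data
y = np.array([0.1, 1.3, 1.9, 3.2, 3.8])
n = len(x)

c = np.array([0.0, 0.0, 1.0])             # minimize r

# r - (y_i - alpha - beta*x_i) >= 0  <=>  -alpha - x_i*beta - r <= -y_i
# r + (y_i - alpha - beta*x_i) >= 0  <=>   alpha + x_i*beta - r <=  y_i
A_ub = np.vstack([
    np.column_stack([-np.ones(n), -x, -np.ones(n)]),
    np.column_stack([ np.ones(n),  x, -np.ones(n)]),
])
b_ub = np.concatenate([-y, y])

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (None, None), (0, None)],
              method="highs")
alpha, beta, r = res.x
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, max |residual| = {r:.4f}")
```

Any LP solver will do; the only structure that matters is the pair of inequality constraints linking each residual to $r$.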


The minimization with respect to $\alpha$ is easy: Given $\beta$, we can form $\delta_i:=y_i-\beta x_i$; then the optimal value of $\alpha$ is halfway between the maximal and minimal values of $\delta_i$, and the corresponding value of $P$ is half the distance between the two. Let $i_>$ be an index (depending on $\beta$) for which the maximal value of $\delta_i$ is attained, and let $i_<$ be an index for which the minimal value of $\delta_i$ is attained; then

$$ \begin{eqnarray} \min_\alpha P(\alpha,\beta)&=&\frac{1}{2}(\delta_{i_>}-\delta_{i_<})\\ &=&\frac{1}{2}((y_{i_>}-\beta x_{i_>})-(y_{i_<}-\beta x_{i_<}))\\ &=&\frac{1}{2}((y_{i_>}-y_{i_<})-\beta(x_{i_>}-x_{i_<}))\;. \end{eqnarray} $$
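
In code, this reduction might look as follows (a small sketch assuming NumPy; the function name is hypothetical):

```python
# For a fixed beta, the best alpha is the midpoint of the extreme delta_i,
# and min_alpha P(alpha, beta) is half their spread.
import numpy as np

def best_alpha_and_P(x, y, beta):
    delta = y - beta * x                        # delta_i = y_i - beta * x_i
    alpha = 0.5 * (delta.max() + delta.min())   # midpoint of max and min delta_i
    P = 0.5 * (delta.max() - delta.min())       # half the distance between them
    return alpha, P
```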

Now assume that for a given $\beta$ there is only one index each for which the maximal and minimal values of $\delta_i$ are attained. Then in some sufficiently small neighbourhood of that value of $\beta$ these indices will not change as we vary $\beta$. By the last expression above, that means we can decrease $\min_\alpha P(\alpha,\beta)$ by changing $\beta$ in the direction of the sign of $x_{i_>}-x_{i_<}$. Thus, such a value of $\beta$ cannot be the optimal one; that is, the optimal $\beta$ is one for which either the minimal or the maximal value of $\delta_i$ is attained at two different indices. In other words, the optimal $\beta$ is the slope of a line through two of the data points; since that line lies on one side of all the points, it contains an edge of the convex hull of the points $(x_i,y_i)$.
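
This observation already suggests a simple, if inefficient, method: enumerate the slopes of all lines through pairs of data points and keep the one with the smallest $\min_\alpha P(\alpha,\beta)$. A rough sketch (assuming NumPy, and that not all $x_i$ coincide):

```python
# Brute-force sketch based on the argument above: the optimal beta is the slope
# of a line through two data points, so try every pairwise slope.
import numpy as np
from itertools import combinations

def chebyshev_line(x, y):
    best_P, best_alpha, best_beta = np.inf, 0.0, 0.0
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                                  # no finite slope for this pair
        beta = (y[i] - y[j]) / (x[i] - x[j])
        delta = y - beta * x                          # delta_i = y_i - beta * x_i
        alpha = 0.5 * (delta.max() + delta.min())     # best intercept for this beta
        P = 0.5 * (delta.max() - delta.min())         # resulting max |residual|
        if P < best_P:
            best_P, best_alpha, best_beta = P, alpha, beta
    return best_alpha, best_beta, best_P
```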

Now as we increase $\beta$, $i_<$ increases from $1$ to $n$ and $i_>$ decreases from $n$ to $1$; the value of $\min_\alpha P(\alpha,\beta)$ decreases as these indices move towards each other and then increases again after they've passed each other. The optimal $\beta$ is one where one index for which $\delta_i$ is minimal lies between two indices for which $\delta_i$ is maximal, or vice versa. If the points are in general position, there will be a unique value of $\beta$ with this property.

[Edit:] I just noticed that in the last paragraph I assumed that the points are ordered with increasing $x_i$, which you hadn't assumed. If they aren't ordered, replace "$i_<$ increases" by "$x_{i_<}$ increases" and "$i_>$ decreases" by "$x_{i_>}$ decreases".