Is this solution to the fifth question in PRIMES 2022 valid?

This is the fifth problem from PRIMES $2022$:

(i) Describe an algorithm to find the closed ball (disk) of smallest radius containing a given finite set of points $(x_i, y_i), i = 1, \ldots, n$, in $\mathbb{R}^2$.
(ii) Do the same for points $(x_i, y_i, z_i), i = 1, \ldots, n$, in $\mathbb{R}^3$.
(iii) Show that the ball in (i),(ii) is unique.


My solution

For the first part, define the centroid $C = (x_{n+1}, y_{n+1})$, treated as an additional $(n+1)$-th point, where $$x_{n+1} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad y_{n+1} = \frac{1}{n}\sum_{i=1}^n y_i.$$

To get the radius, we need to find the farthest point $P$ from $C$. To do this, we search for two candidate points $M_x$ and $M_y$: the point with the maximum horizontal distance and the point with the maximum vertical distance from the centroid. Whichever of the two candidates is farther (checked using the Pythagorean theorem) is taken as the farthest point.

Here is sample Python code for this procedure:

import numpy as np

# Search for the point of the data set that is farthest from the centroid.
def startsearch():
    # Replace x_i and y_i with appropriate values
    x_data_set = [x_1, x_2, ..., x_n]
    y_data_set = [y_1, y_2, ..., y_n]

    # Centroid: the mean of the coordinates
    c_x = sum(x_data_set) / len(x_data_set)
    c_y = sum(y_data_set) / len(y_data_set)

    # Indices of the points with the largest horizontal / vertical distance from the centroid
    ind_x, ind_y = 0, 0
    max_x, max_y = 0.0, 0.0

    for i in range(len(x_data_set)):
        temp_dx = np.abs(x_data_set[i] - c_x)
        if max_x < temp_dx:
            max_x = temp_dx
            ind_x = i

    for j in range(len(y_data_set)):
        temp_dy = np.abs(y_data_set[j] - c_y)
        if max_y < temp_dy:
            max_y = temp_dy
            ind_y = j

    # Squared distances of the two candidates from the centroid
    cand_1r = (c_x - x_data_set[ind_x]) ** 2 + (c_y - y_data_set[ind_x]) ** 2
    cand_2r = (c_x - x_data_set[ind_y]) ** 2 + (c_y - y_data_set[ind_y]) ** 2

    if cand_1r >= cand_2r:
        print("Point", ind_x + 1, "is the farthest from the centroid")
        print("Radius is", np.sqrt(cand_1r))
    else:
        print("Point", ind_y + 1, "is the farthest from the centroid")
        print("Radius is", np.sqrt(cand_2r))

if __name__ == '__main__':
    startsearch()

For the second part, this can be done in a similar way to the first part. The adjustments are:

  1. The centroid $C$ of the data set is now given by $$x_{n+1} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad y_{n+1} = \frac{1}{n}\sum_{i=1}^n y_i, \qquad z_{n+1} = \frac{1}{n}\sum_{i=1}^n z_i.$$
  2. The search for candidates for the farthest point now looks for the points in the set with the maximum $x$, $y$, and $z$ distance from the centroid.

Here is the adjusted Python code:
import numpy as np

# Same search as in the planar case, extended to three coordinates.
def startsearch():
    # Replace x_i, y_i, and z_i with appropriate values
    x_data_set = [x_1, x_2, ..., x_n]
    y_data_set = [y_1, y_2, ..., y_n]
    z_data_set = [z_1, z_2, ..., z_n]

    # Centroid: the mean of the coordinates
    c_x = sum(x_data_set) / len(x_data_set)
    c_y = sum(y_data_set) / len(y_data_set)
    c_z = sum(z_data_set) / len(z_data_set)

    # Indices of the points with the largest coordinate-wise distance from the centroid
    ind_x, ind_y, ind_z = 0, 0, 0
    max_x, max_y, max_z = 0.0, 0.0, 0.0

    for i in range(len(x_data_set)):
        temp_dx = np.abs(x_data_set[i] - c_x)
        if max_x < temp_dx:
            max_x = temp_dx
            ind_x = i

    for j in range(len(y_data_set)):
        temp_dy = np.abs(y_data_set[j] - c_y)
        if max_y < temp_dy:
            max_y = temp_dy
            ind_y = j

    for k in range(len(z_data_set)):
        temp_dz = np.abs(z_data_set[k] - c_z)
        if max_z < temp_dz:
            max_z = temp_dz
            ind_z = k

    # Squared distances of the three candidates from the centroid
    cand_1r = (c_x - x_data_set[ind_x]) ** 2 + (c_y - y_data_set[ind_x]) ** 2 + (c_z - z_data_set[ind_x]) ** 2
    cand_2r = (c_x - x_data_set[ind_y]) ** 2 + (c_y - y_data_set[ind_y]) ** 2 + (c_z - z_data_set[ind_y]) ** 2
    cand_3r = (c_x - x_data_set[ind_z]) ** 2 + (c_y - y_data_set[ind_z]) ** 2 + (c_z - z_data_set[ind_z]) ** 2

    if cand_1r >= cand_2r and cand_1r >= cand_3r:
        print("Point", ind_x + 1, "is the farthest from the centroid")
        print("Radius is", np.sqrt(cand_1r))
    elif cand_2r >= cand_3r:
        print("Point", ind_y + 1, "is the farthest from the centroid")
        print("Radius is", np.sqrt(cand_2r))
    else:
        print("Point", ind_z + 1, "is the farthest from the centroid")
        print("Radius is", np.sqrt(cand_3r))

if __name__ == '__main__':
    startsearch()

For the third part, this just means that we have to prove that the average of a data set is unique. This is because if $\mu$ is the "true" average and both $\mu = \mu_1$ and $\mu = \mu_2$ are true, then $\mu_1 = \mu_2$.


Problems arising

  1. Is the radius of the closed ball the smallest compared to other closed balls containing the set of points?

For $n = 3$, this is not the best solution in general, since there is a unique circle passing through the points whose radius can be smaller than the one my solution gives. However, it works if the points are equidistant from the centroid, because the centroid is then also the circumcenter. There are also cases where the radius obtained through this solution is significantly smaller than the radius of the circle passing through the points. Try $\{(0,0), (1,0), (3,1)\}$.
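
Working that example out: the centroid is $\left(\frac43, \frac13\right)$, and the farthest point from it is $(3,1)$, at distance $\frac{\sqrt{29}}{3} \approx 1.80$, so that is the radius my procedure gives. The circle through the three points has radius $\frac{\sqrt{50}}{2} \approx 3.54$, which is much larger. On the other hand, since this triangle is obtuse, the actual smallest enclosing circle has the longest side as a diameter, with center $(1.5, 0.5)$ and radius $\frac{\sqrt{10}}{2} \approx 1.58$, which is smaller than what my procedure gives.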

For $n > 3$, I am not sure whether my solution produces the best closed ball. If the points on the convex hull are not concyclic, a search for a circle passing through three points on the convex hull and containing all the points would be needed, which, I think, would cost more.

  2. Does the code find the farthest point accurately?

The farthest point always has the maximum horizontal or vertical distance to the centroid, which means the code should find the farthest point accurately.


Main question

Is this solution valid? If so, is this the best one, or is there a better way that produces a smaller radius than this one?


Edits:

  1. I deleted the question earlier because it was still not past the deadline. I thought it was okay since it was already 12/1 here (UTC+8); I didn't realize the time zone difference at the time.

  2. Before, the procedure was to find $|x_i|_{\text{max}}$. Now, the procedure is to find $|x_i - C_x|_\text{max}$.


Solution 1:

I think this approach fundamentally does not work on a number of levels. Let's just look at the $\mathbb{R}^2$ case. Consider the collection of points $(-1,1),(1,-1),(0,1.1),(1.1,0)$ followed by 163 points that are identical (or very close) to $(0.9,0.9)$. The smallest disk which encompasses all of these points is the disk centered at the origin with radius $\sqrt{2}$, determined by the points $(-1,1),(1,-1)$. Ideally, your algorithm would identify these points and do something with them. However, the average of the data points is about $(0.9,0.9)$, even though those points are useless. Similarly, the maximum absolute value of $x$ occurs at the point $(1.1,0)$, and the maximum absolute value of $y$ occurs at the point $(0,1.1)$. But again, both of these points are useless.
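
As a quick sanity check: even if one centers the ball at the centroid and uses the true farthest point (which is more generous than the two-candidate search described in the question), the resulting radius is still well above the optimal $\sqrt 2$. A rough numpy sketch:

import numpy as np

# Rough check of the example above: 4 "extreme" points plus 163 copies of (0.9, 0.9).
pts = np.array([(-1, 1), (1, -1), (0, 1.1), (1.1, 0)] + [(0.9, 0.9)] * 163)

centroid = pts.mean(axis=0)                              # roughly (0.885, 0.885)
radius = np.linalg.norm(pts - centroid, axis=1).max()    # distance to the true farthest point

print("centroid-based radius:", radius)                  # about 1.89
print("optimal radius       :", np.sqrt(2))              # about 1.41, ball centered at the origin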

Ideally, your algorithm should identify three specific points which it can use to create a triangle in 2D. (In this case to get the third point you can just repeat one of the other two.) In 3D, you want to find 4 points which form a tetrahedron, where again repeat points are allowed.

Solution 2:

I apologize for the delay in writing this answer. I was writing one specific to the problem, but it turns out that a more general question can be answered with better tools, and since it's a PRIMES problem, it's better to have some general discussion around the construction of the problem along with the answer.


Norm optimizers

Let us define, for $a = (a_1,...,a_m)$ and $b = (b_1,...,b_m)$, which are elements of $\mathbb R^m$, the Euclidean distance $d(a,b) = \left(\sum_{i=1}^m (a_i-b_i)^2\right)^{\frac 12}$. So if you imagine $a,b$ as points in $\mathbb R^m$, this is the usual distance between them.

We define, for $w \in \mathbb R^m$ and $r>0$, the (closed) ball $B(w,r) = \{w' \in \mathbb R^m : d(w,w') \leq r\}$.

Now, given a collection of points $\{x_1,...,x_n\} \subset \mathbb R^m$ and a point $y \in \mathbb R^m$, one can define various notions of how well the point $y$ "approximates" the collection $\{x_1,...,x_n\}$. That is, we usually want $y$ to be the best approximation to the set $x_1,...,x_n$ (you can think of this as trying to represent a collection of point masses by a single point mass, and trying to figure out where to place that mass).

For this, what we do is create a vector out of the distances from $y$ to the $x_i$: we form the vector $v\in \mathbb R^n$ with entries $v_i = d(y,x_i)$. We can then create functions of this vector that decide how good $y$ is, purely on the basis of its distances to the various $x_i$.

The most common approximations are what are called the $L^p$ approximations, for $0 \leq p \leq \infty$. In brief, we have the following definitions.

  • If $p=0$, then $\|v\|_0 = \left|\{1 \leq i \leq n : v_i \neq 0\}\right|$.

  • If $0<p<\infty$ then $\|v\|_p = (\sum_{i=1}^n v_i^p)^\frac 1p$

  • If $p=\infty$ then $\|v\|_{\infty} = \max\{v_i , 1 \leq i \leq n\}$.

The $L^0$ norm finds the number of non-zero elements in $v$.

The $L^p$ norms capture two of the most important approximation norms, namely $L^1$ and $L^2$.

The $L^\infty$ norm actually corresponds to our problem.

All these norms are useful in data science for various kinds of approximations. That's half the motivation for studying the general problem.
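
For concreteness, here is a small numerical illustration of these definitions (a sketch with numpy; the vector $v$ is an arbitrary example):

import numpy as np

v = np.array([3.0, 0.0, 4.0])

l0 = np.count_nonzero(v)          # L^0 "norm": number of non-zero entries -> 2
l1 = np.sum(np.abs(v))            # L^1 norm: 3 + 0 + 4 -> 7
l2 = np.sqrt(np.sum(v ** 2))      # L^2 norm: sqrt(9 + 16) -> 5
linf = np.max(np.abs(v))          # L^infinity norm: largest entry -> 4

print(l0, l1, l2, linf)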


More on the norms and their purpose

The point, now, is quite simple: once you fix a $0 \leq p \leq \infty$ based on your application, any point $y$ such that $y = \displaystyle\mbox{argmin}_{y \in \mathbb R^m} \|v\|_p$ (that is, $y$ minimizes $\|v\|_p$ over all possible points in $\mathbb R^m$, where $v$ is the vector of distances determined by $y$) can be called a $p$-approximation to the set $\{x_1,...,x_n\}$.

This notion has plenty of relevance. I'll provide brief explanations of some of it here.

  • The minimizer of the $L^0$ norm is often a solution of a linear equation that has the largest number of zeros, and hence is the easiest to process. It's a very useful norm in data science for this purpose.

  • Any $L^1$ norm minimizer, when $m=1$, is called a median, and the (unique) $L^2$ norm minimizer is called the mean and is given by the formula $y = \frac{x_1+\ldots + x_n}{n}$ (where the addition and the division by $n$ are component-wise, as vectors of $\mathbb R^m$). These are obviously rich notions of approximation.

  • The $L^\infty$ norm minimization is our problem. Indeed, for each $y$, the smallest $r>0$ such that $x_1,...,x_n \in B(y,r)$ is obviously given by the quantity $\|v\|_{\infty}$. So, what our problem asks for, as a matter of fact, is the minimization of the $L^\infty$ norm. The fact that it's been asked as a PRIMES question should be enough motivation for us to explore the situation.


Convex optimization

Theorems that guarantee the existence of minimizers of functionals are few and far between. We happen to fall into a very special class of functionals, however.

For $p \geq 1$, the function $v \to \|v\|_p$ from $\mathbb R_{\geq 0}^n$ (that is, vectors with non-negative entries) to $\mathbb R$ is a convex function, i.e. for all $v,w \in \mathbb R_{\geq 0}^n$ and $\lambda \in [0,1]$ we have $$ \|(1-\lambda)v+\lambda w\|_p \leq (1-\lambda) \|v\|_p + \lambda \|w\|_{p}. $$

Furthermore, if $p>1$, then the given function is strictly convex i.e. if $v \neq w$ and $\lambda \in (0,1)$ then the $\leq$ above changes to a strict $<$.

We can compose this with the map from $y$ to $v$ and verify the following as well: $$ h_p : \mathbb R^m \to \mathbb R , \qquad h_p(y)=\|(d(y,x_i))_{i=1}^n\|_p $$ is a convex function, and strictly convex for $p>1$.

One very nice result is this : any strictly convex function attains at most one local minimum. Furthermore, if it does attain a local minimum, then that point is also a global minimum.

Unfortunately, this result doesn't hold for $p=\infty$, but it's still worth discussing in relation to what we've studied.


The answer for $1 < p < \infty$, and an interesting observation

It turns out, following gradient analysis, that we can construct an argument for $L^{p}$ norm minimization for $1<p<\infty$.

How? Well, the first thing to realize is that it is equivalent to minimize the quantity $h_p(y)^p$ instead of $h_p(y)$, for $1<p<\infty$. We have $$ \frac{\partial}{\partial y} \sum_{i=1}^n d(y,x_i)^p = \sum_{i=1}^n p\, d(y,x_i)^{p-1} \frac{y-x_i}{d(y,x_i)} = \sum_{i=1}^n p (y-x_i) d(y,x_i)^{p-2} $$

at all points $y$ which are not equal to any of the $x_i$. Try putting $p=2$ and seeing where the "average" comes from.
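
For instance, with $p=2$ the right-hand side becomes $\sum_{i=1}^n 2(y-x_i)$, and setting it equal to zero gives $$ n y = \sum_{i=1}^n x_i \quad \Longrightarrow \quad y = \frac 1n \sum_{i=1}^n x_i, $$ which is exactly the centroid used in the question.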

It is now possible to perform a gradient descent algorithm to find $y$. To give a brief explanation : the gradient at a point indicates the direction in which, if one proceeds from that point, one gets the highest possible increase of the function. It follows that if one is performing minimization, then one can head in the direction opposite to the gradient, and get to a point where the minimum occurs.

So an algorithm can be proposed for $L^p$ minimization as follows:

  • Let $y_1$ be any point (you can start with $y_1 = \frac 1n \sum_{i=1}^n x_i$). Let $s_1$ be any "step" size.

  • Find the gradient $\frac{\partial}{\partial y} h_p^p$ at the point $y_1$, using the formula for it specified earlier. If it equals zero, then $y_1$ is the minimum. Otherwise, let $y_2 = y_1 - s_1 \frac{\partial}{\partial y} h_p^p(y) \big|_{y = y_1}$.

  • Choose an updated step size $s_2$, let $y_1=y_2$ and $s_1=s_2$, and go back to the previous step.

If the step sizes are small, then we can ensure that $h_p^p(y_i)$ is a decreasing sequence, heading towards the minimum. Thus, we have a gradient descent algorithm for solving the $L^p$ minimization problem.
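
As an illustration, here is a minimal sketch of this descent in Python (the data set, the value of $p$, the fixed step size, and the stopping rule are all arbitrary illustrative choices):

import numpy as np

def grad_hp_p(y, xs, p):
    # Gradient of sum_i d(y, x_i)^p, using the formula above (fine here since p >= 2).
    d = np.linalg.norm(xs - y, axis=1)                   # the distances d(y, x_i)
    return np.sum(p * (y - xs) * d[:, None] ** (p - 2), axis=0)

def lp_minimize(xs, p=3, step=0.005, iters=10000):
    y = xs.mean(axis=0)                                  # start at the centroid
    for _ in range(iters):
        g = grad_hp_p(y, xs, p)
        if np.linalg.norm(g) < 1e-9:                     # (near-)zero gradient: stop
            break
        y = y - step * g                                 # move against the gradient
        # a fixed step size is used here for simplicity; the text allows updating it
    return y

xs = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 1.0]])
print(lp_minimize(xs, p=3))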

This also explains why the algorithm proposed in the question doesn't work: averaging corresponds to minimizing the $L^2$ norm, while we require $L^{\infty}$ norm estimates. That's why $p=\infty$ requires a different approach.

What happens when we do this for $p=\infty$, though? The key observation is that the gradient is actually indicative of something quite surprising in this case. Indeed, the gradient of the function $v \to \|v\|_{\infty}$ is a bit weird. It isn't defined when two components of $v$ both attain the maximum value over all components (e.g. for $(3,2,3)$, where the maximum $3$ is attained by two components); it is defined when exactly one component attains the maximum (e.g. for $(3,2,2)$ or $(4,1,3)$). If component $i$ uniquely attains the maximum of $v$, then the gradient of $\|v\|_{\infty}$ at $v$ is actually $e_i$, the vector with zeros everywhere except a $1$ in the $i$th position.

This leads to the rather awkward form of the gradient of $h_{\infty}$: given a $y$, the gradient of $h_{\infty}$ at $y$ is defined only at the points where the maximum of the $d(y,x_i)$ is attained by a unique index $i$, and $y \neq x_i$ for any $i$. In that case, using the chain rule, it is seen to be the vector $\frac{y-x_i}{d(y,x_i)}$, where $i$ is that unique index.

But that's quite remarkable: this quantity is never zero, because $y \neq x_i$ for any $i$. So, the gradient can't be zero at any point where it exists!

That's when we receive a suggestion: perhaps the minimum is actually attained at a point where the gradient is NOT defined. That is, perhaps a point $y$ minimizing the $L^{\infty}$ norm is actually either one of the $x_i$, or a point such that the maximum of the quantities $d(y,x_i)$ is attained by more than one $x_i$.

Armed with this "gradient-type" logic, we can actually solve the entire problem, as I will show in the two sections below.


Result on $L^{\infty}$ minimization

Theorem : Let $\{x_1,...,x_n\}$ be a collection of distinct points. Then, the function $h_{\infty}(y) = \max_i d(y,x_i)$ attains its minimum either at one of the points $x_i$ for $1 \leq i \leq n$, or at a point $a$ such that $d(a,x_i) = d(a,x_j)$ for some two distinct indices $i \neq j$, and $d(a,x_k) \leq d(a,x_i)$ for all $k \neq i,j$.

Proof : Suppose that $y$ is a point that does not satisfy the conditions given above, so $y$ is not one of the $x_i$ and the maximum of the $d(y,x_j)$ is attained at a unique index. Let $i$ be that unique index, so that $M = d(y,x_i) = \max_j d(y,x_j)$. Let $\delta = M - \max_{j \neq i} d(y,x_j) > 0$ by assumption. Now, define a new point $y' = y - \delta\frac{y-x_i}{4M}$. Note that $d(y',x_i) = M - \frac{\delta}{4}$, because $y'$ lies on the segment joining $y$ and $x_i$ at distance $\frac{\delta}{4}$ from $y$. However, for $j \neq i$, we have $d(y',x_j) \leq d(y,x_j) + d(y,y') \leq M-\frac{3\delta}{4}$. It follows that $h_{\infty}(y') = M - \frac{\delta}{4} < M = h_{\infty}(y)$. Therefore, $y$ cannot be a point of minimum. $\blacksquare$

Here, we used the gradient to work out the direction of decrease and create $y'$.

We are now ready to solve the PRIMES problem.


Solving the PRIMES problem

Now, the characterization for the cases $m=2,m=3$ lies in the fact that if I fix two distinct points $x_i,x_j$, the set of points $x$ such that $d(x,x_i) = d(x,x_j)$ has different structures. If $m=2$, then it's the perpendicular bisector of the line segment joining $x_i,x_j$. If $m=3$, then it's the hyperplane that is perpendicular to the line joining $x_i,x_j$ and contains the midpoint of that line segment. In short: the set of points is a line if $m=2$ and a plane if $m=3$.

We can use the structure of these objects to make stronger conjectures, that immediately translate to algorithms fit for purpose.

Theorem : Suppose that $m=2$, and $x_1,...,x_n$ are distinct points in $\mathbb R^2$. Then, the $L^{\infty}$ minimizer $y$ is either :

  • One of the $x_i$.

  • The midpoint of the line segment joining two distinct points $x_i,x_j$.

  • The circumcenter of the circumcircle of three distinct points $x_i,x_j,x_k$.

Proof : Suppose that $y$ is a point not satisfying any of the above conditions. If there do not exist distinct $i,j$ such that $d(y,x_i) = d(y,x_j) = \max_k d(y,x_k)$, then we've already seen that $y$ cannot be an $L^\infty$ minimizer. So, $y$ must satisfy this condition. In particular, $y$ lies on the perpendicular bisector of the segment joining $x_i$ and $x_j$.

Now, if there were a $k \neq i,j$ such that $d(y,x_k) = d(y,x_i)= d(y,x_j)$, then we'd be done, because $y$ would be the circumcenter of these three points. If there is no such $k$, then we know that $d(y,x_i) - \max_{k \neq i,j} d(y,x_k) = \delta > 0$.

What we do is use the gradient idea again, but this time we stay on the perpendicular bisector. Remember that if we have two points $x_i,x_j$ and a point $a$ on the perpendicular bisector of $x_i,x_j$, then the quantity $d(a,x_i) = d(a,x_j)$ is smaller when $a$ is closer to the midpoint of $x_i$ and $x_j$. Now, if $y$ is already the midpoint of $x_i$ and $x_j$, then we are done. Otherwise, write $m_{ij} = \frac{x_i+x_j}{2}$ and let $y' = y - \eta\,\frac{y - m_{ij}}{d(y,m_{ij})}$, where $\eta = \min\{\frac{\delta}{4}, d(y,m_{ij})\}$. That is, we move towards the midpoint $m_{ij}$ by a distance of at most $\frac{\delta}{4}$, while staying on the perpendicular bisector.

Now, because of the fact that we moved closer to the midpoint, we get $d(y',x_i) < d(y,x_i)$ and likewise $d(y',x_j) < d(y,x_j)$. By the triangle inequality, $d(y',x_i) = d(y',x_j) > d(y,x_i) - \frac{\delta}{4}$. However, we also get that $d(y',x_k) \leq d(y,x_i) - \frac{3\delta}{4}$ for $k \neq i,j$ by the same triangle inequality.

This implies that $h_{\infty}(y') < h_{\infty}(y)$. Therefore, $y$ cannot be a minimizer.

It follows that a minimizer not covered by the first two cases must have $\delta=0$, i.e. there is a $k$ such that $d(y,x_k) = d(y,x_i) = d(y,x_j)$. Thus, $y$ is the circumcenter of three of the points. $\blacksquare$

Now, it should be quite easy to find the answer for $m=2$. Consider every $x_i$, every midpoint of a pair of points, and every possible circumcenter of a triple of points. Then, find $h_{\infty}$ at each candidate. The candidate with the smallest value is the minimizer, and that smallest value is the radius.
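
A direct (and admittedly slow, roughly $O(n^4)$) implementation of this enumeration could look like the following sketch; the helper names are my own:

import numpy as np
from itertools import combinations

def circumcenter(a, b, c):
    # Center equidistant from a, b, c, or None if the points are collinear.
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay) + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx) + (cx**2 + cy**2) * (bx - ax)) / d
    return np.array([ux, uy])

def smallest_enclosing_circle(points):
    pts = np.array(points, dtype=float)
    candidates = [p for p in pts]                                   # the points x_i themselves
    candidates += [(p + q) / 2 for p, q in combinations(pts, 2)]    # midpoints of pairs
    for a, b, c in combinations(pts, 3):                            # circumcenters of triples
        cc = circumcenter(a, b, c)
        if cc is not None:
            candidates.append(cc)
    # Evaluate h_infinity at each candidate; the smallest value gives the center and radius.
    center = min(candidates, key=lambda y: np.linalg.norm(pts - y, axis=1).max())
    return center, np.linalg.norm(pts - center, axis=1).max()

center, radius = smallest_enclosing_circle([(0, 0), (1, 0), (3, 1)])
print(center, radius)    # center (1.5, 0.5), radius sqrt(10)/2, about 1.581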

The answer for $m=3$ is analogous : I'll sketch it out.

Theorem : Suppose that $x_1,...x_n$ is a collection of distinct points in $\mathbb R^3$. Then, the $L^{\infty}$ minimizer is either :

  • One of the $x_i$.

  • A midpoint of the line segment joining two points $x_i,x_j$.

  • A circumcenter of the triangle formed by three vertices $x_i,x_j,x_k$.

  • A circumcenter of the circumsphere determined by four points $x_i,x_j,x_k,x_l$.

Proof : We already know that the minimizer, if it is not one of the above, must lie on the perpendicular-bisector plane determined by two points $x_i,x_j$ (the plane of points equidistant from them). Once it is on this plane, assuming it's not already the midpoint, we can travel in a direction that decreases the common distance to the two points $x_i,x_j$ but doesn't increase the other distances enough: that leads to $y$ lying in the set of points where $d(y,x_i)=d(y,x_j)= d(y,x_k)$ for some $k$.

That's the line (we are in three dimensions) on which the circumcenter of the triangle $x_i,x_j,x_k$ lies. If $y$ isn't that circumcenter, then travelling along this line towards the circumcenter will decrease the distances to the points $x_i,x_j,x_k$ but not increase the other distances enough. In this way, we arrive at a single point where $d(y,x_i)$ is the same for at least four different points, i.e. $y$ is the center of the circumsphere determined by these points. $\blacksquare$

Once again, the point is to find $h_{\infty}$ at these finitely many points and get the answer.
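
For the enumeration in three dimensions, the main extra ingredient is computing the equidistant centers; for example, the center of the circumsphere of four points can be obtained from a small linear system, as in this sketch:

import numpy as np

def circumsphere_center(p1, p2, p3, p4):
    # Center equidistant from the four points, found by solving the linear system
    #   2 (p_k - p_1) . c = |p_k|^2 - |p_1|^2   for k = 2, 3, 4.
    # Returns None if the four points are coplanar (singular system).
    P = np.array([p1, p2, p3, p4], dtype=float)
    A = 2 * (P[1:] - P[0])
    b = np.sum(P[1:] ** 2, axis=1) - np.sum(P[0] ** 2)
    try:
        return np.linalg.solve(A, b)
    except np.linalg.LinAlgError:
        return None

# e.g. four vertices of a unit cube that are not coplanar:
print(circumsphere_center((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)))  # (0.5, 0.5, 0.5)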


Uniqueness

I'll leave you with a hint, which you can use to complete the proof of:

Theorem : The $L^\infty$ minimization problem has a unique solution.

Hint : Let $y_1,y_2$ be two distinct minimizers, and consider any point $y$ on the line segment joining $y_1,y_2$. We claim that $d(y,x_i) \leq \max\{d(y_1,x_i), d(y_2,x_i)\}$ for all $i$. Indeed, the idea is that if we fix a point $x_i$ and consider the line joining $y_1$ and $y_2$, then as we move from $y_1$ to $y_2$ along this line and analyse the distance to $x_i$, we know that the distance must either be decreasing, increasing, or decreasing then increasing. In every possible case, for every single point $y$ in the middle we either have $d(y,x_i) \leq d(y_1,x_i)$ or $d(y,x_i) \leq d(y_2,x_i)$.

So, if $y_1,y_2$ are minimizers then so is every point on the line segment joining these points. Something has to give at this point.