Confusion related to curse of dimensionality in k nearest neighbor

I have some confusion about the curse of dimensionality in k nearest neighbor search.

It says that as the number of dimensions grows, I need to cover more space to capture the same number of training examples. I don't understand what this is trying to show or why it happens. Any clarification?


For the $k$ nearest neighbour rule to perform well, we want the neighbours to be representative of the population densities at the query point (the given value $x$ to be classified). That is, the $k$ nearest neighbours should typically fall near $x$. We can expect that to happen in low dimensions: on a unit interval with 5000 points, for example, the 5 nearest neighbours would lie, on average, in a neighbourhood of length $5/5000 = 0.001$, which seems fine; the 5000 points cover the space densely enough that, even taken in groups of 5, the neighbours will be quite close to $x$. But in, say, 6 dimensions we cannot be so optimistic: 5000 points in the unit 6-cube means there are, on average, 5 points in each sub-cube of side length $(0.001)^{1/6} \approx 0.32$, so the 5 nearest neighbours of a query point will not, on average, be very near to it.
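To make this concrete, here is a minimal Monte Carlo sketch (my own illustration, not part of the answer above) that draws 5000 uniform points in the unit cube and measures the side of the cube, centred at the query, that contains the 5 nearest neighbours under the max norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5000, 5

for d in (1, 2, 6):
    pts = rng.random((n, d))        # n uniform points in [0, 1]^d
    query = np.full(d, 0.5)         # query at the centre of the cube
    # Chebyshev (max-norm) distance: if the k-th smallest distance is r,
    # a cube of side 2r centred at the query contains the k nearest points
    dist = np.max(np.abs(pts - query), axis=1)
    side = 2 * np.sort(dist)[k - 1]
    print(f"d={d}: cube side holding {k} neighbours ≈ {side:.3f} "
          f"(expected ≈ {(k / n) ** (1 / d):.3f})")
```

For $d = 1$ the side comes out near $0.001$, while for $d = 6$ it is already around $0.32$, matching the numbers quoted above.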


If you are working in $d$ dimensions, the volume of the unit cube is $1$ and the volume of a sub-cube of side length $c$ is $c^d$. So if you want $c^d = 1/1000$, so that the sub-cube captures on average $5$ nearest neighbors out of $5000$ uniformly distributed points, you need $c = (1/1000)^{1/d}$. As $d$ increases, $c \to 1$, so your 5 nearest neighbors are nowhere close to your query point. One way to address this problem is to project your data points onto lower dimensions with PCA (if your points have a few principal modes of global variation), or to apply manifold learning, where the distance between points becomes the distance between them along the learned manifold.
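As a quick illustration of $c = (1/1000)^{1/d}$ (a small sketch I'm adding, under the same uniform-data assumption), the required side length can be tabulated for a few dimensions:

```python
# Side length c of a sub-cube expected to hold 5 of 5000 uniform points:
# c = (1/1000) ** (1/d), which approaches 1 as d grows
for d in (1, 2, 3, 10, 50, 100):
    c = (1 / 1000) ** (1 / d)
    print(f"d = {d:3d}:  c = {c:.3f}")
```

Already at $d = 10$ the sub-cube must span about half of each axis, and at $d = 100$ about 93% of it.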


Let's say we have a $p$-dimensional unit cube representing our data (where each dimension/feature corresponds to an edge of the cube).

Let's say we use the $k$-nearest neighbor classifier to predict the output for a test point based on the output values of the inputs close to it. Suppose we explore a fraction $r$ of the unit volume of the $p$-dimensional cube. Then the sub-cube we are exploring has volume $r$, and the length of its edge is $r^{1/p}$.

So say we have 100 features in our dataset ($p = 100$). If we want to explore 1% ($r = 0.01$) of the volume of the cube while doing nearest neighbour search, the edge length of the sub-cube is $0.01^{1/100} \approx 0.955$, i.e. we need to cover about 95% of the range of each feature just to capture 1% of the data. That is an enormous region to search, and it costs a lot in time and space. This is what we mean intuitively by the curse of dimensionality: the space we actually have to explore is far bigger than what we think we need to explore.
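A short sketch of the same arithmetic (my own illustration; `edge_length` is a hypothetical helper, not something from the answer above):

```python
# edge of a sub-cube covering a fraction r of the p-dimensional unit cube
def edge_length(r, p):
    return r ** (1 / p)

p = 100
for r in (0.01, 0.10):
    print(f"r = {r:.2f}, p = {p}: edge = {edge_length(r, p):.3f}")
# r = 0.01, p = 100: edge = 0.955
# r = 0.10, p = 100: edge = 0.977
```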