Characterizing (stationary) points by the number of valleys one can descent into

In non-convex optimizing of more than 2 times differentiable $f: \mathbb{R}^2 \mapsto \mathbb{R}$ we can encounter saddle points that have multiple valley one could descent into.

x^2-y^2 plotted for x,y in (-2,2)

At $(0,0)$ there are two direction of descent: (0,-1) and (0,1). However the number of valley one could of descent can be larger than the number of dimensions as the the monkey saddle illustrates:

I want to distinguish descending into different valleys and from the number of directions as descent. If we look at a Unit Box of the Rastrigin function we see that the local maxima at (?.5,?.5) can descent in any direction but the axis aligned directions lead into saddle points, if we ignores those we have non connected open sets corresponding to each attractor of a "not getting stuck in saddle points" optimizers (Newton method, sufficiently perturbed gradient descent, ...). enter image description here

What is the name for the concept that would assign $(0,0)$ of the normal saddle a 2, the monkey saddle at (0,0) a 3 and any point (?.5, ?.5) a 4 on the Rastrigin function?

Context:

I am motivated to investigate this property over non stationary points. However i need a name for it first to see if there is existing literature.

Any point along the x-axis of normal saddle point could be assigned a two, any point of the (strictly) positive x-axis of the monkey saddle could be assigned a 2 too. Any point of with only one component ending in .5 on the Rastrigin function would be a 2 too.

I suspect that for an $f: \mathbb{R}^n \mapsto \mathbb{R}$ there exist a $\mathbb{R}^{n-1}$ dimensional (non connected) pseudo(?) manifold the contains all points that have more than two valleys next to them. All points which have exactly two neighboring valleys are part of n-dimensional hyper surfaces which divide different basins of attractions of local optima. Such surfaces might meet with other surfaces to from "edges" connecting the surfaces.

I hope that finding that partioning structure can be used to accelerate global optimization of non-convex functions. A related but more Greedy (descents and forks but doesn't ascend) approach called "Ridge Rider" is already known in the literature and used to find more diverse solutions to optimization problems.

What you are describing sounds quite similar to Morse homology. Many treatments of this topic are quite abstract, but the basic ideas are simple to describe. Suppose that we have a function $f: \mathbb{R}^2\to\mathbb{R}$. We will consider the critical points of this function and classify them by their index. By definition, the index counts the number of negative eigenvalues of the Hessian of $f$ at the critical point. Thus local maxima have index 2, saddles ponts have index 1, and local minima have index 0 (ofc in higher dimensions, there are more possible values for the index). One defines vector spaces $C_2,C_1,C_0$, where $C_i$ is generated by the critical points of index $i$. Then there are linear maps called boundary operators $\partial_1: C_2\to C_1$ and $\partial_0:C_1\to C_0$, where by definition $\partial_1(p)=\sum_q \pm N(p,q)q$ where $p$ is a local maximum, $q$ is a saddle point, and $N(p,q)$ is the number of gradient flowlines from $p$ to $q$ (the other operator is defined similarly, but sounds less relevant to your question). In other words, the boundary map contains all information about which saddle points can be reached from which maxima along gradient flow lines, and the total number of saddle points that can be reached from a given point is the sum of absolute values of the corresponding row of matrix. In general there is a large mathematical literature on this topic, but I am not sure how much these ideas have spread into non-convex optimization.

Characterizing (stationary) points by the number of valleys one can descent into

Related

Recent Posts