Steepest descent and finding optimal step size

Solution 1:

Here's a notional Armijo–Goldstein implementation. Can't test it without a data+function example, though.

import numpy as np

# both constants should be less than 1, but usually close to it
c = 0.8  # how much imperfection in the function improvement we'll settle for
tau = 0.8  # how much the step is shrunk on each backtracking iteration

x = np.array(f.optimal_range())  # assume everything is a vector; x is an n-dimensional coordinate

# NOTE: the part below is repeated for every x update
step = 0.3  # alpha in Armijo–Goldstein terms
gradient = np.array([f.fprime_x(x[0]), f.fprime_y(x[1]), ...])  # one partial derivative per dimension
# in the simplest case (SGD) p points along the negative gradient,
# but in general they don't have to coincide, e.g. because of added momentum
p = -gradient / np.linalg.norm(gradient)  # unit vector in the descent direction
m = gradient.dot(p)  # directional derivative; -m is the "happy case" improvement per unit step
t = -c * m  # decrease per unit step we'll consider good enough
# func(*x) might be worth precomputing
while func(*x) - func(*(x + step*p)) < step * t:  # shrink the step until the decrease is good enough
    step *= tau

# good enough step size found: update x = x + step*p and repeat from the NOTE above
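
If you want something you can actually run, here's a minimal end-to-end sketch: the same backtracking loop wrapped in a full steepest-descent iteration on a made-up quadratic objective. The function, its gradient, the starting point, the tolerance and the iteration cap are all assumptions standing in for the missing data+function example.

import numpy as np

def func(x, y):  # hypothetical objective with its minimum at (1, -2)
    return (x - 1)**2 + 2 * (y + 2)**2

def grad(x, y):  # its analytic gradient
    return np.array([2 * (x - 1), 4 * (y + 2)])

c, tau = 0.8, 0.8
x = np.array([5.0, 5.0])  # arbitrary starting point

for _ in range(1000):
    gradient = grad(*x)
    if np.linalg.norm(gradient) < 1e-6:  # close enough to a stationary point
        break
    p = -gradient / np.linalg.norm(gradient)  # steepest-descent direction
    m = gradient.dot(p)
    t = -c * m
    step = 0.3
    fx = func(*x)  # precompute: reused by every backtracking test
    while fx - func(*(x + step * p)) < step * t:  # Armijo condition not met yet
        step *= tau
    x = x + step * p

print(x)  # should end up close to [1, -2]

Note that the step is reset to 0.3 before every backtracking search, exactly as in the snippet above; whether to warm-start it from the previously accepted step instead is a separate tuning choice.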