How to set adaptive learning rate for GradientDescentOptimizer?
I am using TensorFlow to train a neural network. This is how I am initializing the GradientDescentOptimizer
:
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
mse = tf.reduce_mean(tf.square(out - out_))
train_step = tf.train.GradientDescentOptimizer(0.3).minimize(mse)
The thing here is that I don't know how to set an update rule for the learning rate or a decay value for that.
How can I use an adaptive learning rate here?
First of all, tf.train.GradientDescentOptimizer
is designed to use a constant learning rate for all variables in all steps. TensorFlow also provides out-of-the-box adaptive optimizers including the tf.train.AdagradOptimizer
and the tf.train.AdamOptimizer
, and these can be used as drop-in replacements.
However, if you want to control the learning rate with otherwise-vanilla gradient descent, you can take advantage of the fact that the learning_rate
argument to the tf.train.GradientDescentOptimizer
constructor can be a Tensor
object. This allows you to compute a different value for the learning rate in each step, for example:
learning_rate = tf.placeholder(tf.float32, shape=[])
# ...
train_step = tf.train.GradientDescentOptimizer(
learning_rate=learning_rate).minimize(mse)
sess = tf.Session()
# Feed different values for learning rate to each training step.
sess.run(train_step, feed_dict={learning_rate: 0.1})
sess.run(train_step, feed_dict={learning_rate: 0.1})
sess.run(train_step, feed_dict={learning_rate: 0.01})
sess.run(train_step, feed_dict={learning_rate: 0.01})
Alternatively, you could create a scalar tf.Variable
that holds the learning rate, and assign it each time you want to change the learning rate.
Tensorflow provides an op to automatically apply an exponential decay to a learning rate tensor: tf.train.exponential_decay
. For an example of it in use, see this line in the MNIST convolutional model example. Then use @mrry's suggestion above to supply this variable as the learning_rate parameter to your optimizer of choice.
The key excerpt to look at is:
# Optimizer: set up a variable that's incremented once per batch and
# controls the learning rate decay.
batch = tf.Variable(0)
learning_rate = tf.train.exponential_decay(
0.01, # Base learning rate.
batch * BATCH_SIZE, # Current index into the dataset.
train_size, # Decay step.
0.95, # Decay rate.
staircase=True)
# Use simple momentum for the optimization.
optimizer = tf.train.MomentumOptimizer(learning_rate,
0.9).minimize(loss,
global_step=batch)
Note the global_step=batch
parameter to minimize. That tells the optimizer to helpfully increment the 'batch' parameter for you every time it trains.