"freeze" some variables/scopes in tensorflow: stop_gradient vs passing variables to minimize
I am trying to implement an adversarial NN, which requires 'freezing' one or the other part of the graph during alternating training minibatches. That is, there are two sub-networks, G and D:
G( Z ) -> Xz
D( X ) -> Y
where the loss function of G depends on D[G(Z)] and D[X].
First I need to train the parameters in D with all G parameters fixed, and then the parameters in G with the parameters in D fixed. The loss function in the first case is the negative of the loss function in the second case, and each update must apply only to the parameters of the corresponding subnetwork.
I saw that TensorFlow has the tf.stop_gradient function. For the purpose of training the D (downstream) subnetwork, I can use this function to block the gradient flow to G:
Z -> [ G ] -> tf.stop_gradient(Xz) -> [ D ] -> Y
The tf.stop_gradient documentation is very succinct, with no in-line example (and the seq2seq.py example is too long and not that easy to read), but it looks like it must be called during graph creation. Does this imply that if I want to block/unblock gradient flow in alternating batches, I need to re-create and re-initialize the graph model?
Also, it seems that one cannot block the gradient flowing through the G (upstream) network by means of tf.stop_gradient, right?
As an alternative, I saw that one can pass the list of variables to the optimizer call as opt_op = opt.minimize(cost, <list of variables>), which would be an easy solution if one could get all variables in the scope of each subnetwork. Can one get a <list of variables> for a tf.scope?
Solution 1:
The easiest way to achieve this, as you mention in your question, is to create two optimizer operations using separate calls to opt.minimize(cost, ...). By default, the optimizer will use all of the variables in tf.trainable_variables(). If you want to filter the variables to a particular scope, you can use the optional scope argument to tf.get_collection() as follows:
optimizer = tf.train.AdagradOptimizer(0.01)
# Train only the variables whose names start with the first scope prefix.
first_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                     "scope/prefix/for/first/vars")
first_train_op = optimizer.minimize(cost, var_list=first_train_vars)
# Train only the variables whose names start with the second scope prefix.
second_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                      "scope/prefix/for/second/vars")
second_train_op = optimizer.minimize(cost, var_list=second_train_vars)
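To make the scope filtering concrete: the scope argument of tf.get_collection() is matched against each variable's name with re.match, so a plain string acts as a name prefix. Below is a minimal, framework-free sketch of that matching rule. The variable names and the filter_by_scope helper are hypothetical, made up for illustration; they are not TensorFlow API.

```python
import re

# Hypothetical variable names, as they might look for two sub-networks
# built under scopes "G" and "D", plus a look-alike scope "G2".
variable_names = [
    "G/layer1/weights:0",
    "G/layer1/biases:0",
    "D/layer1/weights:0",
    "D/layer1/biases:0",
    "G2/layer1/weights:0",
]


def filter_by_scope(names, scope):
    """Keep the names that `scope` matches from the start (re.match)."""
    return [name for name in names if re.match(scope, name)]


g_names = filter_by_scope(variable_names, "G/")
d_names = filter_by_scope(variable_names, "D/")
loose_names = filter_by_scope(variable_names, "G")  # no trailing slash

print(g_names)
print(d_names)
print(loose_names)
```

Note that the prefix "G" without the trailing slash also picks up the "G2/..." variable, so it is probably safer to include the slash when two scope names share a prefix.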
Solution 2:
@mrry's answer is completely right and perhaps more general than what I'm about to suggest. But I think a simpler way to accomplish it is to just pass the Python reference directly to var_list:
W = tf.Variable(...)
C = tf.Variable(...)
Y_est = tf.matmul(W,C)
loss = tf.reduce_sum((data-Y_est)**2)
optimizer = tf.train.AdamOptimizer(0.001)
# You can pass the python object directly
train_W = optimizer.minimize(loss, var_list=[W])
train_C = optimizer.minimize(loss, var_list=[C])
I have a self-contained example here: https://gist.github.com/ahwillia/8cedc710352eb919b684d8848bc2df3a
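If it helps to see what restricting var_list does to an update, here is a rough plain-Python analogue of the snippet above, with scalars standing in for W and C and hand-written gradients instead of TensorFlow's autodiff. The minimize_step helper, the learning rate, and the data value are assumptions for illustration only; this is not how the optimizer is implemented.

```python
def loss(params, data):
    """Squared error of the toy model data ~ W * C (scalars only)."""
    return (data - params["W"] * params["C"]) ** 2


def gradients(params, data):
    """Analytic gradients of the loss with respect to W and C."""
    residual = data - params["W"] * params["C"]
    return {
        "W": -2.0 * residual * params["C"],
        "C": -2.0 * residual * params["W"],
    }


def minimize_step(params, data, var_list, learning_rate=0.05):
    """One gradient-descent step applied only to the names in var_list."""
    grads = gradients(params, data)
    updated = dict(params)  # parameters not in var_list are copied unchanged
    for name in var_list:
        updated[name] = params[name] - learning_rate * grads[name]
    return updated


params = {"W": 1.0, "C": 1.0}
data = 6.0

after_w = minimize_step(params, data, var_list=["W"])   # C stays frozen
after_c = minimize_step(after_w, data, var_list=["C"])  # W stays frozen

print(params, loss(params, data))
print(after_w, loss(after_w, data))
print(after_c, loss(after_c, data))
```

Both steps use the same loss; the only thing that changes between them is which parameter is allowed to move, which is the role var_list plays in the two minimize calls.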
Solution 3:
Another option you might want to consider is setting trainable=False on a variable. The variable is then left out of tf.trainable_variables(), so an optimizer using the default variable list will not modify it during training:
tf.Variable(my_weights, trainable=False)
Solution 4:
I don't know if my approach has downsides, but I solved this issue for myself with this construct:
do_gradient = <Tensor that evaluates to 0 or 1>
no_gradient = 1 - do_gradient
wrapped_op = do_gradient * original + no_gradient * tf.stop_gradient(original)
So if do_gradient = 1, the values and gradients will flow through just fine, but if do_gradient = 0, then the values will only flow through the stop_gradient op, which will stop the gradients flowing back.
For my scenario, hooking do_gradient up to an index of a random_shuffle tensor let me randomly train different pieces of my network.
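As a sanity check on the construct above, here is a small framework-free sketch that tracks a value together with its derivative (forward-mode dual numbers) and applies the same gating formula. The Dual class, this stop_gradient function, and the gated helper are hypothetical stand-ins written for illustration; TensorFlow computes gradients differently (reverse mode), and here do_gradient is a plain Python 0 or 1 rather than a tensor.

```python
class Dual:
    """A value paired with its derivative with respect to one input."""

    def __init__(self, value, grad=0.0):
        self.value = value
        self.grad = grad

    def __add__(self, other):
        other = _as_dual(other)
        return Dual(self.value + other.value, self.grad + other.grad)

    __radd__ = __add__

    def __mul__(self, other):
        other = _as_dual(other)
        # Product rule for the derivative part.
        return Dual(self.value * other.value,
                    self.value * other.grad + self.grad * other.value)

    __rmul__ = __mul__


def _as_dual(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def stop_gradient(x):
    """Pass the value through unchanged but drop its derivative."""
    return Dual(x.value, 0.0)


def gated(original, do_gradient):
    """Same formula as the construct above, with do_gradient in {0, 1}."""
    no_gradient = 1 - do_gradient
    return do_gradient * original + no_gradient * stop_gradient(original)


x = Dual(3.0, 1.0)   # input with derivative dx/dx = 1
original = x * x     # value 9, derivative 2 * x = 6

on = gated(original, 1)
off = gated(original, 0)

print(on.value, on.grad)
print(off.value, off.grad)
```

In both cases the forward value is the same; only the derivative changes, which matches the behaviour described for do_gradient = 1 and do_gradient = 0.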