"freeze" some variables/scopes in tensorflow: stop_gradient vs passing variables to minimize

I am trying to implement Adversarial NN, which requires to 'freeze' one or the other part of the graph during alternating training minibatches. I.e. there two sub-networks: G and D.

G( Z ) ->  Xz
D( X ) ->  Y

where loss function of G depends on D[G(Z)], D[X].

First I need to train parameters in D with all G parameters fixed, and then parameters in G with parameters in D fixed. Loss function in first case will be negative loss function in the second case and the update will have to apply to the parameters of whether first or second subnetwork.

I saw that tensorflow has tf.stop_gradient function. For purpose of training the D (downstream) subnetwork I can use this function to block the gradient flow to

 Z -> [ G ] -> tf.stop_gradient(Xz) -> [ D ] -> Y

The tf.stop_gradient is very succinctly annotated with no in-line example (and example seq2seq.py is too long and not that easy to read), but looks like it must be called during the graph creation. Does it imply that if I want to block/unblock gradient flow in alternating batches, I need to re-create and re-initialize the graph model?

Also it seems that one cannot block the gradient flowing through the G (upstream) network by means of tf.stop_gradient, right?

As an alternative I saw that one can pass the list of variables to the optimizer call as opt_op = opt.minimize(cost, <list of variables>), which would be an easy solution if one could get all variables in the scopes of each subnetwork. Can one get a <list of variables> for a tf.scope?


Solution 1:

The easiest way to achieve this, as you mention in your question, is to create two optimizer operations using separate calls to opt.minimize(cost, ...). By default, the optimizer will use all of the variables in tf.trainable_variables(). If you want to filter the variables to a particular scope, you can use the optional scope argument to tf.get_collection() as follows:

optimizer = tf.train.AdagradOptimzer(0.01)

first_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                     "scope/prefix/for/first/vars")
first_train_op = optimizer.minimize(cost, var_list=first_train_vars)

second_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                      "scope/prefix/for/second/vars")                     
second_train_op = optimizer.minimize(cost, var_list=second_train_vars)

Solution 2:

@mrry's answer is completely right and perhaps more general than what I'm about to suggest. But I think a simpler way to accomplish it is to just pass the python reference directly to var_list:

W = tf.Variable(...)
C = tf.Variable(...)
Y_est = tf.matmul(W,C)
loss = tf.reduce_sum((data-Y_est)**2)
optimizer = tf.train.AdamOptimizer(0.001)

# You can pass the python object directly
train_W = optimizer.minimize(loss, var_list=[W])
train_C = optimizer.minimize(loss, var_list=[C])

I have a self-contained example here: https://gist.github.com/ahwillia/8cedc710352eb919b684d8848bc2df3a

Solution 3:

Another option you might want to consider is you can set trainable=False on a variable. Which means it will not be modified by training.

tf.Variable(my_weights, trainable=False)

Solution 4:

I don't know if my approach has down sides, but I solved this issue for myself with this construct:

do_gradient = <Tensor that evaluates to 0 or 1>
no_gradient = 1 - do_gradient
wrapped_op = do_gradient * original + no_gradient * tf.stop_gradient(original)

So if do_gradient = 1, the values and gradients will flow through just fine, but if do_gradient = 0, then the values will only flow through the stop_gradient op, which will stop the gradients flowing back.

For my scenario, hooking do_gradient up to an index of a random_shuffle tensor let me randomly train different pieces of my network.