In TensorFlow, what is tf.identity used for?
After some stumbling I think I've noticed a single use case that fits all the examples I've seen. If there are other use cases, please elaborate with an example.
Use case:
Suppose you'd like to run an operator every time a particular Variable is evaluated. For example, say you'd like to add one to x
every time the variable y
is evaluated. It might seem like this will work:
x = tf.Variable(0.0)
x_plus_1 = tf.assign_add(x, 1)

with tf.control_dependencies([x_plus_1]):
    y = x
init = tf.initialize_all_variables()

with tf.Session() as session:
    init.run()
    for i in xrange(5):
        print(y.eval())
It doesn't: it'll print 0, 0, 0, 0, 0. Instead, it seems that we need to add a new node to the graph within the control_dependencies
block. So we use this trick:
x = tf.Variable(0.0)
x_plus_1 = tf.assign_add(x, 1)

with tf.control_dependencies([x_plus_1]):
    y = tf.identity(x)
init = tf.initialize_all_variables()

with tf.Session() as session:
    init.run()
    for i in xrange(5):
        print(y.eval())
This works: it prints 1, 2, 3, 4, 5.
If in the CIFAR-10 tutorial we dropped tf.identity, then loss_averages_op would never run.
tf.identity is useful when you want to explicitly transport a tensor between devices (like from a GPU to a CPU). The op adds send/recv nodes to the graph, which make a copy when the devices of the input and the output are different.
By default, the send/recv nodes are added implicitly when the operation happens on a different device, but you can imagine some situations (especially in multi-threaded/distributed settings) where it might be useful to fetch the value of the variable multiple times within a single execution of session.run. tf.identity allows more control over when the value should be read from the source device. Possibly a more appropriate name for this op would be read.
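As a minimal sketch of that explicit control (the device strings and variable shape below are assumptions for illustration, not from the original answer), pinning a tf.identity to the CPU forces the copy to happen exactly where that node runs:

import tensorflow as tf

with tf.device('/gpu:0'):
    # Illustrative variable living on the GPU.
    weights = tf.Variable(tf.random_normal([1024, 1024]))

with tf.device('/cpu:0'):
    # Placing tf.identity on the CPU inserts the send/recv pair here,
    # i.e. an explicit GPU -> CPU copy when this node executes.
    weights_on_cpu = tf.identity(weights)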
Also, please note that in the implementation of tf.Variable (link), the identity op is added in the constructor, which makes sure that all accesses to the variable copy the data from the source only once. Multiple copies can be expensive in cases when the variable lives on a GPU but is read by multiple CPU ops (or the other way around). Users can change this behavior with multiple calls to tf.identity when desired.
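To illustrate the "multiple reads" idea with a hedged sketch (the variable and op names below are invented, not from the answer): each tf.identity creates its own read of the variable, and control dependencies can order those reads around an update so that a single session.run observes both the old and the new value.

import tensorflow as tf

v = tf.Variable(1.0)

read_before = tf.identity(v)            # first read of the variable
with tf.control_dependencies([read_before]):
    increment = tf.assign_add(v, 1.0)   # update only after the first read
with tf.control_dependencies([increment]):
    read_after = tf.identity(v)         # second read, after the update

with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    print(session.run([read_before, read_after]))  # e.g. [1.0, 2.0]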
EDIT: Updated answer after the question was edited.
In addition, tf.identity can be used as a dummy node to update a reference to the tensor. This is useful with various control flow ops. In the CIFAR case we want to enforce that the ExponentialMovingAverageOp will update the relevant variables before retrieving the value of the loss. This can be implemented as:
with tf.control_dependencies([loss_averages_op]):
    total_loss = tf.identity(total_loss)
Here, the tf.identity doesn't do anything useful aside from marking the total_loss tensor to be run after evaluating loss_averages_op.
In addition to the above, I simply use it when I need to assign a name to ops that do not have a name argument, such as when initializing a state in RNNs:
rnn_cell = tf.contrib.rnn.MultiRNNCell([cells])
# no name arg
initial_state = rnn_cell.zero_state(batch_size, tf.float32)
# give it a name with tf.identity()
initial_state = tf.identity(input=initial_state, name="initial_state")
I came across another use case that is not completely covered by the other answers.
def conv_layer(input_tensor, kernel_shape, output_dim, layer_name, decay=None, act=tf.nn.relu):
    """Reusable code for making a simple convolutional layer."""
    # Adding a name scope ensures logical grouping of the layers in the graph.
    with tf.name_scope(layer_name):
        # This Variable will hold the state of the weights for the layer
        with tf.name_scope('weights'):
            weights = weight_variable(kernel_shape, decay)
            variable_summaries(weights, layer_name + '/weights')
        with tf.name_scope('biases'):
            biases = bias_variable([output_dim])
            variable_summaries(biases, layer_name + '/biases')
        with tf.name_scope('convolution'):
            preactivate = tf.nn.conv2d(input_tensor, weights, strides=[1, 1, 1, 1], padding='SAME')
            biased = tf.nn.bias_add(preactivate, biases)
            tf.histogram_summary(layer_name + '/pre_activations', biased)
        activations = act(biased, 'activation')
        tf.histogram_summary(layer_name + '/activations', activations)
        return activations
Most of the time when constructing a convolutional layer, you just want the activations returned so you can feed those into the next layer. Sometimes, however (for example when building an auto-encoder), you want the pre-activation values.
In this situation an elegant solution is to pass tf.identity as the activation function, effectively not activating the layer.
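As a hedged usage sketch (the input placeholder, shapes, and layer name below are invented for illustration): passing tf.identity as act makes act(biased, 'activation') simply return the biased tensor under the name 'activation', so the layer outputs its pre-activation values.

# Hypothetical call; names and shapes are illustrative only.
images = tf.placeholder(tf.float32, [None, 28, 28, 1])

pre_activations = conv_layer(images,
                             kernel_shape=[5, 5, 1, 32],
                             output_dim=32,
                             layer_name='encoder_conv1',
                             act=tf.identity)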