How to do Xavier initialization on TensorFlow
I'm porting my Caffe network over to TensorFlow but it doesn't seem to have xavier initialization. I'm using truncated_normal
but this seems to be making it a lot harder to train.
Solution 1:
Since version 0.8 there is a Xavier initializer, see here for the docs.
You can use something like this:
W = tf.get_variable("W", shape=[784, 256],
initializer=tf.contrib.layers.xavier_initializer())
Solution 2:
Just to add another example on how to define a tf.Variable
initialized using Xavier and Yoshua's method:
graph = tf.Graph()
with graph.as_default():
...
initializer = tf.contrib.layers.xavier_initializer()
w1 = tf.Variable(initializer(w1_shape))
b1 = tf.Variable(initializer(b1_shape))
...
This prevented me from having nan
values on my loss function due to numerical instabilities when using multiple layers with RELUs.