How to do Xavier initialization in TensorFlow

I'm porting my Caffe network to TensorFlow, but it doesn't seem to have Xavier initialization. I'm using truncated_normal, but this seems to make the network a lot harder to train.


Solution 1:

Since version 0.8, TensorFlow has included a Xavier initializer; see the docs for tf.contrib.layers.xavier_initializer.

You can use something like this:

import tensorflow as tf  # TensorFlow 1.x; tf.contrib was removed in 2.x

W = tf.get_variable("W", shape=[784, 256],
                    initializer=tf.contrib.layers.xavier_initializer())
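
For reference, here is a rough sketch of what the uniform variant of Xavier (Glorot) initialization computes, assuming the commonly cited bound sqrt(6 / (fan_in + fan_out)). It uses only the Python standard library, and glorot_uniform is a hypothetical helper name for illustration, not a TensorFlow function:

```python
import math
import random


def glorot_uniform(fan_in, fan_out, seed=None):
    """Return (weights, limit): a fan_in x fan_out nested list and its bound."""
    # Xavier/Glorot uniform bound: sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    rng = random.Random(seed)
    # Each weight is drawn independently from U(-limit, limit).
    weights = [[rng.uniform(-limit, limit) for _ in range(fan_out)]
               for _ in range(fan_in)]
    return weights, limit


# Same shape as the W variable above (784 inputs, 256 outputs).
weights, limit = glorot_uniform(784, 256, seed=0)
print(len(weights), len(weights[0]), limit)
```

The bound shrinks as the layer gets wider, so the weight scale adapts to the layer size. A truncated_normal with one fixed stddev does not do that, which is probably why it was harder to train.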

Solution 2:

Just to add another example of how to define a tf.Variable initialized using Xavier Glorot and Yoshua Bengio's method:

graph = tf.Graph()
with graph.as_default():
    ...
    initializer = tf.contrib.layers.xavier_initializer()
    w1 = tf.Variable(initializer(w1_shape))
    b1 = tf.Variable(initializer(b1_shape))
    ...

This kept my loss function from producing NaN values, which I had been getting from numerical instabilities when stacking multiple layers with ReLUs.