Reproducible results using Keras with TensorFlow backend

Solution 1:

As @Poete_Maudit said here: How to get reproducible results in keras

To get reproducible results, do the following at the very beginning of your script (note that this forces TensorFlow to run on a single CPU thread):

# Seed value (you can use a different value for each seeding step)
seed_value = 0

# 1. Set the `PYTHONHASHSEED` environment variable to a fixed value
import os
os.environ['PYTHONHASHSEED'] = str(seed_value)

# 2. Set the `python` built-in pseudo-random generator to a fixed value
import random
random.seed(seed_value)

# 3. Set the `numpy` pseudo-random generator to a fixed value
import numpy as np
np.random.seed(seed_value)

# 4. Set the `tensorflow` pseudo-random generator to a fixed value
import tensorflow as tf
tf.random.set_seed(seed_value)  # tensorflow 2.x
# tf.set_random_seed(seed_value)  # tensorflow 1.x

# 5. Restrict `tensorflow` to a single thread (must run before any op executes)
# tensorflow 2.x: tf.ConfigProto / tf.Session no longer exist at the top level
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)
# tensorflow 1.x: configure a new global session instead
# from keras import backend as K
# session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
# sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
# K.set_session(sess)

Note: You can no longer get reproducible results by running PYTHONHASHSEED=0 python3 script.py, as https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development might lead you to think; you have to set PYTHONHASHSEED with os.environ inside your script, as in step 1. Also, this setup does NOT guarantee reproducibility when training on a GPU.
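To check that the seeding actually takes effect, you can re-apply the seeds before each run and compare the results. The sketch below is a NumPy-only stand-in, not Keras code: `reset_seeds` and `toy_training_run` are hypothetical names, and in a real Keras script you would also call `tf.random.set_seed(seed)` inside `reset_seeds`.

```python
import random

import numpy as np


def reset_seeds(seed):
    """Hypothetical helper re-applying steps 2 and 3 of the script above.

    In a real Keras script you would also call tf.random.set_seed(seed) here.
    """
    random.seed(seed)
    np.random.seed(seed)


def toy_training_run(n_samples=8, n_features=3, lr=0.01):
    """Hypothetical stand-in for a training run that uses the global RNGs."""
    X = np.arange(n_samples * n_features, dtype=float).reshape(n_samples, n_features) / 10
    y = X.sum(axis=1)
    w = np.random.normal(size=n_features)  # NumPy RNG: weight initialization
    order = list(range(n_samples))
    random.shuffle(order)                  # Python RNG: order of training samples
    for i in order:                        # one epoch of plain SGD on a linear model
        error = X[i] @ w - y[i]
        w -= lr * error * X[i]
    return w


reset_seeds(0)
first = toy_training_run()
reset_seeds(0)
second = toy_training_run()
print(np.array_equal(first, second))  # True
```

Re-seeding immediately before each run, rather than once at import time, is what makes two runs in the same process comparable.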

Solution 2:

Deep learning involves inherent randomness that leads to non-reproducible results, but you can control it to a certain extent.

In a deep neural network, several sources of randomness can affect reproducibility and lead to different results, such as:

  • Randomness in initialization, such as the initial weights.

  • Randomness in regularization, such as dropout.

  • Randomness in layers.

  • Randomness in optimization, such as the shuffling of training data.
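To see where these sources enter, here is a small NumPy-only sketch (not Keras code) in which weight initialization, a dropout mask, and the order of training examples all come from one seeded generator. The function name `sample_random_choices` and the layer sizes are hypothetical; the point is only that fixing the seed fixes all three.

```python
import numpy as np


def sample_random_choices(seed, n_in=4, n_out=3, batch=6, drop_rate=0.5):
    """Draw every random quantity of a toy training step from one seeded generator."""
    rng = np.random.default_rng(seed)
    # Initialization: Glorot-style uniform weights for a hypothetical dense layer
    limit = np.sqrt(6.0 / (n_in + n_out))
    weights = rng.uniform(-limit, limit, size=(n_in, n_out))
    # Regularization: dropout keep mask for one batch of activations
    keep_mask = rng.random((batch, n_out)) >= drop_rate
    # Optimization: order in which the training examples are visited
    order = rng.permutation(batch)
    return weights, keep_mask, order


a = sample_random_choices(seed=42)
b = sample_random_choices(seed=42)
print(all(np.array_equal(x, y) for x, y in zip(a, b)))  # True
```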

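There are several ways to mitigate this. One option is to use summary statistics, for example reporting the mean and standard deviation of a metric over several runs instead of a single result. Another method that gives more reproducible results is to set a random seed in NumPy and/or TensorFlow; see: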

https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.random.seed.html

https://www.tensorflow.org/api_docs/python/tf/set_random_seed
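The summary-statistics option can be sketched as follows: repeat the experiment with several seeds and report the mean and standard deviation. `evaluate_with_seed` is a hypothetical stand-in for "train a model with this seed and return its accuracy"; the accuracies it returns are simulated noise, not real results.

```python
import numpy as np


def evaluate_with_seed(seed):
    """Hypothetical stand-in for training and evaluating a model with one seed."""
    rng = np.random.default_rng(seed)
    return 0.90 + rng.normal(scale=0.01)  # simulated run-to-run variation


def summarize_runs(seeds):
    """Return the mean and sample standard deviation of the metric across runs."""
    scores = np.array([evaluate_with_seed(s) for s in seeds])
    return scores.mean(), scores.std(ddof=1)


mean, std = summarize_runs(range(10))
print(f"accuracy: {mean:.3f} +/- {std:.3f}")
```

Reporting the spread across seeds makes it clearer whether a difference between two models exceeds ordinary run-to-run variation.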

For operations that run on GPUs, you can request deterministic implementations instead of the default non-deterministic ones. For NVIDIA graphics cards, see: docs.nvidia.com/cuda
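On the TensorFlow side, a minimal sketch of requesting deterministic ops might look like this, assuming TensorFlow 2.x if it is installed. `TF_DETERMINISTIC_OPS` is the environment variable used by TensorFlow 2.1 onward, and `tf.config.experimental.enable_op_determinism()` is the explicit switch added in TensorFlow 2.8; exact behavior depends on your version, and `request_deterministic_ops` is a hypothetical helper name. This does not replace the CUDA-level settings described in the NVIDIA documentation.

```python
import os


def request_deterministic_ops():
    """Ask TensorFlow for deterministic op implementations (version-dependent sketch).

    Returns True if the TensorFlow 2.8+ switch was called.
    """
    # Read by TensorFlow 2.1+; safest to set before TensorFlow is imported
    os.environ["TF_DETERMINISTIC_OPS"] = "1"
    try:
        import tensorflow as tf
    except ImportError:
        return False  # TensorFlow not installed: only the variable was set
    # Assumes TensorFlow 2.x; enable_op_determinism exists from 2.8 onward
    enable = getattr(tf.config.experimental, "enable_op_determinism", None)
    if enable is None:
        return False
    enable()
    return True


request_deterministic_ops()
```

Call it at the very top of the script, alongside the seeding steps, so the setting is in place before any model is built.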