Checkpointing keras model: TypeError: can't pickle _thread.lock objects

It seems like the error has occurred in the past in different contexts here, but I'm not dumping the model directly -- I'm using the ModelCheckpoint callback. Any idea what could be going wrong?

Information:

  • Keras version 2.0.8
  • Tensorflow version 1.3.0
  • Python 3.6

Minimal example to reproduce the error:

from keras.layers import Input, Lambda, Dense
from keras.models import Model
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam
import tensorflow as tf
import numpy as np

x = Input(shape=(30,3))
low = tf.constant(np.random.rand(30, 3).astype('float32'))
high = tf.constant(1 + np.random.rand(30, 3).astype('float32'))
clipped_out_position = Lambda(lambda x, low, high: tf.clip_by_value(x, low, high),
                                      arguments={'low': low, 'high': high})(x)

model = Model(inputs=x, outputs=[clipped_out_position])
optimizer = Adam(lr=.1)
model.compile(optimizer=optimizer, loss="mean_squared_error")
checkpoint = ModelCheckpoint("debug.hdf", monitor="val_loss", verbose=1, save_best_only=True, mode="min")
training_callbacks = [checkpoint]
model.fit(np.random.rand(100, 30, 3), [np.random.rand(100, 30, 3)], callbacks=training_callbacks, epochs=50, batch_size=10, validation_split=0.33)

Error output:

Train on 67 samples, validate on 33 samples
Epoch 1/50
10/67 [===>..........................] - ETA: 0s - loss: 0.1627Epoch 00001: val_loss improved from inf to 0.17002, saving model to debug.hdf
Traceback (most recent call last):
  File "debug_multitask_inverter.py", line 19, in <module>
    model.fit(np.random.rand(100, 30, 3), [np.random.rand(100, 30, 3)], callbacks=training_callbacks, epochs=50, batch_size=10, validation_split=0.33)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/site-packages/keras/engine/training.py", line 1631, in fit
    validation_steps=validation_steps)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/site-packages/keras/engine/training.py", line 1233, in _fit_loop
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/site-packages/keras/callbacks.py", line 73, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/site-packages/keras/callbacks.py", line 414, in on_epoch_end
    self.model.save(filepath, overwrite=True)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/site-packages/keras/engine/topology.py", line 2556, in save
    save_model(self, filepath, overwrite, include_optimizer)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/site-packages/keras/models.py", line 107, in save_model
    'config': model.get_config()
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/site-packages/keras/engine/topology.py", line 2397, in get_config
    return copy.deepcopy(config)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 215, in _deepcopy_list
    append(deepcopy(a, memo))
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/om/user/lnj/openmind_env/tensorflow-gpu/lib/python3.6/copy.py", line 169, in deepcopy
    rv = reductor(4)
TypeError: can't pickle _thread.lock objects

Solution 1:

When a Lambda layer is saved, the arguments passed to it are saved as well. In this case, they include two tf.Tensors, and it seems that Keras does not currently support serializing a tf.Tensor in the model config.
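
The traceback ends inside copy.deepcopy, so the failure can be sketched with the standard library alone. The snippet below is only an illustration under an assumption: GraphBoundValue is a hypothetical stand-in for a tensor that (presumably through its graph) ends up referencing a thread lock; it is not a Keras or TensorFlow class.

```python
import copy
import threading


class GraphBoundValue:
    """Hypothetical stand-in for an object that holds a thread lock internally."""

    def __init__(self, data):
        self.data = data
        self._lock = threading.Lock()  # locks can be neither pickled nor deep-copied


def try_deepcopy(config):
    """Return (True, None) if the config deep-copies, else (False, exception name)."""
    try:
        copy.deepcopy(config)
        return True, None
    except TypeError as exc:
        return False, type(exc).__name__


# A config holding only plain Python data copies fine.
plain_config = {"arguments": {"low": [0.1, 0.2], "high": [1.1, 1.2]}}
# A config that reaches a lock somewhere in its object graph does not.
locked_config = {"arguments": {"low": GraphBoundValue([0.1, 0.2])}}

plain_ok, _ = try_deepcopy(plain_config)
locked_ok, locked_error = try_deepcopy(locked_config)
print(plain_ok, locked_ok, locked_error)
```

The deep copy walks every attribute reachable from the config, so a single lock buried several objects deep is enough to abort the save.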

However, numpy arrays can be serialized without problem. So instead of passing tf.Tensor in arguments, you can pass in numpy arrays, and convert them into tf.Tensors in the lambda function.

x = Input(shape=(30,3))
low = np.random.rand(30, 3)
high = 1 + np.random.rand(30, 3)
clipped_out_position = Lambda(lambda x, low, high: tf.clip_by_value(x, tf.constant(low, dtype='float32'), tf.constant(high, dtype='float32')),
                              arguments={'low': low, 'high': high})(x)

A problem with the lines above is that, when trying to load this model, you might see a NameError: name 'tf' is not defined. That's because TensorFlow is not imported in the file where the Lambda layer is reconstructed (core.py).

Changing tf to K.tf fixes the problem. You can also replace tf.constant() with K.constant(), which casts low and high to float32 tensors automatically.

from keras import backend as K
x = Input(shape=(30,3))
low = np.random.rand(30, 3)
high = 1 + np.random.rand(30, 3)
clipped_out_position = Lambda(lambda x, low, high: K.tf.clip_by_value(x, K.constant(low), K.constant(high)),
                              arguments={'low': low, 'high': high})(x)

Solution 2:

To clarify: the problem is not that Keras cannot pickle a Tensor inside a Lambda layer (other scenarios are possible, see below), but that the arguments of the Python function (here, a lambda function) are serialized independently of the function, that is, outside the context of the lambda itself. This works for 'static' arguments but fails otherwise. To work around it, wrap the non-static function arguments in another function.

Here are a couple of workarounds:


  1. Use static variables, such as Python/numpy variables (just as mentioned above):
low = np.random.rand(30, 3)
high = 1 + np.random.rand(30, 3)

x = Input(shape=(30,3))
clipped_out_position = Lambda(lambda x: tf.clip_by_value(x, low, high))(x)

  2. Use functools.partial to wrap your lambda function:
import functools

clip_by_value = functools.partial(
   tf.clip_by_value,
   clip_value_min=low,
   clip_value_max=high)

x = Input(shape=(30,3))
clipped_out_position = Lambda(lambda x: clip_by_value(x))(x)

  3. Use a closure to wrap your lambda function:
low = tf.constant(np.random.rand(30, 3).astype('float32'))
high = tf.constant(1 + np.random.rand(30, 3).astype('float32'))

def clip_by_value(t):
    return tf.clip_by_value(t, low, high)

x = Input(shape=(30,3))
clipped_out_position = Lambda(lambda x: clip_by_value(x))(x)

Note: although you can sometimes drop the explicit lambda function and use this cleaner snippet instead:

clipped_out_position = Lambda(clip_by_value)(x)

the absence of the extra wrapping lambda (that is, lambda t: clip_by_value(t)) might still lead to the same problem when the function arguments are deep-copied, so it should be avoided.


  4. Finally, you can wrap your model logic into a separate Keras layer, which in this particular case may look a bit over-engineered:
x = Input(shape=(30,3))
low = Lambda(lambda t: tf.constant(np.random.rand(30, 3).astype('float32')))(x)
high = Lambda(lambda t: tf.constant(1 + np.random.rand(30, 3).astype('float32')))(x)
clipped_out_position = Lambda(lambda x: tf.clip_by_value(*x))((x, low, high))

Note: tf.clip_by_value(*x) in the last Lambda layer simply unpacks the argument tuple; it can also be written more verbosely as tf.clip_by_value(x[0], x[1], x[2]).


As a side note: a scenario where your lambda function captures (part of) a class instance will also break the serialization (due to late binding):

class MyModel:
    def __init__(self):
        self.low = np.random.rand(30, 3)
        self.high = 1 + np.random.rand(30, 3)

    def run(self):
        x = Input(shape=(30,3))
        clipped_out_position = Lambda(lambda x: tf.clip_by_value(x, self.low, self.high))(x)
        model = Model(inputs=x, outputs=[clipped_out_position])
        optimizer = Adam(lr=.1)
        model.compile(optimizer=optimizer, loss="mean_squared_error")
        checkpoint = ModelCheckpoint("debug.hdf", monitor="val_loss", verbose=1, save_best_only=True, mode="min")
        training_callbacks = [checkpoint]
        model.fit(np.random.rand(100, 30, 3), 
                 [np.random.rand(100, 30, 3)], callbacks=training_callbacks, epochs=50, batch_size=10, validation_split=0.33)

MyModel().run()

This can be solved by ensuring early binding with the default-argument trick:

        (...)
        clipped_out_position = Lambda(lambda x, l=self.low, h=self.high: tf.clip_by_value(x, l, h))(x)
        (...)
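
The difference between the two lambdas can be inspected in plain Python. This is a minimal sketch, not Keras code: Holder is a hypothetical stand-in for the MyModel instance above, and min/max replace tf.clip_by_value so the snippet runs without TensorFlow.

```python
class Holder:
    """Hypothetical stand-in for the MyModel instance above."""

    def __init__(self):
        self.low = 0.0
        self.high = 1.0

    def late_bound(self):
        # `self` is looked up when the lambda runs, so it is kept in a closure cell.
        return lambda x: min(max(x, self.low), self.high)

    def early_bound(self):
        # The current values are evaluated now and stored as default arguments.
        return lambda x, l=self.low, h=self.high: min(max(x, l), h)


holder = Holder()
late = holder.late_bound()
early = holder.early_bound()

# The late-bound lambda drags the whole instance along in its closure.
captured = [cell.cell_contents for cell in late.__closure__]
print(captured[0] is holder)

# The early-bound lambda has no closure, only two plain default values.
print(early.__closure__, early.__defaults__)

# Changing the instance afterwards affects only the late-bound version.
holder.low = 0.5
late_result = late(0.2)
early_result = early(0.2)
print(late_result, early_result)
```

With the default-argument form, only the two captured values travel with the function; with the closure form, everything reachable from self does, including anything that cannot be deep-copied.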

Solution 3:

See my post at https://github.com/keras-team/keras/issues/8343#issuecomment-470376703.

This exception is raised because you're trying to serialize a tf.Tensor, so any method that avoids this serialization will work, including:

  • Do not serialize it: save the model weights only, because this serialization happens when you save the model structure as a JSON/YAML string.

  • Eliminate TensorFlow tensors: make sure every tensor in your model is produced by a Keras layer, or use ndarray data instead if possible, as @Yu-Yang suggested.