Is there a way to stack two TensorFlow datasets?

Solution 1:

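If by "stacking" you mean putting one dataset after the other, the tf.data.Dataset.concatenate() method is the closest analog when working with datasets (strictly speaking it behaves like tf.concat() along the first axis rather than tf.stack()). It works on two datasets with the same structure (i.e. the same types for each component, but possibly different shapes), for example: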

import tensorflow as tf

dataset_1 = tf.data.Dataset.range(10, 20)
dataset_2 = tf.data.Dataset.range(60, 70)

then you can concatenate them as follows:

combined_dataset = dataset_1.concatenate(dataset_2)
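A quick way to check the result, assuming TensorFlow 2.x with eager execution (so that `as_numpy_iterator()` is available): the elements of the second dataset simply follow those of the first, giving 20 elements in total.

```python
import tensorflow as tf

dataset_1 = tf.data.Dataset.range(10, 20)
dataset_2 = tf.data.Dataset.range(60, 70)

# concatenate() yields every element of dataset_1, then every element of dataset_2
combined_dataset = dataset_1.concatenate(dataset_2)

# Pull the elements back into a plain Python list to inspect the order
values = [int(x) for x in combined_dataset.as_numpy_iterator()]
print(values)  # 10..19 followed by 60..69
```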

Solution 2:

If by stacking you mean what tf.stack() and np.stack() do:

Stacks a list of rank-R tensors into one rank-(R+1) tensor.

https://www.tensorflow.org/api_docs/python/tf/stack

Join a sequence of arrays along a new axis.

https://docs.scipy.org/doc/numpy/reference/generated/numpy.stack.html

then I believe the closest you can come with a tf.data.Dataset is Dataset.zip():

@staticmethod
zip(datasets)

Creates a Dataset by zipping together the given datasets.

https://www.tensorflow.org/api_docs/python/tf/data/Dataset?version=stable#zip

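This lets you iterate through multiple datasets at the same time: each element of the zipped dataset is a tuple containing the corresponding element from every input dataset. In effect you iterate over the dimension the original datasets share, much as you would over a stack()ed tensor or matrix.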
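A minimal sketch, reusing the two range datasets from Solution 1 and assuming TensorFlow 2.x: zipping pairs up the i-th elements of both datasets, which matches the rows of tf.stack(..., axis=1) applied to the equivalent tensors.

```python
import tensorflow as tf

dataset_1 = tf.data.Dataset.range(10, 20)
dataset_2 = tf.data.Dataset.range(60, 70)

# Each element of the zipped dataset is a tuple (x, y), one value from each input
zipped = tf.data.Dataset.zip((dataset_1, dataset_2))

pairs = [(int(x), int(y)) for x, y in zipped.as_numpy_iterator()]
print(pairs[:3])  # [(10, 60), (11, 61), (12, 62)]

# For comparison: the same pairing appears as the rows of a tensor stacked along a new axis 1
stacked = tf.stack(
    [tf.range(10, 20, dtype=tf.int64), tf.range(60, 70, dtype=tf.int64)], axis=1
)
print(stacked[:3].numpy().tolist())  # [[10, 60], [11, 61], [12, 62]]
```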

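You can then also use .map(lambda *t: tf.stack(t)) or .map(lambda *t: tf.stack(t, axis=-1)) to stack the tensors along a new dimension at the front or back, respectively. (A bare .map(tf.stack) does not work on a zipped dataset: map() passes the tuple components as separate arguments, so the second component would be taken as tf.stack()'s axis argument.)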
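A minimal sketch, assuming TensorFlow 2.x. The inputs below are hypothetical datasets of length-2 vectors (rather than the scalars used earlier), chosen so that stacking at the front and at the back give visibly different results. Both variants collect the unpacked tuple components with `*t` before calling tf.stack().

```python
import tensorflow as tf

# Hypothetical inputs: each element is a length-2 vector
dataset_1 = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])
dataset_2 = tf.data.Dataset.from_tensor_slices([[5, 6], [7, 8]])
zipped = tf.data.Dataset.zip((dataset_1, dataset_2))

# map() calls the function as f(x, y), so *t gathers the components into a tuple
front = zipped.map(lambda *t: tf.stack(t))          # new axis first: rows are x and y
back = zipped.map(lambda *t: tf.stack(t, axis=-1))  # new axis last: rows are (x[i], y[i])

front_values = [e.tolist() for e in front.as_numpy_iterator()]
back_values = [e.tolist() for e in back.as_numpy_iterator()]
print(front_values)  # [[[1, 2], [5, 6]], [[3, 4], [7, 8]]]
print(back_values)   # [[[1, 5], [2, 6]], [[3, 7], [4, 8]]]
```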

If what you actually want is the behavior of tf.concat() or np.concatenate(), use Dataset.concatenate() instead.
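For reference, the difference between the two tensor operations contrasted above, shown on plain tensors rather than datasets (a minimal sketch assuming TensorFlow 2.x eager mode):

```python
import tensorflow as tf

a = tf.constant([1, 2, 3])
b = tf.constant([4, 5, 6])

# tf.stack adds a new axis: two rank-1 tensors of shape (3,) become one rank-2 tensor
stacked = tf.stack([a, b])                 # shape (2, 3)
stacked_last = tf.stack([a, b], axis=-1)   # shape (3, 2): rows (1, 4), (2, 5), (3, 6)

# tf.concat joins along an existing axis, so the rank stays 1
concatenated = tf.concat([a, b], axis=0)   # shape (6,)

print(stacked.shape, stacked_last.shape, concatenated.shape)
```

Dataset.zip() followed by a tf.stack() map corresponds to the first two cases, while Dataset.concatenate() corresponds to the last.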