How to build TensorFlow input pipelines for images and their corresponding labels
Your code seems to be working fine. I initially thought it was not working correctly because you were passing train_labels as a pandas Series to from_tensor_slices, but that does not seem to be a problem. My best guess is that the buffer_size in dataset.shuffle is too small. For example, if I set buffer_size to 1, I get the same samples every time I call dataset.take(1), because, according to the docs:
[...] if your dataset contains 10,000 elements but buffer_size is set to 1,000, then shuffle will initially select a random element from only the first 1,000 elements in the buffer [...]
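To make the quoted behaviour concrete, here is a small standalone sketch using toy integer datasets in place of your images (the helper epoch_order and the datasets are illustrative only). It shows that buffer_size=1 does no shuffling at all, that a buffer as large as the dataset gives a full shuffle, and that a small buffer over label-sorted data keeps producing same-label batches:

```python
import tensorflow as tf


def epoch_order(dataset):
    """Return one full pass over `dataset` as a list of Python ints."""
    return [int(x) for x in dataset.as_numpy_iterator()]


ds = tf.data.Dataset.range(10)

# buffer_size=1: the buffer only ever holds one element, so nothing is shuffled.
no_shuffle = epoch_order(ds.shuffle(buffer_size=1))
print(no_shuffle)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# buffer_size equal to the number of elements: any element can land anywhere.
full_shuffle = epoch_order(ds.shuffle(buffer_size=10))
print(full_shuffle)  # some permutation of 0..9, different on each run

# buffer_size=3: the first output is always drawn from the first 3 inputs.
small = ds.shuffle(buffer_size=3)
first_elements = [epoch_order(small)[0] for _ in range(20)]

# Labels sorted by class (50 zeros, then 50 ones) with a small buffer:
# the buffer starts out holding only label-0 elements, so the first batch
# is all zeros no matter how often the dataset is re-iterated.
sorted_labels = tf.data.Dataset.from_tensor_slices([0] * 50 + [1] * 50)
first_batch = next(iter(sorted_labels.shuffle(buffer_size=10).batch(8)))
print(first_batch.numpy())  # [0 0 0 0 0 0 0 0]
```

If your data on disk is ordered by class, this last case is the likely culprit: either raise buffer_size towards the dataset size or shuffle the file list (e.g. the DataFrame) before building the dataset.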
Maybe your first 100 elements all have the label 0? Again, it's just a suggestion. I managed to get your code to retrieve different labels each time by using a larger buffer_size (here, equal to the number of elements):
import pandas as pd
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

classes = ['Danger', 'Give Way', 'Hump']
label_map = {v: i for i, v in enumerate(classes)}  # class name -> integer id
d = {'label': ['Danger', 'Give Way', 'Hump', 'Danger', 'Give Way', 'Hump', 'Danger', 'Give Way', 'Hump'],
     'other': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
df = pd.DataFrame(data=d)
train_labels = df['label'].map(label_map)  # pandas Series of integer labels

def load_data(image, label):
    image /= 255.0  # scale pixel values from [0, 255] to [0, 1]
    return image, label

# Dummy "images" with pixel values in [0, 255), standing in for real data
features = tf.random.uniform((9, 32, 32, 3), minval=0.0, maxval=255.0)
dataset = tf.data.Dataset.from_tensor_slices((features, train_labels))
dataset = dataset.shuffle(buffer_size=9)  # buffer covers the whole dataset
dataset = dataset.map(load_data, num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset = dataset.batch(batch_size=2)
dataset = dataset.repeat()
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)

plt.figure(figsize=(5, 5))
for i in range(2):
    # Each take(1) starts a new iteration, so the shuffle buffer is reshuffled
    for val in dataset.take(1):
        img = val[0][i] * 255.0  # undo the scaling for display
        plt.subplot(1, 2, i + 1)
        plt.imshow(tf.cast(img, tf.uint8))
        plt.title(val[1][i].numpy())
plt.subplots_adjust(hspace=1)
plt.show()
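The code above feeds random tensors instead of real pictures. Since the question is about images on disk, here is a minimal sketch of the same pipeline reading JPEG files, assuming a DataFrame with hypothetical 'filename' and 'label' columns and that every file is a 3-channel JPEG. The temporary-directory setup and IMG_SIZE are assumptions that only exist so the sketch runs on its own:

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tensorflow as tf

classes = ['Danger', 'Give Way', 'Hump']
label_map = {v: i for i, v in enumerate(classes)}

# Hypothetical setup: write a few random JPEGs so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
rows = []
for i in range(6):
    path = os.path.join(tmp_dir, f'img_{i}.jpg')
    pixels = np.random.randint(0, 256, size=(40, 40, 3), dtype=np.uint8)
    tf.io.write_file(path, tf.io.encode_jpeg(pixels))
    rows.append({'filename': path, 'label': classes[i % 3]})
df = pd.DataFrame(rows)

IMG_SIZE = 32  # assumed target size; pick whatever your model expects


def load_image(path, label):
    """Read one JPEG from disk, resize it and scale it to [0, 1]."""
    raw = tf.io.read_file(path)
    image = tf.io.decode_jpeg(raw, channels=3)             # uint8, shape (h, w, 3)
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))   # float32
    return image / 255.0, label


paths = df['filename'].to_numpy()
labels = df['label'].map(label_map).to_numpy()

# Slice file paths (cheap strings) rather than decoded images, then decode in map().
file_dataset = tf.data.Dataset.from_tensor_slices((paths, labels))
file_dataset = file_dataset.shuffle(buffer_size=len(df))  # full shuffle of the file list
file_dataset = file_dataset.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
file_dataset = file_dataset.batch(2)
file_dataset = file_dataset.prefetch(tf.data.AUTOTUNE)
```

Shuffling the paths before the map step keeps the shuffle buffer cheap (it holds strings, not decoded images), so a buffer as large as the whole file list is usually affordable.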