TensorFlow One Hot Encoder?
As of TensorFlow 0.8, there is now a native one-hot op, tf.one_hot, that can convert a set of sparse labels to a dense one-hot representation. This is in addition to tf.nn.sparse_softmax_cross_entropy_with_logits, which can in some cases let you compute the cross entropy directly on the sparse labels instead of converting them to one-hot.
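To see why a loss computed on sparse labels can stand in for one computed on one-hot labels, here is a minimal pure-Python sketch. It does not call TensorFlow; the helper names and the sample logits are made up for illustration. With a one-hot target, every term of the cross-entropy sum vanishes except the one for the true class, so indexing the softmax output by the integer label gives the same number.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_one_hot(logits, one_hot):
    """Cross entropy against a dense one-hot target."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def cross_entropy_sparse(logits, label):
    """Cross entropy against an integer class label; no one-hot needed."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 0.5, -1.0, 0.1]  # hypothetical scores for 4 classes
label = 2                       # hypothetical true class
one_hot = [1.0 if i == label else 0.0 for i in range(len(logits))]

dense_loss = cross_entropy_one_hot(logits, one_hot)
sparse_loss = cross_entropy_sparse(logits, label)
# The two losses agree up to floating-point rounding.
```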
Previous answer, in case you want to do it the old way: @Salvador's answer is correct; there used to be no native op to do it. Instead of doing it in NumPy, though, you can do it natively in TensorFlow using the sparse-to-dense operators:
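import tensorflow as tf

num_labels = 10
# label_batch is a 1-D tensor of numeric labels, with 0 <= label < num_labels
sparse_labels = tf.reshape(label_batch, [-1, 1])
derived_size = tf.shape(label_batch)[0]
indices = tf.reshape(tf.range(0, derived_size, 1), [-1, 1])
# Pair each row index with its label to form (row, column) coordinates.
# Pre-1.0 argument order; TensorFlow 1.0+ uses tf.concat([indices, sparse_labels], 1).
concated = tf.concat(1, [indices, sparse_labels])
# tf.pack was renamed tf.stack in TensorFlow 1.0.
outshape = tf.pack([derived_size, num_labels])
# Write 1.0 at each coordinate and 0.0 everywhere else.
labels = tf.sparse_to_dense(concated, outshape, 1.0, 0.0)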
The output, labels, is a one-hot matrix of batch_size x num_labels.
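For readers who want to check what each step of that construction computes, the same idea can be sketched in plain Python with no TensorFlow. The function name is hypothetical; the logic pairs each row index with its label to form (row, column) coordinates, then writes 1.0 at those coordinates into a matrix of zeros.

```python
def one_hot_from_coordinates(label_batch, num_labels):
    """Build a batch_size x num_labels one-hot matrix from integer labels."""
    batch_size = len(label_batch)
    # Pair each row index with its label, like concatenating the
    # index column and the label column side by side.
    coordinates = [(row, label) for row, label in enumerate(label_batch)]
    # Start from the default value (0.0) everywhere ...
    dense = [[0.0] * num_labels for _ in range(batch_size)]
    # ... then write the sparse value (1.0) at each coordinate.
    for row, col in coordinates:
        dense[row][col] = 1.0
    return dense

# Three hypothetical labels, four classes
labels = one_hot_from_coordinates([2, 0, 1], num_labels=4)
```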
Note also that as of 2016-02-12 (which I assume will eventually be part of a 0.7 release), TensorFlow also has the tf.nn.sparse_softmax_cross_entropy_with_logits op, which in some cases can let you do training without needing to convert to a one-hot encoding.
Edited to add: At the end, you may need to explicitly set the shape of labels. The shape inference doesn't recognize the size of the num_labels component. If you don't need a dynamic batch size with derived_size, this can be simplified.
Edited 2016-02-12 to change the assignment of outshape per comment below.
tf.one_hot() is available in TF and easy to use.
Let's assume you have 4 possible categories (cat, dog, bird, human) and 2 instances (cat, human). So your depth=4 and your indices=[0, 3].
import tensorflow as tf

res = tf.one_hot(indices=[0, 3], depth=4)
with tf.Session() as sess:
    print(sess.run(res))
Keep in mind that if you provide an index of -1, you will get all zeros in the corresponding one-hot vector.
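The all-zeros behavior for -1 can be illustrated with a small pure-Python stand-in. This is only a sketch of the semantics described above, not TensorFlow's implementation, and the function below is hypothetical: a position is switched on only when it equals the index, so an index that matches no position leaves its row entirely off.

```python
def one_hot(indices, depth, on_value=1.0, off_value=0.0):
    """Pure-Python stand-in: an index matching no position yields an all-off row."""
    return [
        [on_value if position == index else off_value for position in range(depth)]
        for index in indices
    ]

# Same depth and indices as the example above, plus a -1 entry
rows = one_hot([0, 3, -1], depth=4)
```

This makes -1 usable as a "no label" marker, provided the all-zero row is what the downstream loss expects.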
Old answer, when this function was not available.
After looking through the Python documentation, I have not found anything similar. One thing that strengthens my belief that it does not exist is that in their own example they write one_hot manually.
import numpy

def dense_to_one_hot(labels_dense, num_classes=10):
    """Convert class labels from scalars to one-hot vectors."""
    num_labels = labels_dense.shape[0]
    # Flat offset of the first element of each row
    index_offset = numpy.arange(num_labels) * num_classes
    labels_one_hot = numpy.zeros((num_labels, num_classes))
    # Set the entry at (row, label) to 1 via flat indexing
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot
You can also do this in scikit-learn.
NumPy does it!
import numpy as np
np.eye(n_labels)[target_vector]
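A concrete run of that one-liner, with illustrative values for n_labels and target_vector (both are assumptions, since the snippet above leaves them undefined). Indexing the identity matrix with an integer array picks one row per label. One caveat of this approach: NumPy treats negative indices as counting from the end, so -1 selects the last row instead of producing an all-zero row.

```python
import numpy as np

n_labels = 4                      # illustrative number of classes
target_vector = np.array([0, 3])  # illustrative integer labels

# Integer-array indexing picks row i of the identity matrix for each label i.
one_hot = np.eye(n_labels)[target_vector]
# one_hot has shape (2, 4) and dtype float64

# Negative labels wrap around under NumPy indexing: -1 selects the last row.
wrapped = np.eye(n_labels)[np.array([-1])]
```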
A simple and short way to one-hot encode any integer or list of integers:
import numpy as np
import tensorflow as tf

a = 5
b = [1, 2, 3]
# one hot an integer
one_hot_a = tf.nn.embedding_lookup(np.identity(10), a)
# one hot a list of integers
one_hot_b = tf.nn.embedding_lookup(np.identity(max(b)+1), b)
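One thing to watch with the max(b)+1 depth: the width of the result then depends on the contents of each list. The sketch below uses NumPy row selection as a stand-in for the lookup (the helper name and the sample batches are hypothetical) to show that two batches can come out with different widths unless the depth is fixed up front.

```python
import numpy as np

def one_hot_rows(values, depth):
    """Select rows of a depth x depth identity matrix (NumPy stand-in for the lookup)."""
    return np.identity(depth)[np.asarray(values)]

batch_1 = [1, 2, 3]
batch_2 = [0, 1]

# Depth inferred from each batch: the widths disagree.
inferred_1 = one_hot_rows(batch_1, max(batch_1) + 1)
inferred_2 = one_hot_rows(batch_2, max(batch_2) + 1)

# Depth fixed up front: every batch has the same width.
fixed_1 = one_hot_rows(batch_1, 10)
fixed_2 = one_hot_rows(batch_2, 10)
```

If the encoded labels feed a layer or loss with a fixed number of classes, passing the known class count as the depth is probably the safer choice.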