How to inspect a TensorFlow .tfrecord file?

I have a .tfrecord but I don't know how it is structured. How can I inspect the schema to understand what the .tfrecord file contains?

All Stack Overflow answers and documentation seem to assume I already know the structure of the file.

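import tensorflow as tf

reader = tf.TFRecordReader()
# string_input_producer expects a 1-D list of filenames, not a bare string
file = tf.train.string_input_producer(["record.tfrecord"])
_, serialized_record = reader.read(file)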

...HOW TO INSPECT serialized_record...

Solution 1:

Found it!

import tensorflow as tf

for example in tf.python_io.tf_record_iterator("data/foobar.tfrecord"):
    print(tf.train.Example.FromString(example))

You can also convert each record to JSON:

from google.protobuf.json_format import MessageToJson
...
jsonMessage = MessageToJson(tf.train.Example.FromString(example))

Solution 2:

The solutions above didn't work for me on TF 2.0 (tf.python_io is a TF 1.x module), so use this instead:

import tensorflow as tf 
raw_dataset = tf.data.TFRecordDataset("path-to-file")

for raw_record in raw_dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(example)

https://www.tensorflow.org/tutorials/load_data/tfrecord#reading_a_tfrecord_file_2
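
Printing a whole Example can be noisy when features hold long arrays. As a rough sketch of how the schema alone could be summarized, the snippet below maps each feature key to its value kind and length. The helper names summarize_example and summarize_first_record are made up for illustration, and it assumes the file holds tf.train.Example records and that TF 2.x eager execution is on.

```python
import tensorflow as tf


def summarize_example(example):
    """Map each feature key to (kind, number of values) for one tf.train.Example."""
    summary = {}
    for key, feature in example.features.feature.items():
        # A Feature is a oneof named "kind": bytes_list, float_list, or int64_list.
        kind = feature.WhichOneof("kind")
        count = len(getattr(feature, kind).value) if kind else 0
        summary[key] = (kind, count)
    return summary


def summarize_first_record(path):
    """Parse the first record of a TFRecord file and summarize its features."""
    for raw_record in tf.data.TFRecordDataset(path).take(1):
        example = tf.train.Example.FromString(raw_record.numpy())
        return summarize_example(example)
    return {}  # the file contained no records
```

Calling print(summarize_first_record("path-to-file")) should then give a compact key-to-type overview that can be turned into a feature description for tf.io.parse_single_example.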

Solution 3:

If your .tfrecord contains SequenceExample records, the accepted answer won't show you everything. You can use:

import tensorflow as tf

for example in tf.python_io.tf_record_iterator("data/foobar.tfrecord"):
    result = tf.train.SequenceExample.FromString(example)
    break
print(result)

This will show you the content of the first example.

Then you can also inspect individual Features using their keys:

result.context.feature["foo_key"]

And for FeatureLists:

result.feature_lists.feature_list["bar_key"]
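
If you don't know the keys yet, they can be listed first. The following is a minimal sketch, assuming the record really is a serialized tf.train.SequenceExample; describe_sequence_example is a hypothetical helper name, not part of TensorFlow.

```python
import tensorflow as tf


def describe_sequence_example(serialized):
    """Return (sorted context keys, {feature-list key: number of steps})."""
    seq = tf.train.SequenceExample.FromString(serialized)
    context_keys = sorted(seq.context.feature.keys())
    # Each FeatureList stores one Feature per step of the sequence.
    steps = {
        key: len(feature_list.feature)
        for key, feature_list in seq.feature_lists.feature_list.items()
    }
    return context_keys, steps
```

Pass it the raw bytes of one record (the loop variable called example in the snippet above) to see which keys are available before indexing into context or feature_lists.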

Solution 4:

Use TensorFlow tf.TFRecordReader with the tf.parse_single_example decoder as specified in https://www.tensorflow.org/programmers_guide/reading_data

P.S. A .tfrecord file contains 'Example' records, defined in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/example.proto

Once you have extracted a record into a string, parsing it looks something like this:

example = tf.train.Example()
# ParseFromString fills `example` in place; its return value is not the parsed message
example.ParseFromString(binary_string_with_example_record)

However, I'm not sure where the raw support for extracting individual records from a file lives; you can track it down in TFRecordReader.
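
On extracting individual records: the TFRecord container format is simple framing, so records can be pulled out without TensorFlow. Each record is a little-endian uint64 length, a uint32 masked CRC of that length, the payload bytes, and a uint32 masked CRC of the payload. Below is a minimal sketch under that layout; iter_tfrecord is a hypothetical name, it assumes an uncompressed file, and it skips the CRC fields rather than verifying them.

```python
import struct


def iter_tfrecord(path):
    """Yield the raw payload bytes of each record in an uncompressed TFRecord file.

    Sketch only: both CRC fields are read and skipped, not verified.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # uint64 length + uint32 masked CRC of the length
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            data = f.read(length)
            footer = f.read(4)  # uint32 masked CRC of the payload
            if len(data) < length or len(footer) < 4:
                raise ValueError("truncated record")
            yield data
```

Each yielded byte string can then be handed to tf.train.Example.FromString or ParseFromString as shown above.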

Solution 5:

If it's an option to install another Python package, tfrecord_lite is very convenient.

Example:

In [1]: import tensorflow as tf
   ...: from tfrecord_lite import decode_example
   ...:
   ...: it = tf.python_io.tf_record_iterator('nsynth-test.tfrecord')
   ...: decode_example(next(it))
   ...:
Out[1]:
{'audio': array([ 3.8138387e-06, -3.8721851e-06,  3.9331076e-06, ...,
        -3.6526076e-06,  3.7041993e-06, -3.7578957e-06], dtype=float32),
 'instrument': array([417], dtype=int64),
 'instrument_family': array([0], dtype=int64),
 'instrument_family_str': [b'bass'],
 'instrument_source': array([2], dtype=int64),
 'instrument_source_str': [b'synthetic'],
 'instrument_str': [b'bass_synthetic_033'],
 'note': array([149013], dtype=int64),
 'note_str': [b'bass_synthetic_033-100-100'],
 'pitch': array([100], dtype=int64),
 'qualities': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64),
 'sample_rate': array([16000], dtype=int64),
 'velocity': array([100], dtype=int64)}

You can install it with pip install tfrecord_lite.