Splitting a custom binary dataset into train/test subsets using TensorFlow IO

If the metadata is supposed to be part of your inputs, which I am assuming, you could try something like this:

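import random
import struct
import tensorflow as tf

# Values per record: 2 metadata + 20*20 data + 1 label, 4 bytes each
RAW_N = 2 + 20*20 + 1

# Write 4 dummy records (4 * RAW_N = 1612 values) as little-endian float32,
# so the bytes match the tf.float32 decoding used further down
values = [float(v) for v in random.sample(range(1, 5000), RAW_N*4)]
with open('mydata.bin', 'wb') as f:
  f.write(struct.pack(f'<{len(values)}f', *values))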

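def decode_and_prepare(register):
  # Reinterpret the raw bytes of one record as RAW_N float32 values
  register = tf.io.decode_raw(register, out_type=tf.float32)
  inputs = register[:RAW_N - 1]  # first 402 values: metadata + data
  label = register[RAW_N - 1:]   # last value: the label
  return inputs, label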

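total_data_entries = 8  # 2 files x 4 records each
raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin', 'mydata.bin'], record_bytes=RAW_N*4)
raw_dataset = raw_dataset.map(decode_and_prepare)
# By default shuffle() reshuffles on every iteration, so take() and skip() below
# could see different orders and overlap; fix the order once instead
raw_dataset = raw_dataset.shuffle(buffer_size=total_data_entries, reshuffle_each_iteration=False)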

train_ds_size = int(0.8 * total_data_entries)
# Give the test set the remainder so that no record is lost to rounding
test_ds_size = total_data_entries - train_ds_size

train_ds = raw_dataset.take(train_ds_size)
remaining_data = raw_dataset.skip(train_ds_size)
test_ds = remaining_data.take(test_ds_size)
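
If you would rather not hard-code total_data_entries, one option is to derive it from the file sizes, since every record has the same length. This is only a sketch: count_records and split_sizes are helper names I made up, not part of TensorFlow, and raising an error on a partial trailing record is my own choice.

```python
import os

def count_records(filenames, record_bytes):
    """Return the total number of complete fixed-length records in the files."""
    total = 0
    for name in filenames:
        size = os.path.getsize(name)
        # A size that is not a multiple of record_bytes means a truncated record
        if size % record_bytes != 0:
            raise ValueError(f"{name}: size {size} is not a multiple of {record_bytes}")
        total += size // record_bytes
    return total

def split_sizes(total, train_fraction=0.8):
    """Return (train_size, test_size); the test set takes the remainder."""
    train_size = int(train_fraction * total)
    return train_size, total - train_size
```

With the demo file from above, count_records(['mydata.bin', 'mydata.bin'], RAW_N*4) gives 8 and split_sizes(8) gives (6, 2), which you can pass to take() and skip().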

Note that I am using the same bin file twice for demonstration purposes. After running that code snippet, you could feed the datasets to your model like this:

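model = build_model()

# Keras expects a dataset to yield batches; the batch size of 2 is only an example
history = model.fit(train_ds.batch(2), ...)

# Returns (loss, mse) only if the model was compiled with an MSE metric
loss, mse = model.evaluate(test_ds.batch(2))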

as each dataset contains the inputs and the corresponding labels.
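
If you want to double-check what ends up in inputs and label, you could read a single record by hand without TensorFlow. This is a rough sketch that assumes the file holds little-endian float32 values, which is how tf.io.decode_raw interprets the bytes by default; read_record is a hypothetical helper name.

```python
import struct

RAW_N = 2 + 20*20 + 1  # float32 values per record, as in the answer

def read_record(path, index=0):
    """Read record number `index` and split it into (inputs, label)."""
    record_bytes = RAW_N * 4
    with open(path, 'rb') as f:
        f.seek(index * record_bytes)
        raw = f.read(record_bytes)
    if len(raw) != record_bytes:
        raise ValueError(f"record {index} is incomplete or out of range")
    values = struct.unpack(f'<{RAW_N}f', raw)  # little-endian float32
    # Same slicing as decode_and_prepare: everything but the last value is input
    return values[:RAW_N - 1], values[RAW_N - 1:]
```

Calling read_record('mydata.bin', 0) returns a tuple of 402 input values and a one-element tuple holding the label, which you can compare against an element drawn from the tf.data pipeline.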
