Saving and opening a TensorFlow dataset
I have created and saved a dataset which looks like this:
# line 1
foo+++$+++faa+++$+++fee
# +++$+++ is the separator
I saved it as a .txt file and then saved it as a tf dataset with:
from tensorflow.data import TextLineDataset
from tensorflow.data.experimental import save, load
tfsaved = TextLineDataset('path_to_file.txt')
save(tfsaved, 'path_tf_dataset')
But, when I load the dataset, it looks like this:
# Line 1
foofaafee
Is there any way to tell tf that +++$+++ is my separator? If not, how can I solve this?
Here is a simple example of how you can read your data using pandas and pass it to tf.data.Dataset.from_tensor_slices:
data.csv
feature1+++$+++feature2+++$+++feature3
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee
import pandas as pd
import tensorflow as tf
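# A separator longer than one character is treated as a regex, which needs engine='python'.
# The raw string keeps the backslashes from being read as string escapes.
df = pd.read_csv('data.csv', sep=r'\+\+\+\$\+\+\+', engine='python')
ds = tf.data.Dataset.from_tensor_slices(dict(df))
for d in ds.take(3):
    tf.print(d)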
{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
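The sep pattern used above is the literal separator with each special character backslash-escaped. As a minimal sketch (not part of the original answer), the same pattern can be generated with Python's standard re.escape instead of being typed by hand. The names SEP, pattern, and sample are only illustrative.

```python
import re

SEP = "+++$+++"  # the literal separator from the question

# re.escape backslash-escapes every regex metacharacter in SEP.
pattern = re.escape(SEP)
print(pattern)  # \+\+\+\$\+\+\+

# Quick check on one sample line from the question.
sample = "foo+++$+++faa+++$+++fee"
parts = re.split(pattern, sample)
print(parts)  # ['foo', 'faa', 'fee']
```

The resulting string can be passed as sep to pd.read_csv together with engine='python'.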
Note that I had to escape the characters + and $, since they are special regex characters.
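If you would rather stay inside tf.data and skip pandas, one possible alternative is to keep TextLineDataset and split each line with tf.strings.split inside a map. This is only a sketch under assumptions: the file name sample_lines.txt is hypothetical, the file has no header row, and every line uses the same separator.

```python
import tensorflow as tf

SEP = "+++$+++"  # literal separator from the question

# Write a tiny sample file so the sketch is self-contained
# (hypothetical file name, no header row).
path = "sample_lines.txt"
with open(path, "w") as f:
    f.write("foo+++$+++faa+++$+++fee\n" * 3)

lines = tf.data.TextLineDataset(path)

# tf.strings.split takes `sep` as a literal string, so each scalar line
# becomes a 1-D string tensor of its fields.
fields = lines.map(lambda line: tf.strings.split(line, sep=SEP))

# Decode the byte strings to inspect the result in plain Python.
rows = [[v.decode() for v in t.numpy()] for t in fields]
print(rows)
```

Each element of fields is then a string tensor such as ["foo", "faa", "fee"], instead of a dict keyed by column name as in the pandas version.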