Saving and opening a TensorFlow dataset

I have created and saved a dataset which looks like this:

# line 1
foo+++$+++faa+++$+++fee
# +++$+++ is the separator

I saved it as a .txt file and then saved it as a TensorFlow dataset with:

from tensorflow.data import TextLineDataset
from tensorflow.data.experimental import save, load
tfsaved = TextLineDataset('path_to_file.txt')
save(tfsaved, 'path_tf_dataset')

But, when I load the dataset, it looks like this:

# Line 1
foofaafee

Is there any way to tell TensorFlow that +++$+++ is my separator? If not, how can I solve this?

Here is a simple example of how you can read your data using pandas and pass it to tf.data.Dataset.from_tensor_slices:

data.csv

feature1+++$+++feature2+++$+++feature3
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee
foo+++$+++faa+++$+++fee

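import pandas as pd
import tensorflow as tf

# The separator is a regex here, so + and $ are escaped (raw string avoids invalid-escape warnings)
df = pd.read_csv('data.csv', sep=r'\+\+\+\$\+\+\+', engine='python')
# Each column becomes one key of the dict yielded per element
ds = tf.data.Dataset.from_tensor_slices(dict(df))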

for d in ds.take(3):
  tf.print(d)

{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
{'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}

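Note that I had to escape the characters + and $, since they are special regex characters. pandas interprets a separator longer than one character as a regular expression, which is also why engine='python' is required.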
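If you would rather not write the escapes by hand, the standard library can build the pattern for you. The snippet below is only a minimal sketch: the names SEPARATOR, pattern, line and fields are made up for illustration, and the sample line is the one from the question.

```python
import re

# Literal separator from the question (the variable name is illustrative)
SEPARATOR = "+++$+++"

# re.escape backslash-escapes regex metacharacters such as + and $
pattern = re.escape(SEPARATOR)

# Split one sample line on the escaped pattern
line = "foo+++$+++faa+++$+++fee"
fields = re.split(pattern, line)

print(pattern)
print(fields)
```

The resulting pattern should be usable as the sep argument of pd.read_csv together with engine='python', in place of the hand-written string above.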
