Using gcloud ml serving for large images

I have a trained network in TensorFlow that I wish to serve for prediction with gcloud ml-engine.

The prediction service should accept images as float32 numpy arrays of size 320x240x3 and return two tiny matrices as output.

Does anyone know how I should create the input layers to accept this kind of input?

I have tried multiple ways, for example using base64-encoded JSON files, but casting the string to float produces an error saying the cast is not supported:

"error": "Prediction failed: Exception during model execution: LocalError(code=StatusCode.UNIMPLEMENTED, details=\"Cast string to float is not supported\n\t [[Node: ToFloat = Cast[DstT=DT_FLOAT, SrcT=DT_STRING, _output_shapes=[[-1,320,240,3]], _device=\"/job:localhost/replica:0/task:0/cpu:0\"](ParseExample/ParseExample)]]\")"

This is an example of creating the json file (after saving the numpy array above as jpeg):

python -c 'import base64, sys, json; img = base64.b64encode(open(sys.argv[1], "rb").read()); print json.dumps({"images": {"b64": img}})' example_img.jpg &> request.json

And the tensorflow commands attempting to handle the input:

raw_str_input = tf.placeholder(tf.string, name='source')
feature_configs = {
                'image': tf.FixedLenFeature(
                    shape=[], dtype=tf.string),
            }
tf_example = tf.parse_example(raw_str_input, feature_configs)
input = tf.identity(tf.to_float(tf_example['image/encoded']), name='input')

The above is one example of the tests I ran. I also tried several other TensorFlow ops to handle the input, but none of them worked.


Solution 1:

I would recommend not using parse_example to start with. There are several options for sending image data, each with tradeoffs in complexity and payload size:

  1. Raw Tensor Encoded as JSON
  2. Tensors Packed as Byte Strings
  3. Compressed Image Data

In each case, it is important to note that the input placeholders must have 'None' as the outer-dimension of their shape. This is the "batch_size" dimension (required, even if you intend to send images one-by-one to the service).

Raw Tensor Encoded as JSON

# Dimensions represent [batch size, height, width, channels]
input_images = tf.placeholder(dtype=tf.float32, shape=[None,320,240,3], name='source')
output_tensor = foo(input_images)

# Export the SavedModel
inputs = {'image': input_images}
outputs = {'output': output_tensor}
# ....

The JSON you send to the service will look as documented (see "Instances JSON string"). For example (I recommend removing as much whitespace as possible; pretty-printed here for readability):

{
  "instances": [
    {
      "image": [
        [
          [1,1,1], [1,1,1], ... 240 total ... [1,1,1]
        ],
        ... 320 total ...
        [
          [1,1,1], [1,1,1], ... 240 total ... [1,1,1]
        ]
      ]
    },
    {
      "image": [ ... repeat if you have more than one image in the request ... ]
    }
  ]
}

Please note that gcloud builds that request body from an input file in which each input is on a separate line (and must be packed onto a single line), i.e.:

{"image": [[[1,1,1], [1,1,1],  <240 of these>] ... <320 of these>]}
{"image": [[[2,2,2], [2,2,2],  <240 of these>] ... <320 of these>]}
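A file in that one-instance-per-line layout could be produced along these lines. This is only a sketch: the helper names `make_image` and `to_gcloud_lines` are hypothetical, the constant-valued images stand in for real pixel data, and the alias `image` is assumed to match the exported input name.

```python
import json

# Shape taken from the placeholder above: [height, width, channels]
HEIGHT, WIDTH, CHANNELS = 320, 240, 3

def make_image(value):
    """Hypothetical stand-in for real data: a nested list filled with `value`."""
    return [[[value] * CHANNELS for _ in range(WIDTH)] for _ in range(HEIGHT)]

def to_gcloud_lines(images, alias="image"):
    """Serialize each image as one compact JSON object per line."""
    return "\n".join(
        # separators=(",", ":") drops the whitespace json.dumps adds by default
        json.dumps({alias: img}, separators=(",", ":"))
        for img in images
    )

file_data = to_gcloud_lines([make_image(1), make_image(2)])
```

Writing `file_data` to a file gives one instance per line with no extra whitespace.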

Tensors Packed as Byte Strings

If you're doing resizing, etc. on the client, my recommendation is to send a byte string. JSON can be a fairly inefficient way to send floats over the wire; even sending integer data causes bloat. Instead, you can encode the bytes on the client and decode them in TensorFlow. My recommendation is to use uint8 data.
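To get a feel for the bloat, the two encodings can be compared for a single 320x240x3 image. This is a rough sketch using only the standard library; the all-255 image is an assumed worst case for integer JSON (three digits per value), and the aliases `image` and `image_bytes` are just the ones used in this answer.

```python
import base64
import json

HEIGHT, WIDTH, CHANNELS = 320, 240, 3
num_values = HEIGHT * WIDTH * CHANNELS

# Option 1: every pixel value spelled out as a JSON number.
pixels = [[[255] * CHANNELS for _ in range(WIDTH)] for _ in range(HEIGHT)]
json_payload = json.dumps({"image": pixels}, separators=(",", ":"))

# Option 2: the same values as raw uint8 bytes, base64-encoded.
raw = bytes([255]) * num_values
b64_text = base64.b64encode(raw).decode("ascii")
b64_payload = json.dumps({"image_bytes": {"b64": b64_text}}, separators=(",", ":"))

print("JSON numbers:", len(json_payload), "characters")
print("base64 bytes:", len(b64_payload), "characters")
```

Base64 costs four characters for every three bytes, whereas JSON spends up to three digits plus a separator on each uint8 value, and more on floats.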

This is the TensorFlow model code to decode byte strings:

raw_byte_strings = tf.placeholder(dtype=tf.string, shape=[None], name='source')

# Decode the images. The shape of raw_byte_strings is [batch size]
# (where batch size is determined by how many images are sent), and
# the shape of `input_images` is [batch size, 320, 240, 3]. It's
# important that all of the images sent have the same dimensions
# or errors will result.
#
# We have to use a map_fn because decode_raw only works on a single
# image, and we need to decode a batch of images. decode_raw returns
# a flat vector, so each image is reshaped to [320, 240, 3].
decode = lambda raw_byte_str: tf.reshape(
    tf.decode_raw(raw_byte_str, tf.uint8), [320, 240, 3])
input_images = tf.map_fn(decode, raw_byte_strings, dtype=tf.uint8)

output_tensor = foo(input_images)

# Export the SavedModel
# The exported input is the string placeholder the client feeds,
# not the decoded tensor.
inputs = {'image_bytes': raw_byte_strings}
outputs = {'output': output_tensor}
# ....

One special note here: as pointed out by Jeremy Lewi, the name of this input alias must end in _bytes (image_bytes). This is because JSON doesn't have a way of distinguishing text from binary data.

Note that the same trick can be applied to float data, not just uint8 data.

Your client would be responsible for creating a byte string of uint8s. Here's how you would do that in Python using numpy.

import base64
import json
import numpy as np

images = []
# In real life, this is obtained via other means (e.g. scipy.misc.imread); for now, an array of all 2s
images.append(np.array([[[2]*3]*240]*320, dtype=np.uint8))
# If we want, we can send more than one image:
images.append(np.array([[[2]*3]*240]*320, dtype=np.uint8))

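# Convert each image to a byte string (tobytes replaces the deprecated tostring)
bytes_strings = (i.tobytes() for i in images)

# Base64 encode the data; decode to str so json.dumps accepts it on Python 3
encoded = (base64.b64encode(b).decode('ascii') for b in bytes_strings)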

# Create a list of images suitable to send to the service as JSON:
instances = [{'image_bytes': {'b64': e}} for e in encoded]

# Create a JSON request
request = json.dumps({'instances': instances})

# Or if dumping a file for gcloud (one JSON object per line):
file_data = '\n'.join(json.dumps(i) for i in instances)
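The earlier remark that the same trick works for float data can be illustrated with a small round trip. This sketch uses the standard-library `array` module as a stand-in for a float32 numpy array, and the decoding half only imitates what the server would do with `tf.decode_raw(..., tf.float32)`. It assumes both ends agree on byte order and that `'f'` is a 4-byte float, which holds on common platforms.

```python
import base64
import json
from array import array

# Client side: pack floats as raw 4-byte values, then base64-encode them.
values = array("f", [0.5, -1.25, 3.0])
encoded = base64.b64encode(values.tobytes()).decode("ascii")
instance = json.dumps({"image_bytes": {"b64": encoded}})

# Stand-in for the server side: base64-decode and reinterpret the bytes as floats.
received = base64.b64decode(json.loads(instance)["image_bytes"]["b64"])
decoded = array("f")
decoded.frombytes(received)
```

The bytes carry no shape or dtype information, so the model still has to know both (as with the fixed [320, 240, 3] uint8 case above).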

Compressed Image Data

It is often most convenient to send the original images and do the resizing and decoding in TensorFlow. This is exemplified in this sample, which I won't repeat here. The client simply needs to send the raw JPEG bytes. The same note about the _bytes suffix applies here.

Solution 2:

If you're using binary data with predictions, your input/output aliases must end in '_bytes'. So I think you need to do:

python -c 'import base64, sys, json; img = base64.b64encode(open(sys.argv[1], "rb").read()); print json.dumps({"images_bytes": {"b64": img}})' example_img.jpg &> request.json
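The one-liner above relies on the Python 2 print statement. A Python 3 equivalent might look like the sketch below; the function name `jpeg_instance` is hypothetical, and the placeholder bytes stand in for the real file contents (`open("example_img.jpg", "rb").read()`).

```python
import base64
import json

def jpeg_instance(jpeg_bytes, alias="images_bytes"):
    """Build one request line: {alias: {"b64": <base64 text>}}."""
    # b64encode returns bytes on Python 3; json.dumps needs str.
    b64_text = base64.b64encode(jpeg_bytes).decode("ascii")
    return json.dumps({alias: {"b64": b64_text}})

# Placeholder bytes for illustration only, not a valid JPEG file.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16
line = jpeg_instance(fake_jpeg)
print(line)
```

Writing `line` to request.json gives the same file the shell command was meant to produce, without also capturing stderr as `&>` does.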