Test labels for regression caffe, float not allowed?

Solution 1:

When using the image dataset input layer (with either lmdb or leveldb backend) caffe only supports one integer label per input image.

If you want to do regression, and use floating point labels, you should try and use the HDF5 data layer. See for example this question.

In python you can use h5py package to create hdf5 files.

import h5py, os
import caffe
import numpy as np

SIZE = 224 # fixed size to all images
with open( 'train.txt', 'r' ) as T :
    lines = T.readlines()
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' ) 
y = np.zeros( (len(lines),1), dtype='f4' )
for i,l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image( sp[0] )
    img = caffe.io.resize( img, (SIZE, SIZE, 3) ) # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from size-by-size-by-3 and transpose it to 3-by-size-by-size
    # for example
    # transposed_img = img.transpose((2,0,1))[::-1,:,:] # RGB->BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5','w') as H:
    H.create_dataset( 'X', data=X ) # note the name X given to the dataset!
    H.create_dataset( 'y', data=y ) # note the name y given to the dataset!
with open('train_h5_list.txt','w') as L:
    L.write( 'train.h5' ) # list all h5 files you are going to use

Once you have all h5 files and the corresponding test files listing them you can add an HDF5 input layer to your train_val.prototxt:

 layer {
   type: "HDF5Data"
   top: "X" # same name as given in create_dataset!
   top: "y"
   hdf5_data_param {
     source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
     batch_size: 32
   }
   include { phase:TRAIN }
 }

Clarification:
When I say "caffe only supports one integer label per input image" I do not mean that the leveldb/lmdb containers are limited, I meant the tools of caffe, specifically the convert_imageset tool.
At closer inspection, it seems like caffe stores data of type Datum in leveldb/lmdb and the "label" property of this type is defined as integer (see caffe.proto) thus when using caffe interface to leveldb/lmdb you are restricted to a single int32 label per image.

Solution 2:

Shai's answer already covers saving float labels to HDF5 format. In case LMDB is required/preferred, here's a snippet on how to create an LMDB from float data (adapted from this github comment):

import lmdb
import caffe
def scalars_to_lmdb(scalars, path_dst):

    db = lmdb.open(path_dst, map_size=int(1e12))

    with db.begin(write=True) as in_txn:    
        for idx, x in enumerate(scalars):            
            content_field = np.array([x])
            # get shape (1,1,1)
            content_field = np.expand_dims(content_field, axis=0)
            content_field = np.expand_dims(content_field, axis=0)
            content_field = content_field.astype(float)

            dat = caffe.io.array_to_datum(content_field)
            in_txn.put('{:0>10d}'.format(idx) dat.SerializeToString())
    db.close()

Solution 3:

I ended up transposing, switching the channel order, and using unsigned ints rather than floats to get results. I suggest reading an image back from your HDF5 file to make sure it displays correctly.

First read the image as unsigned ints:

img = np.array(Image.open('images/' + image_name))

Then change the channel order from RGB to BGR:

img = img[:, :, ::-1]

Finally, switch from Height x Width x Channels to Channels x Height x Width:

img = img.transpose((2, 0, 1))

Merely changing the shape will scramble your image and ruin your data!

To read back the image:

with h5py.File(h5_filename, 'r') as hf:
    images_test = hf.get('images')
    targets_test = hf.get('targets')
    for i, img in enumerate(images_test):
        print(targets_test[i])
        from skimage.viewer import ImageViewer
        viewer = ImageViewer(img.reshape(SIZE, SIZE, 3))
        viewer.show()

Here's a script I wrote which deals with two labels (steer and speed) for a self-driving car task: https://gist.github.com/crizCraig/aa46105d34349543582b177ae79f32f0