Process output data from YOLOv5 TFlite

❔Question

Hi, I have successfully trained a custom model based on YOLOv5s and converted the model to TFlite. I feel silly asking, but how do you use the output data?

I get as output:

  • StatefulPartitionedCall: 0 = [1,25200,7] from the converted YOLOv5 model Netron YOLOv5s.tflite model

But I expect an output like:

  • StatefulPartitionedCall:3 = [1, 10, 4] # boxes
  • StatefulPartitionedCall:2 = [1, 10] # classes
  • StatefulPartitionedCall:1 = [1, 10] #scores
  • StatefulPartitionedCall:0 = [1] #count (this one is from a tensorflow lite mobilenet model (trained to give 10 output data, default for tflite)) Netron mobilenet.tflite model

It may also be some other form of output, but I honestly have no idea how to get the boxes, classes, scores from a [1,25200,7] array. (on 15-January-2021 I updated pytorch, tensorflow and yolov5 to the latest version)

The data contained in the [1, 25200, 7] array can be found in this file: outputdata.txt

0.011428807862102985, 0.006756599526852369, 0.04274776205420494, 0.034441519528627396, 0.00012877583503723145, 0.33658933639526367, 0.4722323715686798
0.023071227595210075, 0.006947836373001337, 0.046426184475421906, 0.023744791746139526, 0.0002465546131134033, 0.29862138628959656, 0.4498370885848999
0.03636947274208069, 0.006819264497607946, 0.04913407564163208, 0.025004519149661064, 0.00013208389282226562, 0.3155967593193054, 0.4081345796585083
0.04930267855525017, 0.007249316666275263, 0.04969717934727669, 0.023645592853426933, 0.0001222355494974181, 0.3123127520084381, 0.40113094449043274
...

Should I add a Non Max Suppression or something else, can someone help me please? (github YOLOv5 #1981)


Solution 1:

Thanks to @Glenn Jocher I found the solution. The output is [xywh, conf, class0, class1, ...]

My current code is now:

def classFilter(classdata):
    classes = []  # create a list
    for i in range(classdata.shape[0]):         # loop through all predictions
        classes.append(classdata[i].argmax())   # get the best classification location
    return classes  # return classes (int)

def YOLOdetect(output_data):  # input = interpreter, output is boxes(xyxy), classes, scores
    output_data = output_data[0]                # x(1, 25200, 7) to x(25200, 7)
    boxes = np.squeeze(output_data[..., :4])    # boxes  [25200, 4]
    scores = np.squeeze( output_data[..., 4:5]) # confidences  [25200, 1]
    classes = classFilter(output_data[..., 5:]) # get classes
    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    x, y, w, h = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3] #xywh
    xyxy = [x - w / 2, y - h / 2, x + w / 2, y + h / 2]  # xywh to xyxy   [4, 25200]

    return xyxy, classes, scores  # output is boxes(x,y,x,y), classes(int), scores(float) [predictions length]

To get the output data:

"""Output data"""
output_data = interpreter.get_tensor(output_details[0]['index'])  # get tensor  x(1, 25200, 7)
xyxy, classes, scores = YOLOdetect(output_data) #boxes(x,y,x,y), classes(int), scores(float) [25200]

And for the boxes:

for i in range(len(scores)):
    if ((scores[i] > 0.1) and (scores[i] <= 1.0)):
        H = frame.shape[0]
        W = frame.shape[1]
        xmin = int(max(1,(xyxy[0][i] * W)))
        ymin = int(max(1,(xyxy[1][i] * H)))
        xmax = int(min(H,(xyxy[2][i] * W)))
        ymax = int(min(W,(xyxy[3][i] * H)))

        cv2.rectangle(frame, (xmin,ymin), (xmax,ymax), (10, 255, 0), 2)
        ...