livenessProbe doesn't work as expected

Within my deployment, the following livenessProbe is defined:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-deployment
  labels:
    name: backend-deployment
    app: fc-test
spec:
  replicas: 1
  selector:
    matchLabels:
      name: fc-backend-pod
      app: fc-test
  template:
    metadata:
      name: fc-backend-pod
      labels:
        name: fc-backend-pod
        app: fc-test
    spec:
      containers:
      - name: fc-backend
        image: localhost:5000/backend:1.3
        ports:
        - containerPort: 4042
        env:
        - name: NODE_ENV
          value: "int"
        livenessProbe:
          exec:
            command:
            - RESULT=$(curl -X GET $BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT/api/v2/makes | wc | awk '{print $3}');
            - if [[ $RESULT -lt 150 ]]; then exit 1; else exit 0; fi
          initialDelaySeconds: 20
          failureThreshold: 8
          periodSeconds: 10

Since the API connection sometimes has issues, I decided to add a check that verifies whether the whole set of requested data gets fetched from the API. If it does, the response is around 400 KB in size. If it doesn't, only a short message is returned and the response is smaller than 120 B. This is where the second command of the probe comes in: it checks whether the value of the RESULT variable is low. If it is, the response didn't contain all the desired data, and the probe exits with an error code.
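The intended size check can be sketched on its own, outside the probe. This is only an illustration under assumptions: body_is_complete and MIN_BYTES are hypothetical names, and a fixed string stands in for the real curl output.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the body read from stdin is at least
# MIN_BYTES long. The 150-byte threshold mirrors the probe above.
MIN_BYTES=150

body_is_complete() {
    size=$(wc -c | tr -d ' ')   # wc -c counts the bytes on stdin; strip padding
    [ "$size" -ge "$MIN_BYTES" ]
}

# Stand-in payload instead of a real curl call (assumption for the example).
if printf 'Response is empty' | body_is_complete; then
    echo "short payload: healthy"
else
    echo "short payload: unhealthy"
fi
```

Counting with wc -c avoids the extra awk step, since only the byte count is needed.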

I tested both commands by running them from inside the running container, so both cases are covered: a) correct data fetched - exit 0, and b) just an error message fetched - exit 1.

Without the probe, the application ran correctly for at least 3-4 hours; then the connection problems appeared. They eventually resolved themselves, but they choked the app a bit, which was pretty undesirable.

After the probe was added, the first instability issues appeared within minutes of deployment. Every couple of minutes the pods were restarted, and the restart count increased steadily.

This is what I found when describing the deployment:

Pod Template:
  Labels:  app=fc-test
           name=fc-backend-pod
  Containers:
   nsc-backend:
    Image:      localhost:5000/backend:1.3
    Port:       4042/TCP
    Host Port:  0/TCP
    Liveness:   exec [RESULT=$(curl -X GET $BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT/api/v2/makes | wc | awk '{print $3}'); if [[ $RESULT -lt 150 ]]; then exit 1; else exit 0; fi] delay=20s timeout=1s period=10s #success=1 #failure=8

It looks reasonable, but when I entered the running container with the exec command, I found that echo $RESULT gives no output (just an empty line).

Does it mean that only the first call of the probe was somehow processed successfully and all the following ones weren't? How should I configure the probe to make it work as intended?


I tested two solutions to the issue. The first one changes the approach a bit (thanks, @moonkotte, for the hint about logging: it gave me the idea to save some evidence in the app directory). Instead of using a variable, I decided to dump curl's output to a file. Then I look for a specific message that is expected to be sent back when something goes wrong with the remote endpoint (Response is empty in this case). If the message is present, grep finds it, but because of the -v argument (inversion) it returns exit code 1 instead of 0. If the message isn't found, everything seems to be fine with the endpoint and exit code 0 is returned, so the pod continues operating normally.
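The exit-code behaviour of grep -v described above can be checked locally without a cluster. This is a minimal sketch; the two temporary files and their contents are stand-ins for the real curl output, not actual API responses.

```shell
#!/bin/sh
# Stand-in files instead of real curl output (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'Response is empty\n' > "$tmpdir/bad.log"
printf '[{"make":"example"}]\n' > "$tmpdir/good.log"

# -v selects lines that do NOT match; grep exits 0 only if it selected a line.
bad_status=0
grep -v 'Response is empty' "$tmpdir/bad.log" > /dev/null || bad_status=$?
good_status=0
grep -v 'Response is empty' "$tmpdir/good.log" > /dev/null || good_status=$?

echo "error message present: exit $bad_status"
echo "error message absent:  exit $good_status"
rm -rf "$tmpdir"
```

Two properties of this check are worth keeping in mind: grep -v also exits with 1 on an empty file, so a failed curl call counts as unhealthy, and on multi-line input it exits with 0 as soon as any single line lacks the message.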

The whole command looks like this:

livenessProbe:
  exec:
    command:
    - sh
    - -c
    - >-
        curl -X GET $BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT/api/v2/makes |
        head -c 30 > /app/output.log &&
        grep -v 'Response is empty' /app/output.log
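The sh -c wrapper is probably the part that matters most here. An exec probe starts its command directly from the list items, without a shell, so assignments, pipes and if statements are not interpreted unless a shell is invoked explicitly. The sketch below imitates that locally; run_like_exec_probe is a hypothetical helper, and echo 5 stands in for the curl pipeline.

```shell
#!/bin/sh
# Hypothetical stand-in for how an exec probe starts a process: the list items
# become the argument vector as-is, and nothing parses them as shell syntax.
run_like_exec_probe() {
    "$@"
}

# Shell syntax passed as plain arguments: the first item is treated as a
# program name, which does not exist, so the call fails.
direct_status=0
run_like_exec_probe 'RESULT=$(echo 5);' 'if [ $RESULT -lt 150 ]; then exit 1; fi' 2>/dev/null || direct_status=$?

# The same logic handed to a shell as one string is actually evaluated.
wrapped_status=0
run_like_exec_probe sh -c 'RESULT=$(echo 5); if [ "$RESULT" -lt 150 ]; then exit 1; fi' || wrapped_status=$?

echo "without a shell: exit $direct_status"
echo "with sh -c:      exit $wrapped_status"
```

In the first call the failure comes from the missing program, not from the size comparison, which would be consistent with a probe that fails on every run regardless of the API response. Inside a real cluster the container runtime reports the error in its own way, so the exact status seen there may differ from this local imitation.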

The second solution is to put the curl command into a bash script and ship it inside the image. The script itself looks like this:

#!/bin/bash
msg=$(curl -X GET $BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT/api/v2/makes | head -c 30)
if [[ $msg == *"Response is empty"* ]]; then
        exit 1
else
        exit 0
fi

The script is invoked by the probe command:

livenessProbe:
  exec:
    command:
    - sh
    - ./liveness_check.sh

The results are the same. The second approach is useful for more complex logic or as a workaround.
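One caveat about the second approach: because the probe starts the script with sh, the #!/bin/bash line is not used to pick the interpreter, and [[ ... ]] is a bash extension that some /bin/sh implementations (dash, for example) do not provide. A portable variant could use case for the substring match. This is a sketch under assumptions: response_is_healthy is a hypothetical function name, and a fixed string replaces the curl output.

```shell
#!/bin/sh
# Hypothetical POSIX variant of the check: `case` does the substring match,
# so no bash-only `[[ ... ]]` is needed.
response_is_healthy() {
    case "$1" in
        *"Response is empty"*) return 1 ;;
        *) return 0 ;;
    esac
}

# In the real script the argument would be the truncated curl output, e.g.
# msg=$(curl -s "$BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT/api/v2/makes" | head -c 30)
if response_is_healthy 'Response is empty'; then
    echo healthy
else
    echo unhealthy
fi
```

Alternatively, calling the script with bash instead of sh in the probe keeps the original script unchanged, provided bash is present in the image.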