Kubernetes wait on pod/job

On every push I want to create a single kaniko build job. This currently works, but after the job has finished its pod shows:

kaniko-5jbhf                  0/1     Completed   0          9m13s

Yet when I run the following it just pauses indefinitely:

kubectl wait --timeout=-1s --for=condition=Completed pod/kaniko

My question has two parts: 1) How can I wait for a pod/job to finish? 2) How can I remove the job after it has finished?

I have tried ttlSecondsAfterFinished, but enabling feature gates in the cluster is problematic and there is no example of how to do it.


Solution 1:

For kubectl wait to evaluate the state of a resource, you need to identify it correctly. In your second snippet, you need to provide the full pod name rather than the job name: kubectl wait --timeout=-1s --for=condition=Completed pod/kaniko-5jbhf. The syntax does look correct, however, if you target the Job itself as job/kaniko.

For further reference, see the kubectl wait documentation.

Now, for the Job deletion: if you don't want to use feature gates, I think you can either access the API programmatically to locate and delete finished Jobs, or make them dependent on a parent object that deletes them in cascade. For Jobs specifically, the only such parent is a CronJob. The downside is that CronJobs are meant to be time-scheduled objects, which means you would have to design around a time-based object.
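A simple way to do the cleanup without feature gates is to chain the wait and the delete in the same script that creates the Job. The following is a minimal sketch, not a definitive implementation: the function name wait_and_delete_job, the Job name kaniko, and the 600s default timeout are all assumptions made for illustration.

```shell
# Sketch: wait for a Job to complete, then delete it.
# The function name and the 600s default timeout are illustrative assumptions.
wait_and_delete_job() {
  job_name="$1"
  timeout="${2:-600s}"

  # Block until the Job reports the "complete" condition or the timeout expires.
  if kubectl wait --for=condition=complete --timeout="$timeout" "job/$job_name"; then
    # Deleting the Job also removes the pods it created (cascading delete).
    kubectl delete job "$job_name"
  else
    echo "job/$job_name did not complete within $timeout" >&2
    return 1
  fi
}
```

It would be called as wait_and_delete_job kaniko 15m. If the wait fails or times out, the Job is left in place so that its pods and logs can still be inspected.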

Consider that, by design, Jobs are meant to stay around after completion to preserve information about what happened while they ran. Also, from v1.12 they can be configured to delete themselves (via ttlSecondsAfterFinished), meaning that enabling the feature gate is probably the most straightforward way to achieve what you want.
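Since the question mentions that no example is available, here is a rough sketch of a Job that sets ttlSecondsAfterFinished. It only has an effect on clusters where the TTL-after-finished controller is active (in early releases that meant enabling the TTLAfterFinished feature gate). The function name create_job_with_ttl, the 300-second TTL, and the kaniko arguments (repository URL, Dockerfile path) are placeholders I made up, not values from the question.

```shell
# Sketch: create a Job that is deleted automatically 300 seconds after it finishes.
# The function name, the TTL value, and the kaniko args are hypothetical placeholders.
create_job_with_ttl() {
  kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: kaniko
spec:
  # Seconds to keep the finished Job (and its pods) before automatic deletion.
  ttlSecondsAfterFinished: 300
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: kaniko
        image: gcr.io/kaniko-project/executor:latest
        # Placeholder arguments; replace with your own build context and destination.
        args:
        - --context=git://github.com/example/repo.git
        - --dockerfile=Dockerfile
        - --no-push
EOF
}
```

With this in place, no separate delete step is needed; the trade-off is that the Job's logs disappear together with it once the TTL expires.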

Solution 2:

To wait until a resource (like a Deployment, StatefulSet, or DaemonSet) has rolled out and all its objects are ready, run:

kubectl rollout status {Resource Type} {Resource name}

For example:

  $ kubectl rollout status deployment my-app

    Waiting for rollout to finish: 0 of 1 updated replicas are available...
    Waiting for latest deployment config spec to be observed by the controller loop...
    replication controller "my-app" successfully rolled out

Another way is to wait for a specific pod by name or, even better, by label. For example:

kubectl wait --for=condition=ready pod -l app=my-app

Note that if the pod was created with kubectl run my-app ... (and not with kubectl create deployment), the label would probably be run=my-app.

Solution 3:

You should not wait on pods; kubectl wait works for Jobs. I am not sure that Completed is the right condition: complete works for me, at least locally, so the command should be kubectl wait --timeout=-1s --for=condition=complete job/${job_name}

Also, note that you are passing --timeout=-1s (negative one) as the timeout. kubectl wait treats any negative timeout as "wait for a week", so only use it if you really are prepared to wait that long.

From Kubernetes docs:

    The length of time to wait before giving up. 
    Zero means check once and don't wait, negative means wait for a week.
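One caveat with waiting on condition=complete is that a Job which fails never reaches that condition, so the wait just runs until the timeout. As a hedged alternative, the sketch below polls the Job's conditions and returns as soon as either Complete or Failed is true. The function name wait_for_job, the default of 120 tries, and the 5-second interval are assumptions for illustration.

```shell
# Sketch: poll a Job until it is Complete (returns 0), Failed (returns 1),
# or the number of tries is exhausted (returns 2).
# The function name and the default tries/interval are illustrative assumptions.
wait_for_job() {
  job_name="$1"
  max_tries="${2:-120}"
  interval="${3:-5}"

  i=0
  while [ "$i" -lt "$max_tries" ]; do
    # Prints the type of every Job condition whose status is "True", space-separated.
    conditions=$(kubectl get job "$job_name" \
      -o jsonpath='{.status.conditions[?(@.status=="True")].type}')

    case " $conditions " in
      *" Complete "*) return 0 ;;
      *" Failed "*)   return 1 ;;
    esac

    i=$((i + 1))
    sleep "$interval"
  done
  return 2
}
```

For example, wait_for_job kaniko 60 10 checks every 10 seconds for up to roughly 10 minutes, and the exit status tells the caller whether the build succeeded, failed, or is still running.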