How to kill Hadoop jobs
I want to kill all my Hadoop jobs automatically when my code encounters an unhandled exception. What is the best practice for doing this?
Thanks
Solution 1:
Depending on your Hadoop version:
Version < 2.3.0
Kill a Hadoop job:
hadoop job -kill $jobId
You can get a list of all jobIds by running:
hadoop job -list
Version >= 2.3.0
Kill a Hadoop job (YARN application):
yarn application -kill $ApplicationId
You can get a list of all ApplicationIds by running:
yarn application -list
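To address the original question of killing everything automatically, the kill command can be wrapped in a small shell script that you call whenever your driver fails. This is a minimal sketch, assuming the default tabular output of yarn application -list, where the application ID is the first field of each data row; the -appStates filter is available on recent YARN versions:
#!/usr/bin/env bash
# Sketch: kill every YARN application that is still ACCEPTED or RUNNING.
# Assumes the application ID is the first whitespace-separated field of each
# data row and starts with "application_" in the default list output.
for app_id in $(yarn application -list -appStates ACCEPTED,RUNNING 2>/dev/null \
                | awk '$1 ~ /^application_/ {print $1}'); do
  echo "Killing ${app_id}"
  yarn application -kill "${app_id}"
done
You could invoke such a script from a shell trap or from the exception handler in your driver before it exits.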
Solution 2:
Use of the following commands is deprecated:
hadoop job -list
hadoop job -kill $jobId
Consider using instead:
mapred job -list
mapred job -kill $jobId
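The same kill-everything loop works with the mapred commands; a minimal sketch, assuming job IDs appear as the first field of the -list output and start with job_:
# Sketch: kill every job reported by `mapred job -list`.
# Assumes the job ID is the first whitespace-separated field of each data row.
for job_id in $(mapred job -list 2>/dev/null | awk '$1 ~ /^job_/ {print $1}'); do
  echo "Killing ${job_id}"
  mapred job -kill "${job_id}"
done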
Solution 3:
Run the appropriate list command to show all the jobs, then use the jobId/ApplicationId in the corresponding kill command.
Kill mapred jobs:
mapred job -list
mapred job -kill <jobId>
Kill yarn jobs:
yarn application -list
yarn application -kill <ApplicationId>
Solution 4:
An unhandled exception (assuming it is repeatable, like bad data, as opposed to read errors from a particular data node) will eventually fail the job anyway.
You can configure the maximum number of times a particular map or reduce task can fail before the entire job fails through the following properties:
- mapred.map.max.attempts - The maximum number of attempts per map task. In other words, the framework will try to execute a map task this many times before giving up on it.
- mapred.reduce.max.attempts - Same as above, but for reduce tasks.
If you want the job to fail at the first task failure, set these values from their default of 4 to 1.
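As a sketch, you can lower these limits per job from the command line with -D options. The example below uses the newer property names mapreduce.map.maxattempts and mapreduce.reduce.maxattempts, which replace the deprecated mapred.* names on Hadoop 2.x; the jar name, driver class, and paths are hypothetical placeholders:
# Sketch: allow only one attempt per task so the first task failure fails the whole job.
# my-job.jar, com.example.MyDriver, and the input/output paths are placeholders.
hadoop jar my-job.jar com.example.MyDriver \
  -D mapreduce.map.maxattempts=1 \
  -D mapreduce.reduce.maxattempts=1 \
  /input/path /output/path
Note that -D options are only picked up if the driver uses ToolRunner/GenericOptionsParser; otherwise set the same properties on the job's Configuration directly.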