Application report for application_ (state: ACCEPTED) never ends for Spark Submit (with Spark 1.2.0 on YARN)
Solution 1:
I had this exact problem when multiple users were trying to run jobs on our cluster at once. The fix was to change a setting of the YARN scheduler.
In the file /etc/hadoop/conf/capacity-scheduler.xml we changed the property yarn.scheduler.capacity.maximum-am-resource-percent from 0.1 to 0.5.
Changing this setting increases the fraction of cluster resources that can be allocated to application masters, which raises the number of application masters that can run at once and hence the number of concurrent applications.
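For reference, the corresponding block in capacity-scheduler.xml looks roughly like this (0.5 is the value we used; pick a fraction that suits your cluster):
<property>
  <!-- fraction of cluster resources that application masters may use -->
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
Depending on your setup you may need to refresh the scheduler (yarn rmadmin -refreshQueues) or restart the ResourceManager for the change to take effect.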
Solution 2:
I got this error in this situation:
- MASTER=yarn (or yarn-client)
- spark-submit runs on a computer outside the cluster, and there is no route from the cluster back to it because it sits behind a router
Logs for container_1453825604297_0001_02_000001 (from the ResourceManager web UI):
16/01/26 08:30:38 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
16/01/26 08:31:41 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.180:33074, retrying ...
16/01/26 08:32:44 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.180:33074, retrying ...
16/01/26 08:32:45 ERROR yarn.ApplicationMaster: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:484)
I worked around it by using YARN cluster mode: MASTER=yarn-cluster.
On another computer that is configured in a similar way, but whose IP is reachable from the cluster, both yarn-client and yarn-cluster work.
Others may encounter this error for different reasons; my point is that checking the error logs (not visible from the terminal, but in the ResourceManager web UI in this case) almost always helps.
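For illustration, a cluster-mode submission on Spark 1.x looks roughly like this (the class and jar names are placeholders for your own application):
spark-submit --master yarn-cluster --class com.example.MyApp my-app.jar
In cluster mode the driver runs inside the YARN application master, so the cluster does not need a route back to the submitting machine. On newer Spark versions the equivalent is --master yarn --deploy-mode cluster.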
Solution 3:
There are three ways we can try to fix this issue.
- Check for stray Spark processes on your machine and kill them.
Do
ps aux | grep spark
Take the process IDs of all Spark processes and kill them, like
sudo kill -9 4567 7865
- Check the number of Spark applications running on your cluster.
To check this, do
yarn application -list
You will get output similar to this:
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1496703976885_00567 ta da SPARK cloudera default RUNNING UNDEFINED 20% http://10.0.52.156:9090
Check the application IDs; if there are more than one or two, kill the extras. Your cluster may not be able to run more than two Spark applications at the same time. I am not 100% sure about this, but if you run more than two Spark applications on the cluster, it starts complaining. To kill them, do this:
yarn application -kill application_1496703976885_00567
- Check your Spark configuration parameters. For example, if you have requested more executor memory, driver memory, or executors than the cluster can provide, your Spark application may also get stuck. Reduce any of them and rerun your Spark application; that might resolve it (see the sketch after this list).
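As a sketch, a submission with modest resource requests could look like this (the numbers are only illustrative and should be sized to what your queue can actually grant; the class and jar names are placeholders):
spark-submit --master yarn-cluster \
  --num-executors 2 \
  --executor-memory 1g \
  --driver-memory 1g \
  --class com.example.MyApp \
  my-app.jar
If the requested driver or executor containers are larger than anything YARN can allocate, the application can sit in the ACCEPTED state indefinitely, which is exactly the symptom described in the question.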