Dynamic Allocation Issues On Spark 2.4.8 (Possible Issue with External Shuffle Service?)

Johny - Oct 14 - Dev Community

Hey Team,

I am having an issue with dynamic allocation on Spark 2.4.8. I have set up a cluster using your clemlab distribution (https://www.clemlab.com/), and Spark jobs now run fine on it. The issue appears only when I try to use the dynamicAllocation options. I suspect the External Shuffle Service, but as far as I can tell from my configuration it is set up properly.

From the ResourceManager logs we can see that the container goes from ACQUIRED straight to RELEASED, which is weird. It never reaches the RUNNING state.
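For anyone who wants to confirm the same pattern across all containers, the state transitions can be pulled out of the ResourceManager log with a short script. This is only a sketch: it assumes the log contains lines of the form `<container id> Container Transitioned from X to Y`, and the sample lines below are made up for illustration, not copied from this cluster.

```python
import re
from collections import defaultdict

# Assumed line shape; adjust the pattern if your ResourceManager log differs.
TRANSITION = re.compile(
    r"(container_\S+) Container Transitioned from (\w+) to (\w+)"
)


def container_paths(lines):
    """Return {container_id: [state, state, ...]} in log order."""
    paths = defaultdict(list)
    for line in lines:
        match = TRANSITION.search(line)
        if not match:
            continue
        container_id, source, target = match.groups()
        if not paths[container_id]:
            # First time we see this container: record its starting state.
            paths[container_id].append(source)
        paths[container_id].append(target)
    return dict(paths)


def never_ran(paths):
    """Containers that were released without ever reaching RUNNING."""
    return sorted(
        container_id
        for container_id, states in paths.items()
        if "RELEASED" in states and "RUNNING" not in states
    )


# Made-up sample lines, for illustration only.
SAMPLE = [
    "RMContainerImpl: container_01_000001 Container Transitioned from NEW to ALLOCATED",
    "RMContainerImpl: container_01_000001 Container Transitioned from ALLOCATED to ACQUIRED",
    "RMContainerImpl: container_01_000001 Container Transitioned from ACQUIRED to RUNNING",
    "RMContainerImpl: container_01_000002 Container Transitioned from NEW to ALLOCATED",
    "RMContainerImpl: container_01_000002 Container Transitioned from ALLOCATED to ACQUIRED",
    "RMContainerImpl: container_01_000002 Container Transitioned from ACQUIRED to RELEASED",
]
```

Calling `never_ran(container_paths(open(log_path)))` on a real log (where `log_path` is whatever file your ResourceManager writes to) lists the containers that were handed out but never started, which is the symptom described above.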

At this point I am out of ideas for how to make dynamic allocation work, so I am turning to you in the hope that you have some insight into the matter.

There are no issues if I do not use dynamic allocation, and Spark jobs work just fine, but I really want to make dynamic allocation work.

Thank you for the assistance, and apologies for the long message; I just wanted to supply all the details I could.

Here are the settings I have in Ambari related to it:

Yarn:

[Screenshot of the YARN settings in Ambari; image not included]

Checking the directories, I can find the necessary jar on all NodeManager hosts in the right directory:
/usr/odp/1.2.2.0-138/spark2/yarn/spark-2.4.8.1.2.2.0-138-yarn-shuffle.jar
/usr/odp/current/spark2-client/yarn/spark-2.4.8.1.2.2.0-138-yarn-shuffle.jar (I believe this is a symbolic link to the jar above)
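Rather than guessing whether the second path is a symbolic link to the first, this can be checked on each NodeManager host. A minimal sketch using only the Python standard library; the function names are just illustrative helpers:

```python
import os


def describe_jar(path):
    """Report whether path is a regular file, is a symlink, and where it resolves to."""
    return {
        "exists": os.path.isfile(path),
        "is_symlink": os.path.islink(path),
        "resolved": os.path.realpath(path),
    }


def same_file(path_a, path_b):
    """True if both paths exist and point at the same file on disk."""
    return (
        os.path.exists(path_a)
        and os.path.exists(path_b)
        and os.path.samefile(path_a, path_b)
    )
```

Calling `same_file` with the two jar paths listed above answers the question directly. Note that `is_symlink` can be False for the second path even when both resolve to the same file, because the link may sit on a parent directory (for example `current`) rather than on the jar itself; `resolved` shows the real location in either case.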

Spark2:

[Screenshot of the Spark2 settings in Ambari; image not included]
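Since the settings above are only visible as screenshots, here is a small sketch of the properties that dynamic allocation on YARN normally depends on, written as a checker over plain dictionaries. The property names and the `YarnShuffleService` class come from the Spark documentation; the function name and the dictionary values in any example call are hypothetical, not the actual values from this cluster. The aux-service name is a parameter because the Spark docs use `spark_shuffle`, while some Ambari-managed stacks register it under a different name such as `spark2_shuffle`.

```python
SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def _is_true(value):
    """Treat 'true' in any letter case as enabled."""
    return str(value).strip().lower() == "true"


def check_dynamic_allocation(spark_conf, yarn_site, service_name="spark_shuffle"):
    """Return a list of problems; an empty list means the basics look right."""
    problems = []

    # Spark side: both flags must be on for dynamic allocation on YARN.
    for key in ("spark.dynamicAllocation.enabled", "spark.shuffle.service.enabled"):
        if not _is_true(spark_conf.get(key, "false")):
            problems.append(f"{key} is not true")

    # YARN side: the shuffle service must be registered as a NodeManager aux service.
    aux_services = [
        name.strip()
        for name in yarn_site.get("yarn.nodemanager.aux-services", "").split(",")
    ]
    if service_name not in aux_services:
        problems.append(f"{service_name} missing from yarn.nodemanager.aux-services")

    # ...and mapped to Spark's shuffle service class.
    class_key = f"yarn.nodemanager.aux-services.{service_name}.class"
    if yarn_site.get(class_key) != SHUFFLE_CLASS:
        problems.append(f"{class_key} is not {SHUFFLE_CLASS}")

    return problems
```

Feeding it the key/value pairs from spark2-defaults and yarn-site gives a quick first pass. It does not check the shuffle service port or whether the NodeManagers were restarted after the jar was added, so an empty result only rules out the most common misconfigurations.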

In the Spark log I can see this message repeated continuously:

24/10/13 16:38:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:38:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:38:46 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:01 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
