Setting spark.akka.frameSize

MattSills
Hi all,

Configuration: Standalone 0.9.1-cdh4 cluster, 7 workers per node, 32 GB per worker

I'm running a job on a Spark cluster and running into some strange behavior. After a while, the Akka frame sizes exceed 10 MB, and then the whole job seizes up. I set "spark.akka.frameSize" to 128 in the SparkConf used to create the SparkContext (and also set it as a Java system property on the driver, for good measure). After this, the program didn't hang, but it failed immediately and logged error messages like the following:
  (on the master):
    14/05/20 21:49:50 INFO SparkDeploySchedulerBackend: Executor 1 disconnected, so removing it
    14/05/20 21:49:50 ERROR TaskSchedulerImpl: Lost executor 1 on [...]: remote Akka client disassociated
  (on the workers):
    14/05/20 21:50:25 WARN SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
    14/05/20 21:50:25 INFO SparkDeploySchedulerBackend: Shutting down all executors
    14/05/20 21:50:25 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
    14/05/20 21:50:25 INFO AppClient: Stop request to Master timed out; it may already be shut down.

After lots of fumbling around, I ended up adding "-Dspark.akka.frameSize=128" to SPARK_JAVA_OPTS in spark-env.sh, on the theory that the workers couldn't read the larger Akka messages. This seems to have made things work, but I'm still a little scared. Is this the standard way to set the maximum Akka frame size, or is there a way to set it from the driver and have it propagate to the workers?
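For reference, here's roughly what I'm doing on the driver; a minimal sketch, with the master URL and app name as placeholders (the spark.akka.frameSize value is in MB):

    import org.apache.spark.{SparkConf, SparkContext}

    // Driver side: set the frame size (in MB) before creating the SparkContext.
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077") // placeholder master URL
      .setAppName("frame-size-example")      // placeholder app name
      .set("spark.akka.frameSize", "128")

    // For good measure, also set it as a JVM system property on the driver.
    System.setProperty("spark.akka.frameSize", "128")

    val sc = new SparkContext(conf)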

Thanks,
Matt

Re: Setting spark.akka.frameSize

Andrew Ash
Hi Matt,

First of all, in Spark 1.0 there's logging when the message exceeds the frame size, so you won't have silent hangs in this scenario anymore.  See https://issues.apache.org/jira/browse/SPARK-1244 and https://github.com/apache/spark/pull/147/files for the details.

As for the proper way to set spark.akka.frameSize on a standalone cluster, I always thought the normal way was the one documented at http://spark.apache.org/docs/latest/configuration.html, i.e. setting it on the SparkConf object before you instantiate the SparkContext. No further propagation should be necessary on the workers, since the CoarseGrainedExecutors they start up are seeded with that context's configuration at initialization.

You can also check the Executors tab on your application's web UI (port 4040) to see whether the configuration item was picked up.

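If you'd rather verify it from code, something like this should work on the driver (a sketch; it assumes Spark 1.0's SparkContext.getConf, which returns a copy of the configuration the context was created with):

    // On the driver: read back the effective configuration.
    // getOption avoids an exception if the key was never set.
    sc.getConf.getOption("spark.akka.frameSize") match {
      case Some(v) => println("spark.akka.frameSize = " + v + " MB")
      case None    => println("spark.akka.frameSize not set; default is 10 MB")
    }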

Are you still observing stability issues with the job even with those settings?

Cheers!
Andrew


