pyspark crash on mesos

bmiller1
Hi All,

After switching from standalone Spark to Mesos I'm experiencing some instability. I'm running pyspark interactively through an IPython notebook, and I get this crash non-deterministically (although pretty reliably within the first 2000 tasks, often much sooner).

Exception in thread "DAGScheduler" org.apache.spark.SparkException: EOF reached before Python server acknowledged
        at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:340)
        at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:311)
        at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:70)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:253)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:251)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
        at org.apache.spark.Accumulators$.add(Accumulators.scala:251)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:662)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:437)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
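
If I'm reading the trace right, the JVM is pushing per-task accumulator updates to the accumulator server in the Python driver over a local socket, and it hits EOF before the driver acknowledges. If I understand the mechanics, every PySpark task ships its accumulator updates back through this path, so even a trivial job exercises it. A minimal sketch of such a job (hypothetical master URL and workload, not my actual code):

    from pyspark import SparkContext

    # Each task's accumulator updates are merged on the JVM side by
    # PythonAccumulatorParam.addInPlace (the top frame in the trace), which
    # forwards them to the Python driver and waits for an acknowledgement.
    sc = SparkContext("mesos://master:5050", "accumulator-sketch")
    acc = sc.accumulator(0)

    def bump(x):
        acc.add(1)  # shipped back to the driver when the task completes

    sc.parallelize(range(10000), 200).foreach(bump)
    print acc.value  # expect 10000 if every round trip succeeded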

I'm running the following software versions on all machines:
Spark: 0.8.1  (md5: 5d3c56eaf91c7349886d5c70439730b3)
Mesos: 0.13.0  (md5: 220dc9c1db118bc7599d45631da578b9)
Python 2.7.3 (Stack Overflow mentioned that differing Python versions may be to blame; unless Spark or IPython is specifically invoking an older version under the hood, mine are all the same. See the version check sketched just below this list.)
Ubuntu 12.04
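
Regarding the Python-version theory: a quick way to double-check is to have the executors report the interpreter they actually run under and compare it against the driver's. A rough sketch, assuming sc is the live SparkContext from the notebook:

    import sys

    # Interpreter the driver (the notebook process) runs under.
    print sys.version

    # Interpreters the executors actually use; the lambda runs remotely, so
    # sys.version there reflects each worker's Python.
    print (sc.parallelize(range(100), 100)
             .map(lambda x: sys.version)
             .distinct()
             .collect())

Everything comes back as the same 2.7.3 build for me.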

I've also modified mesos-daemon.sh: I had problems launching the cluster with mesos-start-cluster.sh and traced them to (what seemed to be) a bug in mesos-daemon.sh, which passed a "--conf" flag that mesos-slave and mesos-master don't recognize. I removed the flag and instead added code to read environment variables in from mesos-deploy-env.sh. After that, mesos-start-cluster.sh worked as advertised.

In case it's helpful, I've included the following files:
* spark_full_output: output of the IPython process where the SparkContext was created
* mesos-deploy-env.sh: Mesos config file from a slave (identical to the master's except for MESOS_MASTER)
* spark-env.sh: Spark config file
* mesos-master.INFO: log file from mesos-master
* mesos-master.WARNING: log file from mesos-master
* mesos-daemon.sh: my modified version of mesos-daemon.sh

In case anybody from Berkeley is interested enough to want to interact with my deployment directly, my office is in Soda Hall, so that can definitely be arranged.

-Brad Miller