pyspark crash on mesos


Hi All,

After switching from standalone Spark to Mesos I'm experiencing some instability.  I'm running pyspark interactively through an IPython notebook, and I get this crash non-deterministically (although pretty reliably within the first 2000 tasks, often much sooner).

Exception in thread "DAGScheduler" org.apache.spark.SparkException: EOF reached before Python server acknowledged
        at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:340)
        at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:311)
        at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:70)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:253)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:251)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
        at org.apache.spark.Accumulators$.add(Accumulators.scala:251)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:662)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:437)
        at org.apache.spark.scheduler.DAGScheduler$$anon$
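For what it's worth, the exception itself seems to describe a broken ack handshake: the driver JVM pushes accumulator updates to a small socket server inside the Python driver process and waits for a one-byte acknowledgement, so "EOF reached before Python server acknowledged" means the connection closed before that byte arrived (e.g. the Python process or its handler thread died). A minimal sketch of that kind of length-prefixed update / one-byte-ack protocol (my own names and framing, not Spark's actual internals):

```python
import socket
import struct
import threading

def recv_exact(conn, n):
    """Read exactly n bytes, or raise EOFError if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise EOFError("EOF reached before server acknowledged")
        buf += chunk
    return buf

def run_ack_server(server):
    # Accept one connection, read a length-prefixed update, send a 1-byte ack.
    conn, _ = server.accept()
    with conn:
        (length,) = struct.unpack("!i", recv_exact(conn, 4))
        recv_exact(conn, length)            # the accumulator-update payload
        conn.sendall(struct.pack("!b", 1))  # the acknowledgement byte

def send_update(port, payload):
    # Send an update and block until the ack arrives; an empty read here is
    # the analogue of the "EOF reached before ... acknowledged" error above.
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(struct.pack("!i", len(payload)) + payload)
        return recv_exact(conn, 1)

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=run_ack_server, args=(server,))
t.start()
ack = send_update(port, b"accumulator update")
t.join()
server.close()
```

If the server side dies before writing its ack byte, the client's read returns empty and it raises EOFError, which is consistent with the failure mode the stack trace reports.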

I'm running the following software versions on all machines:
Spark: 0.8.1  (md5: 5d3c56eaf91c7349886d5c70439730b3)
Mesos: 0.13.0  (md5: 220dc9c1db118bc7599d45631da578b9)
Python 2.7.3 (Stack Overflow mentions that differing Python versions may be to blame --- unless Spark or IPython is specifically invoking an older version under the hood, mine are all the same).
Ubuntu 12.04

I've modified as follows:
I had problems launching the cluster with and traced the problem to (what seemed to be) a bug in which used a "--conf" flag that mesos-slave and mesos-master didn't recognize.  I removed the flag and instead added code to read in environment variables from then worked as advertised.
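In case it helps anyone hitting the same launch problem, the workaround amounts to passing the daemons' options through MESOS_-prefixed environment variables instead of the unrecognized "--conf" flag. A hypothetical sketch (the file path and master address are assumptions; only the MESOS_MASTER variable name is taken from my configs):

```shell
# Hypothetical environment file sourced before starting mesos-slave;
# Mesos daemons pick up options from MESOS_-prefixed environment variables.
export MESOS_MASTER=master-host:5050   # assumed master address
export MESOS_LOG_DIR=/var/log/mesos    # assumed log location
```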

In case it's helpful, I've included several files as follows:
* spark_full_output: output of ipython process where SparkContext was created
* mesos config file from slave (identical to master except for MESOS_MASTER)
* spark config file
* mesos-master.INFO: log file from mesos-master
* mesos-master.WARNING: log file from mesos-master
* my modified version of

In case anybody from Berkeley is interested enough to want to interact with my deployment, my office is in Soda Hall, so that can definitely be arranged.

-Brad Miller