Spark cannot find a class at runtime for a standalone Scala program

Spark cannot find a class at runtime for a standalone Scala program

ssimanta
I'm using sbt to build and run my Spark driver program. It's complaining that it cannot find a class (twitter4j/Status) at runtime, even though my code compiles fine.
Do I need to package all external dependencies into one fat jar? If so, can someone tell me the preferred way of doing that with sbt? I'm new to Scala and sbt.

[error] (run-main) org.apache.spark.SparkException: Job aborted: Task 0.0:9 failed 4 times (most recent failure: Exception failure: java.lang.NoClassDefFoundError: twitter4j/Status)
org.apache.spark.SparkException: Job aborted: Task 0.0:9 failed 4 times (most recent failure: Exception failure: java.lang.NoClassDefFoundError: twitter4j/Status)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Re: Spark cannot find a class at runtime for a standalone Scala program

Akhil Das
You can create a lib directory at the root of your project and put all the required jars in it; sbt picks up jars in lib/ as unmanaged dependencies automatically.

Alternatively, you can tell Spark where your jars are by calling .setJars on your SparkConf, like:

val conf = new SparkConf()
             .setMaster("mesos://akhldz:5050")
             .setAppName("My Twitter JOB")
             .setSparkHome("/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.50/lib/spark")
             .setJars(List(
               "target/scala-2.10/simple-project_2.10-2.0.jar",
               "/home/akhld/spark/streaming/twitter/lib/jar1.jar",
               "/home/akhld/spark/streaming/twitter/lib/jar2.jar",
               "/home/akhld/spark/streaming/twitter/lib/jar3.jar"))
             .set("spark.executor.memory", "4g")
             .set("spark.cores.max","2")
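
For completeness, the conf above is then handed to the SparkContext constructor (a minimal sketch; it assumes the jar paths listed in setJars actually exist on the driver machine):

```scala
import org.apache.spark.SparkContext

// Build the context from the conf above. The jars passed to setJars are
// shipped to the executors, which is what fixes the NoClassDefFoundError
// for twitter4j/Status at task runtime.
val sc = new SparkContext(conf)
```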


You can then pass this conf to your SparkContext. Hope that helps!
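
On the original fat-jar question: a common approach is the sbt-assembly plugin. A minimal sketch, assuming sbt 0.13 and Spark 0.9.0-incubating (the plugin version, Scala version, and dependency coordinates below are illustrative, not a tested build):

```scala
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// build.sbt
import AssemblyKeys._

assemblySettings

name := "simple-project"
scalaVersion := "2.10.3"

// Mark Spark itself as "provided" so it is not bundled into the fat jar;
// twitter4j is pulled in transitively by spark-streaming-twitter and
// will be included in the assembly.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "0.9.0-incubating" % "provided",
  "org.apache.spark" %% "spark-streaming-twitter" % "0.9.0-incubating"
)
```

Running "sbt assembly" then produces a single jar under target/scala-2.10/ that contains twitter4j, which you can pass to setJars instead of listing each dependency jar individually.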



Thanks
Best Regards.


On Wed, Feb 19, 2014 at 2:54 AM, Soumya Simanta <[hidden email]> wrote:
I'm using sbt to build and run my Spark driver program. It's complaining that it cannot find a class (twitter4j/Status) even though my code compiles fine. 
Do I need to package all external dependencies into one fat jar ? If yes, can someone tell me the preferred way of doing it with sbt. I'm new to Scala and sbt. 

[stack trace trimmed; see above]