What I am missing from configuration?


Dana Tontea
I am completely new to Spark.
I want to run the examples from the "A Standalone App in Scala" section of https://spark.incubator.apache.org/docs/0.8.1/quick-start.html.
When I run with the local scheduler:

    val sc = new SparkContext("local[2]", "Simple App", "/home/spark-0.8.1-incubating-bin-cdh4",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))

I get the expected result. But when I replace "local[2]" with the master URL shown in the web UI (spark://192.168.6.66:7077):

    val sc = new SparkContext("spark://192.168.6.66:7077", "Simple App", "/home/spark-0.8.1-incubating-bin-cdh4",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))

I get a long error:

    Starting task 0.0:1 as TID 6 on executor 0: ro-mysql5.cylex.local (PROCESS_LOCAL)
    14/01/23 17:02:48 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:1 as 1801 bytes in 1 ms
    14/01/23 17:02:48 WARN cluster.ClusterTaskSetManager: Lost TID 5 (task 0.0:0)
    14/01/23 17:02:48 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.OutOfMemoryError: Java heap space [duplicate 5]

The full error log is in the attached file: Error

Can somebody explain what I am missing, and what the difference is between these two schedulers, local[2] and spark://192.168.6.66:7077? And why can I not see the job in the web UI (http://localhost:8080/) when I run with local[2]?
The Scala code (SimpleJob.scala) and the sbt build file (simple.sbt) are attached.

Also, can somebody please point me to a step-by-step tutorial or course on how to set up a cluster correctly and how to access it from an IDE (IntelliJ IDEA)?
Thanks in advance!

Re: What I am missing from configuration?

Matei Zaharia
Administrator
Hi Dana,

I think the problem is that your simple.sbt does not add a dependency on hadoop-client for CDH4, so you get a different version of the Hadoop library in your driver application than on the cluster. Try adding a dependency on hadoop-client version 2.0.0-mr1-cdh4.X.X for your version of CDH4, as well as the following line to add the resolver:

resolvers += "Cloudera Repository"  at "https://repository.cloudera.com/artifactory/cloudera-repos/“

Matei


Re: What I am missing from configuration?

Dana Tontea
Hi Matei,

Firstly, thank you a lot for the answer. You are right: I was missing the hadoop-client dependency locally.
But on my cluster I deployed the latest version, spark-0.9.0, and now the same code gives the following error from sbt package:

[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: org.apache.spark#spark-core_2.10.3;0.9.0-incubating: not found
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[error] {file:/root/workspace_Spark/scala%20standalone%20app/}default-2327b2/*:update: sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.10.3;0.9.0-incubating: not found
[error] Total time: 12 s, completed Feb 5, 2014 8:12:25 PM
I don't know what I am missing this time...
My scala -version output is:
Scala code runner version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL

Thanks in advance!

Re: What I am missing from configuration?

Andrew Ash

Try depending on spark-core_2.10 rather than spark-core_2.10.3 -- the third digit is dropped in the Maven artifact name. I hit this just yesterday as well.
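
In simple.sbt that would be something like the following (a sketch; either line should resolve the 0.9.0-incubating artifact for Scala 2.10):

    // explicit artifact name: only the binary Scala version (2.10) appears in it
    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating"

    // or let sbt append the binary Scala version automatically via %%
    libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"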

Sent from my mobile phone


Re: What I am missing from configuration?

Mark Hamstra
In reply to this post by Dana Tontea
What do you mean by "the last version of spark-0.9.0"?  To be precise, there isn't anything known as spark-0.9.0.  What was released recently is spark-0.9.0-incubating, and there is and only ever will be one version of that.  If you're talking about a 0.9.0-incubating-SNAPSHOT built locally, then you're going to have to specify a commit number for us to know just what you've built -- that's the basic, floating nature of SNAPSHOTs, and it is even more true right now because the master branch of Spark currently says that it is building 0.9.0-incubating-SNAPSHOT when it should be 1.0.0-incubating-SNAPSHOT.

If you're not building Spark locally, then it is a matter of getting the right resolver set in simple.sbt. If you are re-building Spark (e.g. to change the Hadoop version), then make sure that you run `sbt/sbt publish-local` after your build to put your newly-built artifacts into your .ivy2 cache, where other sbt projects can find them.
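
For the rebuild path, a rough sketch of that flow (the Hadoop version string is a placeholder, use the one for your cluster):

    # from the Spark source tree: build against your Hadoop version, then publish
    # the artifacts into the local ivy cache (~/.ivy2/local)
    SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.X.X sbt/sbt assembly publish-local

simple.sbt can then depend on whatever version your build reports (0.9.0-incubating for the release sources), and sbt will resolve it from the local cache.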



Re: What I am missing from configuration?

Dana Tontea
Hello Mark,
Firstly, sorry for the poor details about my Spark version. My cluster has CDH 4.5 (Hadoop 2.0.0) installed, and I have now successfully installed the latest Spark release, spark-0.9.0-incubating.

You are right, I had to build Spark locally, but on each node of the cluster, against my exact Hadoop version:
SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 sbt/sbt assembly publish-local

Thanks for your help,