sbt dependencies for running Standalone app on Spark v0.9.0-incubating-SNAPSHOT


sbt dependencies for running Standalone app on Spark v0.9.0-incubating-SNAPSHOT

ssimanta

I've built a recent version of Spark (commit 23af00f9e0e5108f62cdb9629e3eb4e54bbaa321).

My Hadoop version is:


SPARK_HADOOP_VERSION=0.20.2-cdh3u6


I have a very simple standalone app that I want to run on my cluster. The simple.sbt for that app looks like this:


name := "Simple Project"

version := "1.0"

scalaVersion := "2.9.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.1-incubating"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "0.20.2-cdh3u6"

resolvers ++= Seq("Akka Repository" at "http://repo.akka.io/releases/","Spray Repository" at "http://repo.spray.cc/")



I can do an sbt package successfully. However, when I do an sbt run, I get the following exception. I guess the spark-core version above is wrong. How do I make it point to the local build I have, or should I revert back to 0.8.1-incubating?


[error] (run-main) org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63)
org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:176)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:141)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
        at scala.Option.getOrElse(Option.scala:108)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
        at scala.Option.getOrElse(Option.scala:108)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
        at org.apache.spark.rdd.RDD.take(RDD.scala:789)
        at SimpleApp$.main(SimpleApp.scala:12)
        at SimpleApp.main(SimpleApp.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
[trace] Stack trace suppressed: run last compile:run for the full output.
14/02/04 20:52:28 INFO network.ConnectionManager: Selector thread was interrupted!
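
For reference, SimpleApp here looks like the standalone example from the Spark quick-start guide; a minimal sketch consistent with the stack trace (the master URL and HDFS path are placeholders, not from this thread):

import org.apache.spark.SparkContext

object SimpleApp {
  def main(args: Array[String]) {
    // master URL and input path are placeholders for illustration only
    val sc = new SparkContext("spark://master:7077", "Simple App")
    // textFile on an hdfs:// path is what opens the ClientProtocol RPC above
    val lines = sc.textFile("hdfs://namenode:8020/user/myuserid/input.txt")
    // map builds the MappedRDD and take() forces partition computation,
    // matching the HadoopRDD/MappedRDD frames in the stack trace
    println(lines.map(_.length).take(10).mkString(", "))
    sc.stop()
  }
}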





Re: sbt dependencies for running Standalone app on Spark v0.9.0-incubating-SNAPSHOT

ssimanta
I updated my simple.sbt file to the following, but I still get the version mismatch exception.


name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.0"

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "0.20.2-cdh3u6"

resolvers ++= Seq("Akka Repository" at "http://repo.akka.io/releases/","Spray Repository" at "http://repo.spray.cc/")



The exception:

[error] (run-main) org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63)
org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:176)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
        at org.apache.spark.rdd.RDD.take(RDD.scala:824)
        at SimpleApp$.main(SimpleApp.scala:12)
        at SimpleApp.main(SimpleApp.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
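
One way to check which Hadoop jars sbt is actually putting on the application classpath (a diagnostic sketch; on sbt 0.13 the key is spelled compile:fullClasspath):

sbt "show compile:full-classpath" | tr ',' '\n' | grep hadoop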




Re: sbt dependencies for running Standalone app on Spark v0.9.0-incubating-SNAPSHOT

ssimanta
I was able to find a solution to this issue. Just posting it in case someone runs into a similar issue in the future.
In summary, I installed the spark-core jar generated by building my Spark version (against my Hadoop version) into my local Maven repo (~/.m2/), so the app now uses the same Hadoop client as the cluster.

1. I added a local Maven repository to simple.sbt:

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.0"

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating_SNAPSHOT"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "0.20.2-cdh3u6"

resolvers ++= Seq("Akka Repository" at "http://repo.akka.io/releases/","Spray Repository" at "http://repo.spray.cc/",  "Local Maven Repository" at "file://"+Path.userHome.absolutePath+"/.m2/repository")


2. I installed the assembly jar produced by the Spark build into the local Maven repository using the following command:

mvn -e install:install-file -Dfile=/home/myuserid/$SPARK_INSTALL_DIR/dist/jars/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop0.20.2-cdh3u6.jar -DgroupId=org.apache.spark -DartifactId=spark-core_2.10 -Dversion=0.9.0-incubating_SNAPSHOT -Dpackaging=jar


3. $ sbt package

4. $ sbt run
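
Given those -DgroupId/-DartifactId/-Dversion values, Maven's standard repository layout puts the jar exactly where the "Local Maven Repository" resolver looks. A quick sanity check (sketch):

ls ~/.m2/repository/org/apache/spark/spark-core_2.10/0.9.0-incubating_SNAPSHOT/

This should list spark-core_2.10-0.9.0-incubating_SNAPSHOT.jar plus the pom that install-file generates.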



Re: sbt dependencies for running Standalone app on Spark v0.9.0-incubating-SNAPSHOT

Anita Tailor
You need to remove the spark-core library dependency and add your custom-built Spark jar as an unmanaged dependency, using the following setting in build.sbt:

unmanagedBase := baseDirectory.value / "lib_jars"

Then copy your Spark jar into lib_jars in your project home.
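
Put together with the earlier settings, the whole build.sbt would then look roughly like this (a sketch; spark-core is dropped from libraryDependencies, and the hadoop-client line may be redundant if the Spark assembly jar already bundles the CDH client classes):

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.0"

// Spark now comes from the unmanaged jar copied into lib_jars
unmanagedBase := baseDirectory.value / "lib_jars"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "0.20.2-cdh3u6"

resolvers ++= Seq("Akka Repository" at "http://repo.akka.io/releases/", "Spray Repository" at "http://repo.spray.cc/")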





Re: sbt dependencies for running Standalone app on Spark v0.9.0-incubating-SNAPSHOT

Mark Hamstra
Just use `mvn install` or `sbt publish-local` (depending on which build system you prefer) to put your locally-built artifacts into your .m2 or .ivy2 cache, respectively. A typical Maven or sbt configuration will resolve them from those caches without any special modifications or further copying of artifacts.
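
Concretely, that flow looks something like this (a sketch; sbt/sbt is the launcher script shipped in the Spark source tree, and the exact SNAPSHOT version string a 0.9.0-incubating working tree publishes is an assumption here):

# from the Spark source tree: publish the locally-built artifacts to ~/.ivy2/local
SPARK_HADOOP_VERSION=0.20.2-cdh3u6 sbt/sbt publish-local

The app's build.sbt can then depend on the snapshot directly, with no extra resolver or jar copying:

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating-SNAPSHOT"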

