Spark 1.1.1, Hadoop 2.6 - Protobuf conflict

Spark 1.1.1, Hadoop 2.6 - Protobuf conflict

michallos
Hi,

It seems to me that there is a protobuf conflict when using Spark 1.1.1 with Hadoop 2.6. I built it with Maven:

mvn -Pyarn -Dhadoop.version=2.6.0 -DskipTests clean package

After submitting a Spark task that reads from or writes to HDFS, the following exception is thrown:

Exception in thread "main" java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:455)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:367)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2693)
        at java.lang.Class.privateGetPublicMethods(Class.java:2894)
        at java.lang.Class.privateGetPublicMethods(Class.java:2903)
        at java.lang.Class.getMethods(Class.java:1607)
        at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:451)
        at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:339)
        at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:639)
        at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:557)
        at java.lang.reflect.WeakCache$Factory.get(WeakCache.java:230)
        at java.lang.reflect.WeakCache.get(WeakCache.java:127)
        at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:419)
        at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:719)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:105)
        at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:570)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:420)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:316)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:665)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.FlatMappedRDD.getPartitions(FlatMappedRDD.scala:30)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
        at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:279)
        at ReadingFromHdfs$.main(ReadingFromHdfs.scala:14)
        at ReadingFromHdfs.main(ReadingFromHdfs.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


App:

import org.apache.spark.{SparkConf, SparkContext}

object ReadingFromHdfs {

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Reading from HDFS")
    val sc = new SparkContext(conf)

    val file = sc.textFile("hdfs://localhost/user/hadoop/README.md")
    val counts = file.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    println(counts)
    counts.saveAsTextFile("hdfs://localhost/user/hadoop/result.txt")
  }

}


Is there any workaround for this issue? I see that protobuf is shaded in Mesos, Akka and Spark, and I don't know what is going on :-/
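
One way to see which protobuf versions the build actually resolves (a quick check, assuming it is run from the same Spark source tree with the same profiles as the build above) is:

    mvn -Pyarn -Dhadoop.version=2.6.0 dependency:tree -Dincludes=com.google.protobuf:protobuf-java

If more than one protobuf-java version shows up (for example 2.4.x alongside 2.5.0), that kind of mismatch (Hadoop 2.6's generated protobuf classes loaded against an older protobuf runtime) is a typical cause of the VerifyError above.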

Re: Spark 1.1.1, Hadoop 2.6 - Protobuf conflict

kmurph
I also had this problem with Spark 1.1.1. At the time I was using Hadoop 0.20.

To get around it I installed Hadoop 2.5.2 and set protobuf.version to 2.5.0 in the build command, like so:
    mvn -Phadoop-2.5 -Dhadoop.version=2.5.2 -Dprotobuf.version=2.5.0 -DskipTests clean package

So I changed Spark's pom.xml to read protobuf.version from the command line. If I didn't explicitly set protobuf.version, the build picked up an older version that existed somewhere on my filesystem.
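
For illustration only, the pom.xml change described above might look roughly like this (a sketch; the exact property name and default in Spark's pom may differ, and protobuf.version may already be defined there):

    <properties>
      <!-- assumed default; overridden by -Dprotobuf.version=... on the command line -->
      <protobuf.version>2.5.0</protobuf.version>
    </properties>

Maven lets properties passed with -D on the command line take precedence over properties declared in the pom, which is what makes -Dprotobuf.version=2.5.0 in the build command above take effect.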

Karen