NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly with cdh4 binary

Roshan Nair
Hi,

For testing, I have a standalone two-node Spark cluster running spark-0.8.1-incubating-bin-cdh4. I read from and write to my HDFS cluster (CDH 4.2).

I have a job that I run from the spark-shell, and I always hit this error during the first reduceByKey stage. The full stack trace is at the end of this email.

java.lang.NoSuchMethodError (java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly(Ljava/io/Closeable;)V)

org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:986)
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:471)

The strange thing is that the task fails a few times, on workers on both nodes, but eventually succeeds.

I've double-checked several times that my application jar does not contain the Hadoop libraries or Apache Commons IO (in particular, DFSInputStream and IOUtils).
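
For reference, one quick way to verify this (the jar name is my application jar, as listed on the classpath below):

$ jar tf xyz-1.0-SNAPSHOT-jar-with-dependencies.jar | grep -iE 'commons/io/IOUtils|hdfs/DFSInputStream'

No output means neither class is bundled in the jar.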

The worker (on both nodes) and driver classpaths contain only my application jar, the Spark jars and conf, and the Hadoop conf directory. I verified this from the worker process and also from the spark-shell UI Environment tab:

/xxx/hadoop-mr/conf   System Classpath
/xxx/spark/spark-0.8.1-incubating-bin-cdh4/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop2.0.0-mr1-cdh4.2.0.jar   System Classpath
/xxx/spark/spark-0.8.1-incubating-bin-cdh4/conf   System Classpath
/xxx/spark/spark-0.8.1-incubating-bin-cdh4/tools/target/spark-tools_2.9.3-0.8.1-incubating.jar   System Classpath
http://192.168.1.1:43557/jars/xyz-1.0-SNAPSHOT-jar-with-dependencies.jar   Added By User

There is only one org.apache.commons.io.IOUtils in the classpath (in spark-assembly_2.9.3-0.8.1-incubating-hadoop2.0.0-mr1-cdh4.2.0.jar) and it appears to contain the closeQuietly method. 
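
For reference, the kind of check I mean here, run against the assembly jar from the classpath above (the overload the HDFS client needs is specifically closeQuietly(java.io.Closeable)):

$ javap -cp /xxx/spark/spark-0.8.1-incubating-bin-cdh4/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop2.0.0-mr1-cdh4.2.0.jar org.apache.commons.io.IOUtils | grep closeQuietly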

The entire stack trace from the spark-shell UI:

java.lang.NoSuchMethodError (java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly(Ljava/io/Closeable;)V)

org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:986)
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:471)
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:662)
org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:706)
java.io.DataInputStream.read(DataInputStream.java:100)
org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:160)
org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:38)
org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:167)
org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:150)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:400)
scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:400)
scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:400)
scala.collection.Iterator$class.foreach(Iterator.scala:772)
scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:102)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:75)
org.apache.spark.rdd.RDD.iterator(RDD.scala:224)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:32)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:29)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:32)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:29)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:159)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:100)
org.apache.spark.scheduler.Task.run(Task.scala:53)

The task succeeds after a few failed attempts, but I'm stumped at this point as to why this happens.
Any help appreciated.
Roshan

Re: NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly with cdh4 binary

Evgeniy Shishkin
We have this problem too.

Roshan, did you find any workaround?



Re: NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly with cdh4 binary

Roshan Nair
Hi Evgeniy,

Nope. I haven't spent much time on it since. I suspected I might be including other jars/directories in the path, but that is definitely not the case.

On one of my workers, this is my worker daemon classpath:
/apps/spark/spark-0.8.1-incubating-bin-cdh4/tools/target/spark-tools_2.9.3-0.8.1-incubating.jar::/apps/spark/spark-0.8.1-incubating-bin-cdh4/conf:/apps/spark/spark-0.8.1-incubating-bin-cdh4/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop2.0.0-mr1-cdh4.2.0.jar:/hadoop-mr/conf -Dspark.akka.logLifecycleEvents=true -Djava.library.path=/usr/lib64:/apps/hadoop-mr/lib/native/Linux-amd64-64

And this is the executor process classpath on the same worker:
/apps/spark/spark-0.8.1-incubating-bin-cdh4/conf:/apps/spark/spark-0.8.1-incubating-bin-cdh4/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop2.0.0-mr1-cdh4.2.0.jar:/hadoop-mr/conf

I'll get around to logging some diagnostics on the class and classloader soon. I'll keep you updated.
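
The kind of diagnostic I have in mind is roughly this, pasted into the spark-shell so it runs inside an executor task (just a sketch; it prints which jar IOUtils was actually loaded from and which closeQuietly overloads that class exposes):

  sc.parallelize(1 to 1, 1).map { _ =>
    // Resolve the class in the executor's classloader, not the driver's
    val c = Class.forName("org.apache.commons.io.IOUtils")
    val src = c.getProtectionDomain.getCodeSource.getLocation
    val sigs = c.getMethods.filter(_.getName == "closeQuietly").map(_.toString).mkString("; ")
    src + " -> " + sigs
  }.collect().foreach(println)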

This is not a show-stopper at the moment, since the tasks recover eventually, but of course, I want to fix it.

Roshan






Re: NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly with cdh4 binary

srowen
Problems of this form are usually because your code depends on one
version of a library, but Hadoop itself contains a different version
of the library. You successfully compile against, and bundle, your own
version.

But when run in Hadoop, since Java classloaders always delegate to the
parent first, it will end up finding a class in the library from
Hadoop's different copy.

I have had this issue with Guava in the past, for example. Hadoop most
certainly depends on Commons IO. (Try mvn dependency:tree to see
everything.)
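
For example, something like this (run in your application's project; the filter is groupId:artifactId) narrows the tree down to just the copies of Commons IO your build pulls in:

$ mvn dependency:tree -Dincludes=commons-io:commons-io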

The only mysterious thing is why it fails sometimes but not other times.
Any chance you have differing versions of Hadoop installed on the
nodes? Maybe multiple versions, with classpaths cross-wired. Then I
could imagine that it fails when assigned to some nodes but not
others.



Re: NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly with cdh4 binary

Roshan Nair

Hi Sean,

My code is compiled against the Spark CDH 4.2 bundle. My jar doesn't bundle any Hadoop or Spark libraries, and I've confirmed it doesn't contain the Commons IO IOUtils class.

When I run Spark, the classpath contains only the spark-tools jar, the Spark CDH 4.2 assembly jar, the Spark conf and the Hadoop conf directories. This is true for both the Spark worker daemon and the job executor itself. Since it doesn't run within Hadoop (i.e., it's not run as a Hadoop job), I don't expect to inherit the Hadoop classpath.

Spark uses the Hadoop client (RPC) to communicate with HDFS, so I don't see how that is the case here.

Or am I missing something?

Roshan


Re: NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly with cdh4 binary

Harry Brundage
Roshan, did you manage to figure this out? I'm suffering from the same thing. We're on Hadoop 2.0.0-cdh4.4.0.

Sean, can I provide you any more information to give a bit more insight?

Re: NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly with cdh4 binary

Roshan Nair
Hi,

Just an update: I ran my application with -verbose:class enabled and confirmed that this class is loaded exactly once, so I'm still not sure what the problem is. I'm going to build Spark against my version of CDH instead of using the pre-built binary, and hope that solves the problem.
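
For the record, if I remember the Spark 0.8.1 build docs correctly, the sbt build against our CDH 4.2 MR1 install would be along these lines (version string matching the assembly name above):

$ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly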

Roshan




Re: NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly with cdh4 binary

kamatsuoka
The version of commons-io included in the Spark assembly is an old one, which doesn't have the overload of closeQuietly that takes a Closeable:

$ javap -cp /root/spark/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop2.0.0-mr1-cdh4.2.0.jar org.apache.commons.io.IOUtils
Compiled from "IOUtils.java"
public class org.apache.commons.io.IOUtils {
...
  public org.apache.commons.io.IOUtils();
  public static void closeQuietly(java.io.Reader);
  public static void closeQuietly(java.io.Writer);
  public static void closeQuietly(java.io.InputStream);
  public static void closeQuietly(java.io.OutputStream);
  public static byte[] toByteArray(java.io.InputStream) throws java.io.IOException;
...

It looks to me like org.apache.hadoop.hdfs.DFSInputStream depends on commons-io 2.4, while spark 0.8.1 depends on commons-io 2.1.

Luckily spark 0.9 depends on commons-io 2.4, so the next release should fix this issue.
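
For comparison, running the same javap against a standalone commons-io 2.4 jar (the path is wherever you have 2.4 locally) should show the missing overload, something like:

$ javap -cp commons-io-2.4.jar org.apache.commons.io.IOUtils | grep Closeable
  public static void closeQuietly(java.io.Closeable);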

Re: NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly with cdh4 binary

kamatsuoka
I verified that this problem doesn't happen under spark 0.9.0.

Re: NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly with cdh4 binary

Roshan Nair
kamatsuoka,

Thanks. I thought I had checked the IOUtils bundled with Spark, but apparently I missed that it didn't have closeQuietly(Closeable).

We also figured out why this doesn't happen when the task is reattempted. With debug logging enabled, we saw the following printed just before the closeQuietly exception:

14/02/15 10:51:27 DEBUG DFSClient: Error making BlockReader. Closing stale NioInetPeer(Socket[addr=/10.0.2.224,port=50010,localport=42418])
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:383)
        at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:136)
        at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:993)

IOUtils.closeQuietly is called in the catch block handling the EOFException. I'm not quite sure how DFSClient works, but it appears to expect to fail sometimes while building a block reader and so it makes multiple attempts, which is why we don't see task reattempts failing.

We're moving to spark-0.9.0 as well.

Roshan


