native-lzo library not available

native-lzo library not available

Fawze Abujaber
Hi Guys,

I'm running into an issue where my Spark jobs are failing with the error below. I'm using Spark 1.6.0 with CDH 5.13.0.

I've tried to figure it out, with no success.

I'd appreciate any help, or a pointer on how to attack this issue.

User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 3, xxxxxx, executor 1): java.lang.RuntimeException: native-lzo library not available
at com.hadoop.compression.lzo.LzoCodec.getDecompressorType(LzoCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:181)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1995)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1881)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1830)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1844)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
at com.liveperson.dallas.lp.utils.incremental.DallasGenericTextFileRecordReader.initialize(DallasGenericTextFileRecordReader.java:64)
at com.liveperson.hadoop.fs.inputs.LPCombineFileRecordReaderWrapper.initialize(LPCombineFileRecordReaderWrapper.java:38)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initialize(CombineFileRecordReader.java:63)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:168)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:



I can see the hadoop-lzo files under the GPLEXTRAS parcel:

ll
total 104
-rw-r--r-- 1 cloudera-scm cloudera-scm 35308 Oct  4  2017 COPYING.hadoop-lzo
-rw-r--r-- 1 cloudera-scm cloudera-scm 62268 Oct  4  2017 hadoop-lzo-0.4.15-cdh5.13.0.jar
lrwxrwxrwx 1 cloudera-scm cloudera-scm    31 May  3 07:23 hadoop-lzo.jar -> hadoop-lzo-0.4.15-cdh5.13.0.jar
drwxr-xr-x 2 cloudera-scm cloudera-scm  4096 Oct  4  2017 native




--
Take Care
Fawze Abujaber

Re: native-lzo library not available

yuliya Feldman
The jar alone is not enough; you also need the native library (*.so). Check whether your "native" directory contains it:

drwxr-xr-x 2 cloudera-scm cloudera-scm  4096 Oct  4  2017 native

and check whether java.library.path or LD_LIBRARY_PATH points to (or includes) the directory where your *.so library resides.
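
A quick way to check both from a shell might look like this (the unversioned parcel path is an assumption based on the standard CDH layout; adjust it to your install):

# 1. Does the native directory actually contain the LZO bindings (*.so)?
ls -l /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/

# 2. Is that directory on the library path the JVM will see?
echo $LD_LIBRARY_PATH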


Re: native-lzo library not available

Fawze Abujaber
Hi Yuliya,

Thanks for your response.

I see an LZO .so only for Impala:

 [root@xxxxxxx ~]# locate *lzo*.so*
/opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29/lib/impala/lib/libimpalalzo.so
/usr/lib64/liblzo2.so.2
/usr/lib64/liblzo2.so.2.0.0

The /opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop/lib/native directory has:

-rwxr-xr-x 1 cloudera-scm cloudera-scm 22918 Oct  4  2017 libgplcompression.a
-rwxr-xr-x 1 cloudera-scm cloudera-scm  1204 Oct  4  2017 libgplcompression.la
-rwxr-xr-x 1 cloudera-scm cloudera-scm  1205 Oct  4  2017 libgplcompression.lai
-rwxr-xr-x 1 cloudera-scm cloudera-scm 15760 Oct  4  2017 libgplcompression.so
-rwxr-xr-x 1 cloudera-scm cloudera-scm 15768 Oct  4  2017 libgplcompression.so.0
-rwxr-xr-x 1 cloudera-scm cloudera-scm 15768 Oct  4  2017 libgplcompression.so.0.0.0


and /opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29/lib/spark-netlib/lib has:

-rw-r--r-- 1 cloudera-scm cloudera-scm    8673 Oct  4  2017 jniloader-1.1.jar
-rw-r--r-- 1 cloudera-scm cloudera-scm   53249 Oct  4  2017 native_ref-java-1.1.jar
-rw-r--r-- 1 cloudera-scm cloudera-scm   53295 Oct  4  2017 native_system-java-1.1.jar
-rw-r--r-- 1 cloudera-scm cloudera-scm 1732268 Oct  4  2017 netlib-native_ref-linux-x86_64-1.1-natives.jar
-rw-r--r-- 1 cloudera-scm cloudera-scm  446694 Oct  4  2017 netlib-native_system-linux-x86_64-1.1-natives.jar


Note: the issue occurs only with Spark jobs; MapReduce jobs work fine.
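
Since libgplcompression.so does exist under the parcel's native directory, and MapReduce picks it up while Spark does not, my working guess is that Spark simply isn't being pointed at that directory. A rough sketch of passing it explicitly (the native path is taken from the listing above; the hadoop-lzo.jar location is an assumption, and a cluster may need a different mechanism, e.g. Cloudera Manager's Spark configuration):

spark-submit \
  --conf spark.driver.extraLibraryPath=/opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop/lib/native \
  --conf spark.executor.extraLibraryPath=/opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop/lib/native \
  --conf spark.driver.extraClassPath=/opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop/lib/hadoop-lzo.jar \
  --conf spark.executor.extraClassPath=/opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop/lib/hadoop-lzo.jar \
  ... (rest of the job arguments)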

--
Take Care
Fawze Abujaber

Re: native-lzo library not available

ayan guha
This seems to be a Cloudera environment issue; you might get a faster and more reliable answer on the Cloudera forums.

--
Best Regards,
Ayan Guha