java.lang.IndexOutOfBoundsException: len is negative - when data size increases

java.lang.IndexOutOfBoundsException: len is negative - when data size increases

Deepak Sharma
Hi All,
I am running a Spark-based ETL job on Spark 1.6 and facing this weird issue.
The same code, with the same properties/configuration, runs fine in another environment (e.g. PROD) but never completes in CAT.
The only difference is the size of the data being processed, and that by only 1-2 GB.
This is the stack trace:

java.lang.IndexOutOfBoundsException: len is negative
        at org.spark-project.guava.io.ByteStreams.read(ByteStreams.java:895)
        at org.spark-project.guava.io.ByteStreams.readFully(ByteStreams.java:733)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:76)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter$SpillableIterator.loadNext(UnsafeExternalSorter.java:509)
        at org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:136)
        at org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:123)
        at org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:84)
        at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoin.scala:272)
        at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextOuterJoinRows(SortMergeJoin.scala:233)
        at org.apache.spark.sql.execution.joins.OneSideOuterIterator.advanceStream(SortMergeOuterJoin.scala:250)
        at org.apache.spark.sql.execution.joins.OneSideOuterIterator.advanceNext(SortMergeOuterJoin.scala:283)
        at org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

Has anyone faced this issue?
If yes, what can I do to resolve it?

Thanks
Deepak

Re: java.lang.IndexOutOfBoundsException: len is negative - when data size increases

Vadim Semenov-2
One of the spills becomes bigger than 2 GiB and can't be loaded fully
(arrays in Java can't have more than 2^31 - 1 elements, so a length
above that overflows to a negative int).

>     org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:76)
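That frame is where the bad length surfaces. For intuition, here is a
minimal Scala sketch (not Spark's actual code) of how a size above
2 GiB turns negative once it is squeezed into a 32-bit Int:

    // A hypothetical 3 GiB spill record: its length no longer fits in a
    // signed 32-bit Int, so truncating it flips the sign -- which is how
    // an oversized spill shows up as "len is negative".
    val spillBytes: Long = 3L * 1024 * 1024 * 1024
    val len: Int = spillBytes.toInt  // overflows to -1073741824
    assert(len < 0)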


You can try increasing the number of partitions, so the spills become
smaller.
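A sketch of both ways to do that on Spark 1.6; the partition count
(2000) and the names sqlContext / someRdd are placeholders, not
recommendations:

    // For SQL/DataFrame joins (the SortMergeJoin in your trace), raise
    // the number of shuffle partitions (the 1.6 default is 200):
    sqlContext.setConf("spark.sql.shuffle.partitions", "2000")

    // Or set it at submit time:
    //   spark-submit --conf spark.sql.shuffle.partitions=2000 ...

    // For plain RDD shuffles, repartition before the wide operation:
    val repartitioned = someRdd.repartition(2000)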

Also check whether you have data skew in the stage that precedes the
one where it fails.
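A quick way to spot skew (a sketch: df and the column name "join_key"
stand in for your actual input to the failing join):

    import org.apache.spark.sql.functions.desc

    // Count rows per join key and look for keys that dwarf the rest.
    df.groupBy("join_key")
      .count()
      .orderBy(desc("count"))
      .show(20)  // a few keys with huge counts relative to the rest = skew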

