ExternalAppendOnlyMap throws NoSuchElementException


ExternalAppendOnlyMap throws NoSuchElementException

guojc
Hi,
  I'm trying out the latest master branch of Spark for the exciting external hash map feature. I have code that runs correctly on Spark 0.8.1, and I only made a change so that it spills to disk more easily. However, I encounter a few task failures with:
java.util.NoSuchElementException
    org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:277)
    org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:212)
    org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
And the job does not seem to recover.
Can anyone give some suggestions on how to investigate the issue?
Thanks,
Jiacheng Guo

Re: ExternalAppendOnlyMap throws NoSuchElementException

Patrick Wendell
This code has been modified since you reported this so you may want to
try the current master.

- Patrick


Re: ExternalAppendOnlyMap throws NoSuchElementException

guojc
Hi Patrick,
    I still get the exception on the latest master (05be7047744c88e64e7e6bd973f9bcfacd00da5f). A bit more info on the subject: I'm using Kryo serialization with a custom serialization function, and the exception comes from an RDD operation, combineByKey(createDict, combineKey, mergeDict, partitioner, true, "org.apache.spark.serializer.KryoSerializer"). All previous operations seem OK. The only difference is that this operation can generate a large dict object, around 1 GB in size. I hope this gives you some clue about what might be going wrong. I'm still having trouble figuring out the cause.

Thanks,
Jiacheng Guo
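
A minimal sketch of the kind of combineByKey call described above. The combiner functions (createDict, combineKey, mergeDict) and the dictionary type are hypothetical stand-ins; only the shape of the call and the serializer class name come from the message, using the serializer-class-string overload of combineByKey from that era:

    import org.apache.spark.{SparkContext, HashPartitioner}
    import org.apache.spark.SparkContext._
    import scala.collection.mutable

    // Hypothetical combiner functions that build a (potentially large) per-key dictionary.
    def createDict(v: String): mutable.HashMap[String, Int] = {
      val d = mutable.HashMap[String, Int]()
      d(v) = 1
      d
    }
    def combineKey(d: mutable.HashMap[String, Int], v: String): mutable.HashMap[String, Int] = {
      d(v) = d.getOrElse(v, 0) + 1
      d
    }
    def mergeDict(a: mutable.HashMap[String, Int], b: mutable.HashMap[String, Int]): mutable.HashMap[String, Int] = {
      b.foreach { case (k, n) => a(k) = a.getOrElse(k, 0) + n }
      a
    }

    val sc = new SparkContext("local[2]", "combine-sketch")
    val pairs = sc.parallelize(1 to 1000000).map(i => (i % 128, "token" + (i % 1000)))
    val partitioner = new HashPartitioner(128)

    // mapSideCombine = true, serializer passed by class name (the overload used in the message).
    val dicts = pairs.combineByKey(
      createDict _, combineKey _, mergeDict _,
      partitioner, true, "org.apache.spark.serializer.KryoSerializer")
    println(dicts.count())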




Re: ExternalAppendOnlyMap throws NoSuchElementException

Patrick Wendell
Hey There,

So one thing you can do is disable the external sorting; this should
preserve the behavior exactly as it was in previous releases.

It's quite possible that the problem you are having relates to the
fact that you have individual records that are 1GB in size. This is a
pretty extreme case that may violate assumptions in the implementation
of the external aggregation code.

Would you mind opening a Jira for this? Also, if you are able to find
an isolated way to recreate the behavior it will make it easier to
debug and fix.

IIRC, even with external aggregation Spark still materializes the
final combined output *for a given key* in memory. If you are
outputting GB of data for a single key, then you might also look into
a different parallelization strategy for your algorithm. Not sure if
this is also an issue though...

- Patrick
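
A minimal sketch of what disabling the external sorting can look like, assuming the spark.shuffle.spill setting from the Spark 0.9 configuration (the thread itself does not name the property):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: turn off shuffle spilling so the aggregation stays entirely in memory,
    // approximating the pre-spilling behavior. Assumes the spark.shuffle.spill
    // property from the Spark 0.9 configuration documentation.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("no-spill-sketch")
      .set("spark.shuffle.spill", "false")
    val sc = new SparkContext(conf)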


Re: ExternalAppendOnlyMap throws NoSuchElementException

guojc
Hi Patrick,
   I think this might be data related and an edge-condition handling issue, as only a single partition repeatedly throws the exception on ExternalAppendOnlyMap's iterator. I will file a JIRA as soon as I can isolate the problem. Btw, the test intentionally abuses the external sort to see its performance impact on a real application, because I have trouble configuring the right partition number for each dataset.

Best Regards,
Jiacheng Guo
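
For context, a small illustration of the partition-count knob mentioned above: shuffle operations accept either an explicit partition count or a Partitioner, and that number is what would otherwise need tuning per dataset. The input path, application name, and count below are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("partition-sketch"))

    // Illustrative only: an aggregation with the reduce-side partition count chosen explicitly.
    val counts = sc.textFile("hdfs:///some/input")   // placeholder path
      .map(line => (line, 1))
      .reduceByKey(_ + _, 512)                       // 512 reduce partitions, tuned per dataset
    println(counts.count())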



Re: ExternalAppendOnlyMap throws NoSuchElementException

guojc
Hi Patrick,
    I have created the JIRA: https://spark-project.atlassian.net/browse/SPARK-1045. It turns out the situation is related to joining two large RDDs, not to the combine process as previously thought.

Best Regards,
Jiacheng Guo
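
For reference, a minimal sketch of the shape of operation that triggered the failure per the note above: a join of two large pair RDDs, whose output is read through the same ExternalAppendOnlyMap iterator named in the stack trace. Paths, key extraction, and the partition count are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("join-sketch"))

    // Placeholders: two large (key, value) datasets joined on key.
    val left  = sc.textFile("hdfs:///data/left").map(line => (line.split('\t')(0), line))
    val right = sc.textFile("hdfs:///data/right").map(line => (line.split('\t')(0), line))

    // Per the report, the exception surfaced while iterating the joined output.
    val joined = left.join(right, 512)
    println(joined.count())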



Re: ExternalAppendOnlyMap throws NoSuchElementException

guojc
Hi there,
    I was finally able to identify the bug: the StreamBuffer.compareTo method has ill-defined behavior when a key's hashCode equals Int.MaxValue. Though this occurs with only about a 1/2^32 chance per key, it can happen a lot when the number of keys approaches 2^32. I have created a pull request with the bug fix: https://github.com/apache/incubator-spark/pull/612

Best Regards,
Jiacheng Guo
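
A simplified illustration of the failure mode being described, not the actual Spark source: if an exhausted stream buffer advertises Int.MaxValue as its minimum key hash, a genuine key that happens to hash to Int.MaxValue compares as equal to an exhausted buffer, and merge code that assumes the minimum-hash buffer still holds pairs can end up reading from an empty one and raising NoSuchElementException. The class and member names below are invented for the sketch:

    import scala.collection.mutable.ArrayBuffer

    // Illustrative sketch only: Int.MaxValue doubles as the "no pairs left" sentinel.
    class BufferSketch(val pairs: ArrayBuffer[(Int, String)]) extends Comparable[BufferSketch] {
      def isEmpty: Boolean = pairs.isEmpty
      def minKeyHash: Int = if (pairs.nonEmpty) pairs.head._1.hashCode else Int.MaxValue
      override def compareTo(other: BufferSketch): Int = minKeyHash.compareTo(other.minKeyHash)
    }

    val exhausted = new BufferSketch(ArrayBuffer.empty[(Int, String)])
    val unlucky   = new BufferSketch(ArrayBuffer((Int.MaxValue, "real value")))

    // Prints 0: the buffer holding a real key is indistinguishable from the exhausted one,
    // which is exactly the ambiguity a key hashing to Int.MaxValue can trigger.
    println(exhausted.compareTo(unlucky))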

