another updateStateByKey question

another updateStateByKey question

amoc

Has anyone else noticed that the update state function is sometimes called twice for the same key?

I have 2 tuples with the same key in one RDD of a DStream: RDD[ (a,1), (a,2) ]

When the update function is called the first time, Seq[V] contains 1, 2, which is correct: StateClass(3,2, ArrayBuffer(1, 2))

Then right away (I see this in my output) the function is called again for the same key, but this time Seq[V] is empty: StateClass(3,2, ArrayBuffer( ))

 

In the update function I also save Seq[V] in the state so I can see it in the RDD, along with a sum and a count of the values:

StateClass(sum, count, Seq[V])

 

Why is the update function called with an empty Seq[V] for the same key when all values for that key have already been handled by the previous update?

 

-Adrian
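
One explanation consistent with the output above, though not confirmed anywhere in this thread: updateStateByKey applies the update function in every batch to every key that already has state, and passes an empty Seq when the batch holds no new values for that key. The sketch below is plain Scala with no Spark dependency; it only imitates two consecutive calls for key a. The field names of StateClass, the choice V = Int, and the body of update are assumptions made for illustration, not the poster's actual code.

```scala
// Hypothetical state type mirroring StateClass(sum, count, Seq[V]) from the post, with V = Int.
case class StateClass(sum: Int, count: Int, values: Seq[Int])

object UpdateStateSketch {
  // Same shape as the function handed to updateStateByKey:
  // (new values for the key in this batch, previous state) => new state.
  def update(values: Seq[Int], prev: Option[StateClass]): Option[StateClass] = {
    val old = prev.getOrElse(StateClass(0, 0, Seq.empty))
    // Carry the running sum and count forward; keep only this batch's values for inspection.
    Some(StateClass(old.sum + values.sum, old.count + values.size, values))
  }

  def main(args: Array[String]): Unit = {
    // Batch 1: key "a" arrives with values 1 and 2 and has no previous state.
    val afterBatch1 = update(Seq(1, 2), None)
    // Batch 2 (assumed): no new records for "a", but the key already has state,
    // so the function is invoked again with an empty Seq.
    val afterBatch2 = update(Seq.empty, afterBatch1)
    println(afterBatch1)
    println(afterBatch2)
  }
}
```

Under these assumptions the second call leaves sum and count at 3 and 2 and stores an empty sequence, which has the same shape as the second StateClass line in the post. If that is what happened, the second call would belong to the next batch interval rather than being a repeat within the same batch.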

 

Re: another updateStateByKey question

Tathagata Das

Could be a bug. Can you share code and data that I can use to reproduce this?

TD

(In reply to Adrian Mocanu's message of May 2, 2014, 9:49 AM.)

RE: another updateStateByKey question

amoc

Unfortunately, I have seen this happen only once: the first time I ran my test. Subsequent runs never showed it again.

I will test some more, and if it happens again I will try to get more details.

 

Thanks!

-A

 

(In reply to Tathagata Das's message of May 2, 2014, 3:10 PM.)