1 day window size

cem
Hi all,

I am new to Spark Streaming and am trying to evaluate it. I have a couple of questions.

1. Can setting the window and slide durations to 1 day cause any problems? The amount of data that will fall into that interval is small. Do you have other suggestions?

2. What is the best way to detect correlation? Suppose that I have 2 different events from the same source. I want to take an action when these 2 events happen on the same day. I thought about using a reducer.
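
To make the reducer idea in question 2 concrete, here is a minimal plain-Python sketch (not Spark Streaming code). It keys each event by (source, day) and reduces the event types per key with a set union, then reports the keys where both types occurred. The names `day_bucket`, `correlated_sources`, the event types "A" and "B", and the sample sources are all hypothetical. The day is taken to be a UTC calendar day, which is an assumption; a 1 day window in Spark Streaming need not line up with calendar days.

```python
from collections import defaultdict
from datetime import datetime, timezone


def day_bucket(ts_seconds):
    """Return the UTC calendar date for an epoch timestamp in seconds."""
    return datetime.fromtimestamp(ts_seconds, tz=timezone.utc).date()


def correlated_sources(events, required=frozenset({"A", "B"})):
    """Find (source, day) keys for which every required event type occurred.

    events: iterable of (source, event_type, ts_seconds) tuples.
    """
    seen = defaultdict(set)
    for source, event_type, ts in events:
        # The "reduce" step: union the event types seen for each key.
        seen[(source, day_bucket(ts))].add(event_type)
    return sorted(key for key, types in seen.items() if required <= types)


# Hypothetical sample data: sensor-1 sees both events on the same day,
# sensor-2 sees them on two different days.
events = [
    ("sensor-1", "A", 0),
    ("sensor-1", "B", 3600),
    ("sensor-2", "A", 7200),
    ("sensor-2", "B", 90000),
]
matches = correlated_sources(events)
```

With this sample data only sensor-1 qualifies, because the second sensor-2 event falls more than 86400 seconds after the epoch and so lands on the next day.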

Thanks in advance!

Best Regards,
Cem
Re: 1 day window size

Tathagata Das
1. I don't think we have tested window sizes that long.

2. If you have to keep track of a day's worth of data, it may be better to use an external system that is more dedicated to lookups over massive amounts of data (say, Cassandra). Use some unique key to push all the data to Cassandra, and then, for every record, you can use Spark Streaming to look up Cassandra and see if it already exists or not. That can work.
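
The lookup pattern in point 2 could be sketched roughly as follows, in plain Python. `DictStore` is a hypothetical in-memory stand-in for the external store; it is not the Cassandra client API, and `process_batch` is likewise an illustrative name, not a Spark Streaming call. The sketch assumes exactly two event types per source and uses (source, day) as the unique key.

```python
class DictStore:
    """Hypothetical in-memory stand-in for an external key-value store."""

    def __init__(self):
        self._rows = {}

    def get(self, key):
        return self._rows.get(key)

    def put(self, key, value):
        self._rows[key] = value


def process_batch(store, records, on_match):
    """Look up each record's key and act when the other event type is already stored.

    records: iterable of (source, day, event_type) tuples.
    on_match: callback invoked as on_match(source, day).
    """
    for source, day, event_type in records:
        key = (source, day)
        seen = store.get(key) or set()
        # A non-empty entry that lacks this type means the other type came first.
        if seen and event_type not in seen:
            on_match(source, day)
        store.put(key, seen | {event_type})


# Hypothetical usage: two small batches arriving at different times.
store = DictStore()
hits = []
process_batch(store, [("sensor-1", "2014-02-17", "A")],
              lambda s, d: hits.append((s, d)))
process_batch(store, [("sensor-1", "2014-02-17", "B"),
                      ("sensor-2", "2014-02-17", "A")],
              lambda s, d: hits.append((s, d)))
```

Because the state lives in the store rather than in a long window, each streaming batch only needs one lookup and one write per record.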

TD



(In reply to cem's message of Mon, Feb 17, 2014, 8:48 AM.)