Creating DStream windows that start at specific times

Chris Regnier
Hey everyone,

I've been thinking about an upcoming issue in my current project that I
can't see an obvious solution for, so I was hoping someone could
point me in the right direction. I'm trying to window live Twitter
data over a few different periods (1 hour, 4 hours, and 24
hours). The windows need to start at specific, human-readable times
(for example, right on the hour) rather than somewhere in the
middle of a 1-hour period. Also, whenever the system is restarted, the
first window for each period should be a partial one so that no data is dropped.
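
One common way to get windows that begin on round times is to key each record by the start of the aligned period it falls in (floor the timestamp to a multiple of the period) and aggregate per key, instead of relying on when the stream happened to start. The sketch below is plain Python, not a Spark API; it assumes each tweet carries a Unix timestamp in seconds and that periods are aligned to UTC (so 24-hour buckets start at UTC midnight, not local midnight). The helper names are hypothetical. Because bucketing is by event time, the first bucket after a restart is naturally partial rather than discarded.

```python
from collections import defaultdict
from datetime import datetime, timezone

HOUR = 3600
# Period lengths from the post (1 hour, 4 hours, 24 hours), in seconds.
PERIODS = {"1h": HOUR, "4h": 4 * HOUR, "24h": 24 * HOUR}

def bucket_start(ts: int, period: int) -> int:
    """Floor a Unix timestamp (seconds, UTC) to the start of its aligned period."""
    return ts - (ts % period)

def bucket_counts(timestamps, period):
    """Count records per aligned period, keyed by the period's start time."""
    counts = defaultdict(int)
    for ts in timestamps:
        counts[bucket_start(ts, period)] += 1
    return dict(counts)

def fmt(ts: int) -> str:
    """Render a Unix timestamp as a human-readable UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")

if __name__ == "__main__":
    # Hypothetical restart at 10:37 UTC; tweets arrive at 10:37, 10:59 and 11:05.
    base = int(datetime(2014, 3, 1, 10, 0, tzinfo=timezone.utc).timestamp())
    tweets = [base + 37 * 60, base + 59 * 60, base + 65 * 60]
    for name, period in PERIODS.items():
        for start, n in sorted(bucket_counts(tweets, period).items()):
            print(name, fmt(start), n)
```

With these inputs the 1-hour buckets start at 10:00 (a partial bucket holding two tweets) and 11:00 (one tweet), while the 4-hour bucket starts at 08:00 and the 24-hour bucket at midnight UTC.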

So is there a good way to create a DStream window whose jobs are aligned
to specific time boundaries, and which also creates a job for the initial
(partial) interval? Something like dstream.window(windowDuration,
slideDuration, timeLeftInCurrentDuration, shouldIgnoreFirstPartialJob)?
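
To make the requested behaviour concrete, here is a rough sketch of the window schedule such an operation might produce. It is plain Python, not Spark; `time_left_in_period` plays the role of the hypothetical `timeLeftInCurrentDuration`, and `include_partial` is the inverse of the hypothetical `shouldIgnoreFirstPartialJob`. Both function names are illustrative assumptions, and times are Unix seconds aligned to the epoch.

```python
HOUR = 3600

def time_left_in_period(now: int, period: int) -> int:
    """Seconds from `now` until the next aligned boundary (0 if exactly on one)."""
    return (-now) % period

def aligned_windows(start: int, period: int, count: int, include_partial: bool = True):
    """Return `count` (window_start, window_end) pairs aligned to `period`.

    If `start` falls mid-period, the first window is the partial span
    [start, next boundary); it is skipped when include_partial is False.
    """
    windows = []
    left = time_left_in_period(start, period)
    cursor = start
    if left:
        if include_partial:
            windows.append((start, start + left))  # partial first window
        cursor = start + left  # jump to the first aligned boundary
    while len(windows) < count:
        windows.append((cursor, cursor + period))  # full, aligned windows
        cursor += period
    return windows
```

For a restart 37 minutes past the hour, the first window would cover only the remaining 23 minutes, and every later window would run exactly from one hour mark to the next.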

Chris Regnier
Visualization Developer
Oculus Info Inc.