Interactive modification of DStreams

Interactive modification of DStreams

lbustelo
This is a general question about whether Spark Streaming can be interactive like batch Spark jobs. I've read plenty of threads and done a fair bit of experimentation, and I'm thinking the answer is NO, but it does not hurt to ask.

More specifically, I would like to be able to:
1. Add/remove steps in the streaming job
2. Modify window durations
3. Stop and restart the context.

I've tried the following:

1. Modify the DStream after it has been started… BOOM! Exceptions everywhere.

2. Stop the DStream, make the modification, start again… NOT GOOD :( In 0.9.0 I was getting deadlocks, and 1.0.0 did not work either.

3. Based on information provided here (http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-and-the-spark-shell-tp3347p3371.html), I was able to prototype modifying the RDD computation within foreachRDD (see the sketch after this list). That is nice, but you are then bound to the specified batch size. That got me wanting to modify window durations. Is changing the window duration possible?

4. Tried running multiple streaming contexts from within a single driver application and got several exceptions. The first was a bind exception on the web UI port. Then, once the app started running (cores were taken by the 1st job), it did not run correctly: lots of "akka.pattern.AskTimeoutException: Timed out".
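
Roughly, the prototype from point 3 looks like this (a simplified sketch, not my exact code; the `process` hook and the socket source are just for illustration):

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DynamicForeach {
  // Mutable hook: the driver can swap this function while the job runs.
  @volatile var process: RDD[String] => Unit =
    rdd => println(s"count = ${rdd.count()}")

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("dynamic-foreach"), Seconds(5))
    val lines = ssc.socketTextStream("localhost", 9999)

    // The DStream graph is frozen once start() is called, but the body of
    // foreachRDD re-reads `process` on every batch, so the per-batch
    // computation can still be changed interactively.
    lines.foreachRDD(rdd => process(rdd))

    ssc.start()
    // Later, from another thread (e.g. a REPL):
    //   DynamicForeach.process =
    //     rdd => rdd.filter(_.contains("ERROR")).foreach(println)
    ssc.awaitTermination()
  }
}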

I've tried my experiments in 0.9.0, 0.9.1 and 1.0.0 running on a standalone cluster setup.
Thanks in advance

Re: Interactive modification of DStreams

Tathagata Das
Currently Spark Streaming does not support addition/deletion/modification of DStreams after the streaming context has been started.
Nor can you restart a stopped streaming context.
Also, multiple Spark contexts (and therefore multiple streaming contexts) cannot be run concurrently in the same JVM.

To change the window duration, I would do one of the following.

1. Stop the previous streaming context, create a new streaming context, and set up the DStreams once again with the new window duration (a rough sketch follows this list).
2. Create a custom DStream, say DynamicWindowDStream. Take a look at how WindowedDStream is implemented (pretty simple, just a union over RDDs across time). That should allow you to modify the window duration. However, do make sure you have a maximum window duration that you will never exceed, and define parentRememberDuration as "rememberDuration + maxWindowDuration". That field determines which RDDs can be forgotten, so it is sensitive to the window duration. You then have to take care of modifying the window duration correctly (atomically, etc.) as per your requirements (a sketch of this follows as well).
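
A rough sketch of option 1, assuming Spark 1.0.0 where stop(stopSparkContext = false) is available; buildStreams is just an illustrative name. The key point is that the second context is a brand-new StreamingContext built around the same SparkContext, since a stopped streaming context cannot be restarted:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Build the whole DStream graph from scratch for a given window duration.
def buildStreams(sc: SparkContext, windowSecs: Int): StreamingContext = {
  val ssc = new StreamingContext(sc, Seconds(5))
  ssc.socketTextStream("localhost", 9999)
    .window(Seconds(windowSecs), Seconds(5))
    .foreachRDD(rdd => println(s"windowed count = ${rdd.count()}"))
  ssc
}

val sc = new SparkContext(new SparkConf().setAppName("rewindow"))
var ssc = buildStreams(sc, windowSecs = 30)
ssc.start()

// Later, to switch to a 60-second window: stop only the streaming context,
// keep the SparkContext alive, and stand up a fresh context around it.
ssc.stop(stopSparkContext = false)
ssc = buildStreams(sc, windowSecs = 60)
ssc.start()
ssc.awaitTermination()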

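And a sketch of option 2, modeled loosely on how WindowedDStream works (all names here are illustrative). It has to live under org.apache.spark.streaming.dstream to override the package-private parentRememberDuration, an ugly but common trick for custom DStreams:

// Placed in Spark's dstream package to gain access to the
// package-private parentRememberDuration it overrides.
package org.apache.spark.streaming.dstream

import scala.reflect.ClassTag

import org.apache.spark.rdd.{RDD, UnionRDD}
import org.apache.spark.streaming.{Duration, Time}

class DynamicWindowDStream[T: ClassTag](
    parent: DStream[T],
    @volatile var windowDuration: Duration, // may be reassigned at runtime
    maxWindowDuration: Duration             // hard upper bound, fixed up front
  ) extends DStream[T](parent.ssc) {

  override def dependencies: List[DStream[_]] = List(parent)

  override def slideDuration: Duration = parent.slideDuration

  // Retention must cover the largest window that will ever be requested;
  // otherwise parent RDDs may already be forgotten when compute needs them.
  override def parentRememberDuration: Duration =
    rememberDuration + maxWindowDuration

  override def compute(validTime: Time): Option[RDD[T]] = {
    val w = windowDuration // read the volatile field once per batch
    Some(new UnionRDD(ssc.sparkContext,
      parent.slice(validTime - w + parent.slideDuration, validTime)))
  }
}

A driver holding a reference can then reassign windowDuration between batches, e.g. myWindow.windowDuration = Seconds(60); making that change atomic with respect to batch boundaries is the part you have to take care of yourself.
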
Happy streaming!

TD


Re: Interactive modification of DStreams

Gino Bustelo
Thanks for the reply. Are there plans to allow these runtime interactions with a DStream context? On the surface they seem doable. What is preventing this from working?

Also... I implemented the modifiable window DStream and it seemed to work well. Thanks for the pointer.

Gino B.


Re: Interactive modification of DStreams

Tobias Pfeiffer
Gino,

On Wed, Jun 4, 2014 at 9:51 AM, Gino Bustelo <[hidden email]> wrote:
> Thanks for the reply. Are there plans to allow these runtime interactions
> with a DStream context?

I would be interested in that as well.

> Also... I implemented the modifiable window DStream and it seemed to work
> well. Thanks for the pointer.

That is, you can change the size of an existing window after the
StreamingContext is started?
Do you mind sharing the code?

Thanks
Tobias