Is the RocksDB backend available in the 3.0 preview?

kant kodali
Hi All,

1. Is the RocksDB backend available in the 3.0 preview?
2. If RocksDB can store the intermediate results of a stream-stream join, can I run streaming join queries forever? By forever I mean until I run out of disk. To put it another way, can I run stream-stream join queries for years if necessary (imagine I have a lot of disk capacity but not a lot of RAM)?
3. Does it do incremental checkpointing to HDFS?

Thanks!


Re: Is the RocksDB backend available in the 3.0 preview?

Jungtaek Lim-2
Unfortunately, the short answer is no. Please refer to the last part of the discussion on the PR: https://github.com/apache/spark/pull/24922

Unless we get a native implementation, I think this project is the most widely known implementation of a RocksDB-backed state store: https://github.com/chermenin/spark-states
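
Here is a minimal sketch of how an external provider such as spark-states might be plugged in. It assumes the project's jar is already on the classpath. `spark.sql.streaming.stateStore.providerClass` is Spark's own setting for choosing a state store provider. The fully qualified class name below is an assumption taken from the spark-states README, so check it against the version you actually use. The object and app names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; launch it with spark-submit, which supplies the master URL.
object RocksDbStateStoreExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stream-stream-join-with-rocksdb-state") // hypothetical app name
      // Spark setting that selects the StateStoreProvider implementation used by
      // stateful streaming operators (aggregations, dedup, stream-stream joins).
      // ASSUMPTION: class name as documented by the spark-states project.
      .config(
        "spark.sql.streaming.stateStore.providerClass",
        "ru.chermenin.spark.sql.execution.streaming.state.RocksDbStateStoreProvider")
      .getOrCreate()

    // Streaming queries started from this session would keep their operator state
    // through the configured provider instead of the default HDFS-backed one.

    spark.stop()
  }
}
```

Setting the provider on the session, rather than per query, means every stateful query started from that session uses the same backend.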

In reply to kant kodali (Wed, Apr 22, 2020 at 11:32 AM).


Re: Is the RocksDB backend available in the 3.0 preview?

kant kodali
Is it going to make it into 3.0?

In reply to Jungtaek Lim (Tue, Apr 21, 2020 at 9:24 PM).


Re: Is the RocksDB backend available in the 3.0 preview?

Jungtaek Lim-2
Sorry, I should have been clearer.

The discussion concluded that a RocksDB state store cannot be included in the main Spark codebase. It should start as an individual project and can be adopted once the project is popular enough (see the PR for more details). That's why I pointed you to an implementation in the Spark ecosystem.

In reply to kant kodali (Thu, Apr 23, 2020 at 1:22 AM).