[Spark Streaming]: Does DStream workload run over Spark SQL engine?

Khaled Zaouk
Hi,

I have a question about the execution engine of Spark Streaming
(DStream API): do Spark Streaming jobs run on the Spark SQL engine?

For example, if I change a configuration parameter related to Spark SQL
(like spark.sql.streaming.minBatchesToRetain or
spark.sql.objectHashAggregate.sortBased.fallbackThreshold), does this
make any difference when I run a Spark Streaming job (using the DStream API)?

Thank you!

Khaled

Re: [Spark Streaming]: Does DStream workload run over Spark SQL engine?

Saisai Shao
No. The DStream API is built on RDDs, so it does not use any Spark SQL features. I think you should use Structured Streaming instead, which is built on Spark SQL.
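
To make the distinction concrete, here is a minimal PySpark sketch of a DStream job. It is only an illustration under stated assumptions: `split_conf` and `dstream_word_count` are hypothetical names, the sketch assumes that settings consumed by the Spark SQL engine are the ones whose keys start with `spark.sql.`, and it assumes a PySpark version that still ships `pyspark.streaming`. Every step in the job is an RDD transformation, so nothing in it passes through the SQL engine unless you create DataFrames yourself inside the job.

```python
SQL_PREFIX = "spark.sql."  # assumption: SQL-engine settings live under this prefix


def split_conf(conf):
    """Split a settings dict into (Spark SQL settings, all other settings)."""
    sql = {k: v for k, v in conf.items() if k.startswith(SQL_PREFIX)}
    other = {k: v for k, v in conf.items() if not k.startswith(SQL_PREFIX)}
    return sql, other


def dstream_word_count(conf, batches, batch_seconds=1):
    """Run a DStream word count over in-memory batches of text lines.

    Hypothetical helper; requires pyspark, imported lazily so that
    split_conf can be used without a Spark installation.
    """
    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    sql, other = split_conf(conf)
    if sql:
        # These keys are the ones the question asks about.
        print("Not expected to affect this RDD-only job:", sorted(sql))

    spark_conf = (
        SparkConf()
        .setMaster("local[2]")
        .setAppName("dstream-sketch")
        .setAll(list(other.items()))
    )
    sc = SparkContext(conf=spark_conf)
    ssc = StreamingContext(sc, batch_seconds)

    # Each queued RDD becomes one micro-batch of the DStream.
    rdd_queue = [sc.parallelize(lines) for lines in batches]
    counts = (
        ssc.queueStream(rdd_queue)
        .flatMap(lambda line: line.split())
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    counts.pprint()

    ssc.start()
    ssc.awaitTerminationOrTimeout(batch_seconds * (len(batches) + 1))
    ssc.stop(stopSparkContext=True, stopGraceFully=False)
```

With PySpark installed, a call such as `dstream_word_count({"spark.sql.streaming.minBatchesToRetain": "50"}, [["a b a"], ["b c"]])` would run the word count on RDDs only. The Structured Streaming equivalent would start from `spark.readStream` on a `SparkSession`, which is where the `spark.sql.*` settings come into play.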

On Wed, May 2, 2018 at 4:51 PM, Khaled Zaouk <[hidden email]> wrote:
Hi,

I have a question about the execution engine of Spark Streaming
(DStream API): do Spark Streaming jobs run on the Spark SQL engine?

For example, if I change a configuration parameter related to Spark SQL
(like spark.sql.streaming.minBatchesToRetain or
spark.sql.objectHashAggregate.sortBased.fallbackThreshold), does this
make any difference when I run a Spark Streaming job (using the DStream API)?

Thank you!

Khaled