Cascading Spark Structured streams



Eric Dain
I need to write a Spark Structured Streaming pipeline that involves multiple aggregations, splits the data into multiple sub-pipelines, and unions them back together. It also needs stateful aggregation with a timeout.

Spark Structured Streaming supports all of the required functionality, but not within a single stream. I built a proof of concept that divides the pipeline into 3 sub-streams cascaded through Kafka, and it seems to work. However, I was wondering whether it would be a good idea to skip Kafka and use HDFS files as the integration layer instead. Or maybe there is another way to cascade streams without needing an extra service like Kafka.
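
For concreteness, here is a minimal PySpark sketch of what the HDFS variant could look like: one sub-stream appends Parquet files to a directory and the next sub-stream reads that directory as a streaming file source. The helper names (`stage_paths`, `write_stage`, `read_stage`) and the directory layout are hypothetical and only for illustration; they are not part of Spark and not taken from the proof of concept.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # pyspark is only needed once the functions are actually called
    from pyspark.sql import DataFrame, SparkSession
    from pyspark.sql.streaming import StreamingQuery
    from pyspark.sql.types import StructType


def stage_paths(base_dir: str, stage: str) -> dict[str, str]:
    """Hypothetical layout: one data directory and one checkpoint directory per stage."""
    root = base_dir.rstrip("/")
    return {
        "data": f"{root}/{stage}/data",
        "checkpoint": f"{root}/{stage}/checkpoint",
    }


def write_stage(df: DataFrame, base_dir: str, stage: str) -> StreamingQuery:
    """Upstream sub-stream: append its results as Parquet files."""
    paths = stage_paths(base_dir, stage)
    return (
        df.writeStream
        .format("parquet")
        .outputMode("append")  # the file sink only supports append mode
        .option("path", paths["data"])
        .option("checkpointLocation", paths["checkpoint"])
        .start()
    )


def read_stage(
    spark: SparkSession, schema: StructType, base_dir: str, stage: str
) -> DataFrame:
    """Downstream sub-stream: pick up the files written by the upstream stage."""
    return (
        spark.readStream
        .schema(schema)  # streaming file sources need an explicit schema by default
        .format("parquet")
        .load(stage_paths(base_dir, stage)["data"])
    )
```

Because the file sink only supports append mode, any aggregation that feeds `write_stage` needs a watermark, and its results are written only after the watermark passes the end of each window. Each hop in the cascade therefore adds latency compared with the Kafka version.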

Thanks,