streaming with disparate streams or non frequent streams

This post has NOT been accepted by the mailing list yet.
How does Apache Spark handle streaming of infrequent data?

I am not an expert in Spark, so please excuse me if this is a very basic question.

For example, consider a Kafka topic into which data is published only once a day. How are such streams handled, and what is the best way to handle them, compared with Spark's Kafka direct streams, which are suited to a continuous flow of data?
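
A possible approach, sketched here under assumptions: instead of keeping a streaming job running all day, start a job that reads everything published since its last committed offset and then exits. In Structured Streaming, `Trigger.Once` (and, since Spark 3.3, `Trigger.AvailableNow`) runs a query in this style. The toy model below is plain Python, not Spark code; `ToyTopic` and `OffsetTrackingJob` are hypothetical names that only illustrate the offset bookkeeping.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical in-memory stand-in for one Kafka topic partition."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)


@dataclass
class OffsetTrackingJob:
    """Hypothetical job that, on each run, processes only the records
    appended since the offset it committed on its previous run."""
    committed_offset: int = 0

    def run_once(self, topic):
        new_records = topic.records[self.committed_offset:]
        # A real job would process new_records here, then commit the offset.
        self.committed_offset = len(topic.records)
        return new_records


topic = ToyTopic()
job = OffsetTrackingJob()
first = job.run_once(topic)    # nothing has been published yet
topic.append("daily payload 1")
topic.append("daily payload 2")
second = job.run_once(topic)   # picks up both new records
third = job.run_once(topic)    # nothing new since the last commit
print(first, second, third)
```

Because the job remembers where it stopped, it does not matter at what hour the data arrives: whenever the job next runs, it picks up exactly the records it has not yet seen.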

I don't want to use a one-day batch to handle this, because the batch window is too large and I won't know at what hour of the day the data will be sent to the topic. Also, with a one-day batch window, all the messages I receive in that window will carry the timestamp of the day rather than the exact time they were created (please correct me if I am wrong).
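
On the timestamp point: a Kafka record (from Kafka 0.10 on) carries its own timestamp, either the producer's create time or the broker's log-append time depending on topic configuration, and Spark's Kafka integrations expose it per record (`ConsumerRecord.timestamp()` with the kafka-0-10 direct stream, a `timestamp` column in Structured Streaming). The batch time is a separate value. The toy model below is plain Python with hypothetical names (`Record`, `assign_batches`), not Spark code, and it assumes for simplicity that each record arrives at its creation time; it only shows that grouping records into a one-day batch need not overwrite their own timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    value: str
    created_at: datetime  # hypothetical per-record timestamp, like a Kafka record timestamp


def assign_batches(records, start, window):
    """Group records into fixed batch windows beginning at `start`.

    Each record is assumed to arrive at its created_at time. The batch key
    is the window's start time; each record keeps its own created_at.
    """
    batches = {}
    for record in records:
        index = (record.created_at - start) // window  # whole windows elapsed
        batch_time = start + index * window
        batches.setdefault(batch_time, []).append(record)
    return batches


day = datetime(2024, 1, 1)
records = [
    Record("a", datetime(2024, 1, 1, 9, 15)),
    Record("b", datetime(2024, 1, 1, 17, 40)),
]
batches = assign_batches(records, day, timedelta(days=1))
for batch_time, batch in batches.items():
    print(batch_time, [(r.value, r.created_at.time()) for r in batch])
```

Both records land in the same one-day batch, yet each still reports its own creation time (09:15 and 17:40).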