Streaming with disparate or infrequent streams

How does Apache Spark handle streaming of infrequent data?

I am not an expert in Spark, so please bear with me if this is a very basic question.

For example, consider a Kafka topic that receives data only once a day. How are such streams handled, and what is the best way to handle them, given that the Apache Spark Kafka direct stream is suited to a continuous flow of data?

I don't want to use a batch interval of one day, for two reasons. First, the window is too large, and I won't know at what hour of the day the data was sent to the topic. Second, as I understand it, with a one-day batch interval every message I receive in that batch will carry the timestamp of the day, not the exact time at which it was created (please correct me if I am wrong).
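
To make the concern concrete, here is a minimal sketch in plain Python (no Spark or Kafka involved). It assumes a batch is labelled with the end of the fixed window that contains the message, which is my understanding and may not match what Spark actually does. The `batch_time` helper, the payloads, and the dates are all hypothetical and only for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def batch_time(event_time, interval):
    """Return the end of the fixed window [start, end) that contains event_time.

    Assumption: a batch is labelled with the end of its window.
    """
    # Number of whole windows that elapsed before the event.
    windows = (event_time - EPOCH) // interval
    return EPOCH + (windows + 1) * interval


# Hypothetical messages: (payload, creation time).
messages = [
    ("a", datetime(2017, 3, 1, 9, 15)),
    ("b", datetime(2017, 3, 1, 9, 16)),
]

one_day = timedelta(days=1)
one_minute = timedelta(minutes=1)

for payload, created in messages:
    # With a one-day interval both messages get the same batch label;
    # with a one-minute interval the label stays close to the creation time.
    print(payload, created, batch_time(created, one_day), batch_time(created, one_minute))
```

Under this assumption, both messages fall into the same one-day batch and share its label, so the hour of arrival is only recoverable if each message also carries its own creation timestamp.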