Streaming with disparate or infrequent streams
How does Apache Spark handle streaming of infrequent data?
I am not an expert in Spark, so please bear with me if this is a very basic question.
For example: consider a Kafka topic into which data is injected only once a day. How are such streams handled, or what is the best way to handle them, compared with the Spark Kafka direct stream approach, which is suited to a continuous flow of data?
I don't want to use a one-day batch to handle this: the batch window is too large, and I won't know at what hour of the day the data is sent to the topic. If I use a batch window of one day, all the messages I receive in that window will carry the timestamp of the batch, not the exact time at which each message was created (please correct me if I am wrong).
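To make the timestamp concern concrete, here is a small plain-Python sketch (not Spark code) of the distinction between a batch window and a per-record event time. The record layout and the `batch_window_start` helper are my own illustration: the idea is that a coarse batch window only determines *when records are processed together*, while each record can still carry its own creation timestamp in the payload, if the producer writes one.

```python
from datetime import datetime, timedelta

# Hypothetical records, as a producer might write them to the topic:
# each payload carries its own creation (event) timestamp.
records = [
    {"value": "a", "event_time": datetime(2016, 5, 1, 3, 15)},
    {"value": "b", "event_time": datetime(2016, 5, 1, 14, 40)},
]

def batch_window_start(ts, window):
    """Floor a timestamp to the start of its batch window."""
    epoch = datetime(1970, 1, 1)
    elapsed = (ts - epoch).total_seconds()
    floored = (elapsed // window.total_seconds()) * window.total_seconds()
    return epoch + timedelta(seconds=floored)

one_day = timedelta(days=1)

for r in records:
    window = batch_window_start(r["event_time"], one_day)
    # The batch window start is coarse (midnight), but the record's
    # own event_time is still available inside the batch.
    print(r["value"], window.isoformat(), r["event_time"].isoformat())
# → a 2016-05-01T00:00:00 2016-05-01T03:15:00
# → b 2016-05-01T00:00:00 2016-05-01T14:40:00
```

So, assuming the producer embeds a timestamp in each message, a long batch interval would not by itself destroy the creation-time information, though I would still like to know the idiomatic Spark way to handle such a sparse topic.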