Spark Streaming with Kafka | Check if DStream is Empty | HDFS Write
I am using Spark Streaming with Kafka. I receive messages and, after minor processing, write them to HDFS. Currently I am using the saveAsTextFiles() / saveAsHadoopFiles() Java methods.
- Is there a default way of writing a stream to Hadoop, like the HDFS sink concept in Flume? I mean, is there some configurable way of writing out a DStream from Spark Streaming after processing it?
- How can I check whether a DStream is empty, so that I can skip the HDFS write when no messages are present? I am polling the Kafka topic every 1 sec, and sometimes an empty file gets written to HDFS because no messages arrived in that interval.
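A minimal sketch of one common way to skip empty batches, assuming Spark 2.x or later with the Java API and a `JavaDStream<String>` that already holds the processed messages. A DStream as a whole is never "empty"; what can be empty is the RDD produced for one batch interval, so the check goes inside `foreachRDD`. The class `SkipEmptyBatches` and the methods `outputPath` and `saveNonEmptyBatches` are hypothetical names for illustration; `outputPath` imitates the `prefix-TIME_IN_MS` directory naming that `saveAsTextFiles` uses when no suffix is given.

```java
import org.apache.spark.streaming.api.java.JavaDStream;

public final class SkipEmptyBatches {

    private SkipEmptyBatches() {}

    // Hypothetical helper: builds "prefix-TIME_IN_MS", the same naming
    // saveAsTextFiles uses, so each non-empty batch gets its own directory.
    public static String outputPath(String prefix, long batchTimeMs) {
        return prefix + "-" + batchTimeMs;
    }

    // Hypothetical helper: writes a batch to HDFS only if it has at least one record.
    // Assumes Spark 2.x+, where foreachRDD accepts a (JavaRDD<T>, Time) -> void lambda.
    public static void saveNonEmptyBatches(JavaDStream<String> processed, String prefix) {
        processed.foreachRDD((rdd, time) -> {
            // isEmpty() is based on take(1), so it stops as soon as one element
            // is found instead of counting the whole batch like count() == 0 would.
            if (!rdd.isEmpty()) {
                rdd.saveAsTextFile(outputPath(prefix, time.milliseconds()));
            }
            // Empty batches fall through here, so no empty output directory is created.
        });
    }
}
```

You would call something like `SkipEmptyBatches.saveNonEmptyBatches(processed, "hdfs:///path/prefix")` in place of `saveAsTextFiles(...)`, before `jssc.start()`. Because `foreachRDD` hands you an ordinary RDD per batch, the write inside it can be any RDD save method (for example `saveAsHadoopFile` on a `JavaPairRDD`), which is also where a custom or configurable write step would go.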