Spark Streaming with Kafka | Check if DStream is Empty | HDFS Write

anishsneh@yahoo.co.in
Hi All

I am using Spark Streaming with Kafka. I receive messages and, after some minor processing, write them to HDFS. At the moment I am using the Java saveAsTextFiles() / saveAsHadoopFiles() methods.

- Is there a built-in way to write a stream to Hadoop, similar to the HDFS sink in Flume? In other words, is there a configurable way to write out a DStream from Spark Streaming after processing it?
- How can I check whether a DStream is empty, so that I can skip the HDFS write when no messages are present (I am pulling from the Kafka topic every second)? Sometimes empty files get written to HDFS because no messages arrived in that interval.
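For the second question, a common pattern (assuming Spark 1.3 or later, where `RDD.isEmpty()` is available) is to perform the write inside `foreachRDD` and check `rdd.isEmpty()` on each batch's RDD before saving, rather than calling `saveAsTextFiles()` on the whole DStream. The sketch below shows only that guard logic in plain Java with no Spark dependency. The helper `writeIfNonEmpty` is hypothetical and stands in for the per-batch write: it returns without touching the filesystem when the batch has no records. The `batch-<time>/part-00000` layout is an illustrative assumption, not the exact layout Spark produces.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EmptyBatchSkip {

    // Hypothetical helper (not a Spark API): writes one micro-batch to
    // "<outputDir>/batch-<batchTimeMs>/part-00000", but only if the batch
    // contains at least one record. Returns true if a file was written.
    public static boolean writeIfNonEmpty(List<String> records, Path outputDir, long batchTimeMs)
            throws IOException {
        // Guard: an empty batch creates no directory and no file.
        if (records.isEmpty()) {
            return false;
        }
        Path batchDir = outputDir.resolve("batch-" + batchTimeMs);
        Files.createDirectories(batchDir);
        // One record per line, similar in spirit to text-file output.
        Files.write(batchDir.resolve("part-00000"), records, StandardCharsets.UTF_8);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("stream-out");
        // Empty batch: skipped, prints "false".
        System.out.println(writeIfNonEmpty(Collections.<String>emptyList(), out, 1000L));
        // Non-empty batch: written, prints "true".
        System.out.println(writeIfNonEmpty(Arrays.asList("a", "b"), out, 2000L));
    }
}
```

In an actual Spark job the same check would wrap `rdd.saveAsTextFile(...)` inside `foreachRDD`. Note that `isEmpty()` still launches a small job for each batch, although it only needs to find a single element.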
 
Please suggest.

TIA
--
Anish Sneh
"Experience is the best teacher."
+91-99718-55883
http://in.linkedin.com/in/anishsneh