This code reads a Kafka topic and writes it to HDFS every 60 seconds.
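For context, a minimal sketch of the setup being described, assuming PySpark Structured Streaming with a Kafka source and a 60-second processing-time trigger (the broker address, topic name, and HDFS paths are placeholders, not the actual values):

```python
# Hypothetical sketch of the Kafka-to-HDFS pipeline described above;
# broker, topic, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# readStream only *defines* the source; nothing is consumed yet.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
      .option("subscribe", "my-topic")                   # placeholder
      .load())

# writeStream.start() launches the query; on each 60-second trigger
# Spark writes a new batch of files under the output path.
query = (df.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/out")             # placeholder
         .option("checkpointLocation", "hdfs:///data/ckpt")
         .trigger(processingTime="60 seconds")
         .start())

query.awaitTermination()
```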
My questions are:
1. When running readStream(), what happens? Does the Spark job actually do
something, or is it just like a "transformation" in Spark's terminology,
where nothing actually happens until an action is called?
2. writeStream() is started and the writing happens every 60 seconds. Can I
intervene somehow in what happens when the actual writing occurs? For
example, can I write a log message like "60 seconds passed, writing bulk to
HDFS" each time?
3. Is it possible to write to the same HDFS file each time the actual
writing occurs? For now, it creates a new HDFS file on every write.