spark structured streaming with file based sources and sinks

Koert Kuipers
Has anyone used Spark Structured Streaming from/to files (JSON, CSV, Parquet, Avro) in a non-test setting?

I realize Kafka is probably the way to go, but let's say I have a situation where Kafka is not available for reasons out of my control, and I want to do micro-batching. Could I use files to do so in a production setting? Basically:

files on hdfs => spark structured streaming => files on hdfs => spark structured streaming => files on hdfs => etc.
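
For concreteness, a minimal sketch of one stage of that pipeline is below: a streaming read of JSON files from an HDFS directory, written back out as Parquet with a checkpoint so processed input files are tracked across restarts. The paths, schema fields, and trigger interval are hypothetical placeholders, not anything prescribed in this thread.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types._

object FileToFileStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("file-to-file-structured-streaming")
      .getOrCreate()

    // file sources require an explicit schema (hypothetical fields)
    val schema = new StructType()
      .add("id", LongType)
      .add("event", StringType)
      .add("ts", TimestampType)

    // pick up new files as they land in the input directory
    val input = spark.readStream
      .schema(schema)
      .json("hdfs:///data/stage1/in")

    // write each micro-batch as Parquet; the checkpoint records
    // which input files have already been processed
    val query = input.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/stage1/out")
      .option("checkpointLocation", "hdfs:///data/stage1/checkpoint")
      .trigger(Trigger.ProcessingTime("5 minutes"))
      .start()

    query.awaitTermination()
  }
}

The next stage would read the previous stage's output directory as its own file source, chaining the same pattern.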

I assumed this is not a good idea, but I'm interested to hear otherwise.