import org.apache.spark.sql.functions.avg
import org.apache.spark.sql.streaming.Trigger
import spark.implicits._
val rawRecords = spark.readStream.schema(myschema).parquet("/mytest")
val q = rawRecords.withColumn("g", $"id" % 100).groupBy("g").agg(avg($"id"))
val res = q.writeStream.outputMode("complete").trigger(Trigger.ProcessingTime("10 seconds")).format("parquet").option("path", "/test2").option("checkpointLocation", "/mycheckpoint").start()
The problem is that Spark tells me the parquet data source does not support Complete output mode (or Update, for that matter).
So how would I write a streaming aggregation to files?
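I understand that Spark 2.4 added DataStreamWriter.foreachBatch, which would let me use the batch parquet writer on each micro-batch's complete-mode result. Here is a minimal sketch of what I mean (the typed function value avoids the Scala 2.12 overload ambiguity; overwriting a snapshot at /test2 each trigger is my assumption), though I am not sure this is the recommended approach:

import org.apache.spark.sql.{DataFrame, SaveMode}
// Write each micro-batch's full (complete-mode) aggregation as a batch overwrite.
val writeSnapshot: (DataFrame, Long) => Unit = (batch, batchId) =>
  batch.write.mode(SaveMode.Overwrite).parquet("/test2")
val res = q.writeStream
  .outputMode("complete")
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .option("checkpointLocation", "/mycheckpoint")
  .foreachBatch(writeSnapshot)
  .start()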
In general, my goal is to have a single (slow) streaming process that writes some profile data, and then a second streaming process that loads the current DataFrame to use in a join (I would stop the second streaming process and reload the DataFrame periodically).
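For that second job, here is a minimal sketch of what I have in mind, assuming the profile snapshot lands in /test2 and g is the join key (the console sink is just a placeholder for my real output):

// Read the latest profile snapshot as a plain (static) DataFrame.
val profiles = spark.read.parquet("/test2")
// Join the live stream against it: a stream-static join, append mode.
val events = spark.readStream.schema(myschema).parquet("/mytest").withColumn("g", $"id" % 100)
val joined = events.join(profiles, "g")
val query = joined.writeStream.format("console").start()
// Later: query.stop(), re-read /test2, and start a fresh query to pick up new profiles.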