I have a question regarding Spark Structured Streaming:
will non-time-based window operations like the lag function be supported at some point, or is this not on the table due to technical difficulties?
I.e. will something like this be possible in the future:
from pyspark.sql import Window
from pyspark.sql.functions import lag

w = Window.partitionBy('uid').orderBy('timestamp')
df = df.withColumn('lag', lag(df['col_A']).over(w))
This would be useful in streaming applications, where we look for “patterns” based on the occurrence of multiple events (rows) in a particular order.
A different, more general way to achieve the same functionality would be joins, but that would require a streaming-ready uid that is both monotonically increasing and (!) safe under concurrency.
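To make the join-based idea concrete, here is a minimal plain-Python sketch (with hypothetical event data, not Spark code) of the semantics I mean: assign a per-uid sequence number, then join each row to its predecessor within the same uid.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical event rows: (uid, timestamp, col_A)
rows = [
    ("u1", 1, "login"),
    ("u2", 1, "login"),
    ("u1", 2, "click"),
    ("u1", 3, "logout"),
    ("u2", 5, "logout"),
]

def lag_via_join(rows):
    """Emulate lag(col_A) over (partition by uid order by timestamp)
    by assigning a per-uid sequence number and joining row n to row n-1."""
    rows = sorted(rows, key=itemgetter(0, 1))  # order by uid, then timestamp
    indexed = []  # (uid, seq, timestamp, col_A)
    for uid, group in groupby(rows, key=itemgetter(0)):
        for seq, (_, ts, a) in enumerate(group):
            indexed.append((uid, seq, ts, a))
    # The "join": look up the previous sequence number within the same uid
    by_key = {(uid, seq): a for uid, seq, _, a in indexed}
    return [(uid, ts, a, by_key.get((uid, seq - 1)))
            for uid, seq, ts, a in indexed]

for row in lag_via_join(rows):
    print(row)
```

In a streaming context the hard part is exactly the `seq` column: it would need to be generated monotonically per key across concurrent writers, which is what the uid requirement above refers to.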
Thanks a lot & best,