Hi all,

Before I go the route of rolling my own UDAF:

I'm calculating a mean over the last 5 rows, so I have the following window defined:

Window.partitionBy(person).orderBy(timestamp).rowsBetween(-4, Window.currentRow)

Then I calculate the mean over that window.

Within each partition, I'd like the first 4 rows to return null / NaN, because there aren't enough preceding rows to form a true "last 5." This is what pandas does when I compute a rolling mean with a window of 5. Spark instead appears to calculate the mean of whatever rows are available in the frame, even if there is only 1 row.
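To make the difference concrete, here is a small plain-Python sketch of the two behaviors for a single partition that is already sorted by timestamp. The function names and sample values are made up for illustration; they are not Spark or pandas APIs. The first function mimics what I am seeing, and the second is what I would like to get.

```python
from typing import List, Optional, Sequence


def mean_of_available(values: Sequence[float], size: int = 5) -> List[float]:
    """Mean over up to `size` trailing rows; the frame shrinks near the start."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - size + 1): i + 1]
        out.append(sum(frame) / len(frame))
    return out


def mean_of_full_window(values: Sequence[float], size: int = 5) -> List[Optional[float]]:
    """Mean over exactly `size` trailing rows; None until the frame is full."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        frame = values[max(0, i - size + 1): i + 1]
        out.append(sum(frame) / len(frame) if len(frame) == size else None)
    return out


# Hypothetical values for one person, ordered by timestamp.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

print(mean_of_available(values))    # what I see today
print(mean_of_full_window(values))  # what I want: first 4 rows are None
```

The two results only differ in the first 4 rows of each partition, where the second version returns None instead of a mean over a partial frame.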

Is there a simple, built-in way to do this in Spark? It seems like a common need, so I wonder if I am missing something.

Thanks!

Sumona