Spark Structured Streaming with periodic persist and unpersist
I am currently building a Spark Structured Streaming application that
performs a batch-stream join. The source for the batch data gets updated,
so I am planning to persist and unpersist that batch data periodically.
Below is the sample code I am using to persist and unpersist the batch data.
Flow: read the batch data -> persist the batch data -> every hour,
unpersist the data, read the batch data again, and persist it again.
However, I am not seeing the batch data get refreshed every hour.
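
    // Read the batch side once and cache it
    var batchDF = handler.readBatchDF(sparkSession).persist()
    var refreshedTime: Instant = Instant.now()

    // refreshTime is the refresh interval in seconds
    if (Duration.between(refreshedTime, Instant.now()).getSeconds > refreshTime) {
      refreshedTime = Instant.now()
      batchDF.unpersist()
      batchDF = handler.readBatchDF(sparkSession).persist()
    }
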
Is there a better way to achieve this in Spark Structured Streaming jobs?
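
As written, the `if` check in the snippet above appears to run only once, immediately after `refreshedTime` is set, so the elapsed time is close to zero and the refresh branch is never taken. For the refresh to happen, the check has to be evaluated again on each micro-batch. The following is a minimal sketch of that idea, not a definitive implementation. `PeriodicRefresher` and its parameters `load`, `release`, `interval`, and `now` are hypothetical names introduced here for illustration; they are not part of Spark. The helper has no Spark dependency, so the DataFrame calls appear only in comments.

```scala
import java.time.{Duration, Instant}

// Hypothetical helper (not a Spark API): holds a value and reloads it
// once more than `interval` has elapsed since the last load.
final class PeriodicRefresher[T](
    load: () => T,       // e.g. () => handler.readBatchDF(sparkSession).persist()
    release: T => Unit,  // e.g. df => df.unpersist()
    interval: Duration,  // e.g. Duration.ofHours(1)
    now: () => Instant = () => Instant.now()) { // injectable clock, useful in tests

  private var current: T = load()
  private var loadedAt: Instant = now()

  // Intended to be called once per micro-batch; reloads only when stale.
  def get(): T = synchronized {
    if (Duration.between(loadedAt, now()).compareTo(interval) > 0) {
      release(current)   // drop the old cached value first
      current = load()   // re-read (and re-cache) the batch side
      loadedAt = now()
    }
    current
  }
}
```

One way to use this, assuming the join can be moved into the function passed to `foreachBatch` on the stream's writer, is to call `get()` there and join each micro-batch against the returned DataFrame. That function is invoked for every micro-batch, so the staleness check is re-evaluated each time instead of only once when the query is defined.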