I am currently building a Spark Structured Streaming application where I am
doing a batch-stream join, and the source for the batch data gets updated
periodically.
So I am planning to persist and unpersist that batch data periodically.
Below is a sample of the code I am using to persist and unpersist the batch
data.
Flow: read the batch data -> persist it -> every hour, unpersist the data,
re-read the batch data from the source, and persist it again.
But I am not seeing the batch data getting refreshed every hour.
Code:
import java.time.{Duration, Instant}
import org.apache.spark.storage.StorageLevel

// Read the batch data and cache it
var batchDF = handler.readBatchDF(sparkSession)
batchDF.persist(StorageLevel.MEMORY_AND_DISK)
var refreshedTime: Instant = Instant.now()

// If the cached data is older than refreshTime seconds, drop it,
// re-read the source, and cache the fresh copy
if (Duration.between(refreshedTime, Instant.now()).getSeconds > refreshTime) {
  refreshedTime = Instant.now()
  batchDF.unpersist(false)
  batchDF = handler.readBatchDF(sparkSession)
    .persist(StorageLevel.MEMORY_AND_DISK)
}
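
For reference, here is a minimal sketch of how I am thinking of wiring the
refresh into the streaming query itself, so the check is re-evaluated on
every micro-batch via foreachBatch. Note that streamDF, the join key "id",
and the output path are just placeholders, not my actual code:

import java.time.{Duration, Instant}
import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

var batchDF: DataFrame = handler.readBatchDF(sparkSession)
batchDF.persist(StorageLevel.MEMORY_AND_DISK)
var refreshedTime: Instant = Instant.now()

streamDF.writeStream
  .foreachBatch { (microBatch: DataFrame, batchId: Long) =>
    // foreachBatch executes on the driver once per micro-batch,
    // so it can swap out the cached batch side between batches
    if (Duration.between(refreshedTime, Instant.now()).getSeconds > refreshTime) {
      refreshedTime = Instant.now()
      batchDF.unpersist(false)
      batchDF = handler.readBatchDF(sparkSession)
        .persist(StorageLevel.MEMORY_AND_DISK)
    }
    // Join each micro-batch against the (possibly refreshed) batch data
    microBatch.join(batchDF, "id")  // "id" is a placeholder join key
      .write
      .mode("append")
      .parquet("/tmp/output")       // placeholder sink
  }
  .start()

The idea is that, because foreachBatch runs on the driver for every
micro-batch, the cached DataFrame can be replaced between batches without
restarting the query.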
Is there a better way to achieve this scenario in Spark Structured
Streaming jobs?