Spark Structured Streaming resource contention / memory issue

Hi all, 

We have a Spark Structured Streaming query that uses mapGroupsWithState. After running stably for some time, every micro-batch suddenly starts taking 40 seconds, and it looks suspiciously like exactly 40 seconds each time. Before this, the batches were taking less than a second.

Looking at the details for a particular task, most partitions are processed really quickly, but a few take exactly 40 seconds:

GC looked OK while the data was being processed quickly, but the full GCs etc. suddenly stop at the same time as the 40-second issue starts:
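To cross-check the GC chart from inside a JVM, a small probe along these lines could be used. This is only a sketch built on the standard `java.lang.management` API; the class name `GcActivityProbe` and its methods are made up for illustration, and it would have to run inside the executor JVM (not as a separate process) to say anything about the executor itself.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: samples cumulative GC counters of the JVM it runs in.
public class GcActivityProbe {

    /** Returns {total collection count, total collection time in ms} summed over all collectors. */
    static long[] snapshot() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    /** Describes GC activity between two snapshots. */
    static String describeDelta(long[] before, long[] after) {
        long collections = after[0] - before[0];
        long timeMs = after[1] - before[1];
        return collections == 0
                ? "no collections in interval"
                : collections + " collections, " + timeMs + " ms";
    }

    public static void main(String[] args) throws InterruptedException {
        long[] previous = snapshot();
        // Print one line per second for five seconds.
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            long[] current = snapshot();
            System.out.println(describeDelta(previous, current));
            previous = current;
        }
    }
}
```

If the intervals report no collections while batches are stuck at 40 seconds, that would suggest the executor is idle or waiting rather than spending its time in GC.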

I have taken a thread dump from one of the executors while this issue is happening, but I cannot see any resource that the threads are blocked on:
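For anyone wanting to reproduce the check, the same information a thread dump gives (thread state, the lock being waited on, and the lock owner) can be summarized with the standard `ThreadMXBean` API. The sketch below is an assumption about how one might do that; `BlockedThreadReport` and its method names are hypothetical, and it only sees threads of the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists threads that are not runnable and what they wait on.
public class BlockedThreadReport {

    /** Describes what a thread is waiting on, or returns null if it is not waiting. */
    static String describeWait(ThreadInfo info) {
        Thread.State state = info.getThreadState();
        if (state == Thread.State.RUNNABLE
                || state == Thread.State.NEW
                || state == Thread.State.TERMINATED) {
            return null;
        }
        String lock = info.getLockName();        // null if not waiting on a lock object
        String owner = info.getLockOwnerName();  // null if the lock has no owning thread
        return info.getThreadName() + " [" + state + "]"
                + " lock=" + (lock == null ? "none" : lock)
                + " owner=" + (owner == null ? "none" : owner);
    }

    /** Returns one line per BLOCKED, WAITING, or TIMED_WAITING thread in this JVM. */
    static List<String> waitingThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request monitor/synchronizer details when the JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue;
            }
            String line = describeWait(info);
            if (line != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : waitingThreads()) {
            System.out.println(line);
        }
    }
}
```

Threads shown as TIMED_WAITING with no lock owner are typically sleeping or waiting on a timeout rather than contending for a monitor, which would be consistent with a fixed 40-second delay, though that is a guess and not something the dump alone proves.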

Are we hitting a GC problem, and if so, why is it manifesting in this way? Or is some other resource blocking, and if so, what is it?