Hi,
Yes, I understand it's a skew-based problem, but how can it be avoided? Could you please suggest an approach?
I am on Spark 2.4.
Thanks
Rajat
On Tue, Jan 26, 2021 at 3:58 PM German Schiavon <[hidden email]> wrote:
Hi,
One word: SKEW
This looks like the classic skew problem. You would have to apply skew-mitigation techniques to repartition your data more evenly, or, if you are on Spark 3.0+, try the skewJoin optimization.
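The "skew techniques" mentioned above usually mean key salting: append a random salt to the key on the large (skewed) side, and replicate the small side once per salt value, so the hot key's rows spread across several tasks instead of one. A minimal plain-Python sketch of the idea (no Spark required; all names and the salt count are illustrative):

```python
import random

NUM_SALTS = 4  # illustrative; pick based on how badly one key dominates

def salt_large_side(rows):
    """rows: list of (key, value). Append a random salt to each key on the big side."""
    return [((key, random.randrange(NUM_SALTS)), value) for key, value in rows]

def explode_small_side(rows):
    """Replicate each small-side row once per salt so every salted key finds a match."""
    return [((key, s), value) for key, value in rows for s in range(NUM_SALTS)]

def join(left, right):
    """Plain hash join on the (key, salt) pairs, standing in for Spark's shuffle join."""
    index = {}
    for k, v in right:
        index.setdefault(k, []).append(v)
    return [(k[0], lv, rv) for k, lv in left for rv in index.get(k, [])]

# One hot key with 1000 rows, one rare key with 1 row.
large = [("hot", i) for i in range(1000)] + [("rare", 0)]
small = [("hot", "dim_a"), ("rare", "dim_b")]

result = join(salt_large_side(large), explode_small_side(small))
# Every original match is preserved: 1000 rows for "hot", 1 for "rare",
# but the "hot" rows now carry 4 distinct salted keys and so would land
# in up to 4 different tasks instead of a single straggler.
```

In real Spark this translates to adding a salt column (e.g. with a random literal) to the large DataFrame's join key and cross-joining the small side with the salt range before joining; the exact column expressions depend on your schema.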
Hi Everyone,
I am running a Spark application with two left joins: the first is a broadcast join and the second is a normal (shuffle) join.
Out of 200 tasks, the last one is stuck, running at "ANY" locality level. It looks like a data-skew issue.
It spills heavily and the shuffle write is very large. The following message appears in the executor logs:
INFO UnsafeExternalSorter: Thread spilling sort data of 10.4 GB to disk (10 times so far)
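One straggler out of 200 tasks is the usual signature of a single dominant join key. A hedged way to confirm that before changing the join is to count rows per join key; a plain-Python sketch of the check (in Spark this corresponds to grouping by the join column and counting, the column and key names below are hypothetical):

```python
from collections import Counter

def top_keys(rows, n=3):
    """Count rows per join key and return the n heaviest keys.
    Equivalent in spirit to df.groupBy("join_key").count() ordered descending."""
    counts = Counter(key for key, _ in rows)
    return counts.most_common(n)

# A skewed dataset: one key holds almost all of the rows.
rows = [("user_42", i) for i in range(9000)] + \
       [(f"user_{i}", 0) for i in range(100, 200)]
heaviest = top_keys(rows)
# heaviest[0] is ("user_42", 9000): one key owns ~99% of the rows,
# so the task that receives that key's partition does ~99% of the work.
```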
Can anyone please suggest what might be wrong?
Thanks
Rajat