Spark 2.4.0 worker can't find work/app/folderNo directory for logs

flyingmeatball
I'm running a 10-node standalone cluster and I'm having trouble getting a stage
to complete - it keeps hanging somewhere between 196 and 199 of 200 tasks
completed, but it never errors out and never moves forward.

<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/stages.png>

If I look at the task(s) still running, the stdout and stderr links always show
the same message:
Error: invalid log directory
/usr/local/spark/spark-2.4.0-bin-hadoop2.7/work/app-20181129113214-0002/0/

<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/error.png>

This always happens on the same node. If I SSH into that node and look in the
app folder, I see that there is a /1/ directory, but not a /0/.
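For anyone who wants to reproduce the check above, this is roughly what I'm doing on the node - a small sketch that lists which executor subdirectories actually exist under the worker's work/ directory (the path comes from the error message; adjust it to your install):

```shell
# Sketch: list executor log directories under a Spark worker's work/ dir.
# The default path below is from the error message above; it is an
# assumption - override WORK_DIR to match your installation.
list_executor_dirs() {
  work_dir="$1"
  # Each application gets an app-<timestamp>-<id> folder; inside it there
  # should be one numbered folder per executor (0, 1, ...).
  for d in "$work_dir"/app-*/*/; do
    [ -d "$d" ] && printf '%s\n' "$d"
  done
}

# Example:
# list_executor_dirs /usr/local/spark/spark-2.4.0-bin-hadoop2.7/work
```

On the problem node this prints the .../app-20181129113214-0002/1/ path but no /0/, which is the directory the UI's log links point at.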

Why is it looking for the wrong folder? This is stage 16 of 19, so it isn't
like it bombs from the get-go - that executor has completed many previous
tasks. I can't figure out how to troubleshoot any further - the Spark job never
fails, that one task just keeps running...

<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/workers.png>

Thanks!



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
