I'm running a 10-node standalone cluster and I'm having trouble getting a
stage to complete - it keeps hanging somewhere between 196 and 199 of 200
tasks completed, but it never errors out and never moves forward.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/stages.png>
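If it matters: the 200 tasks match the default spark.sql.shuffle.partitions
(which I haven't changed), so I assume this is one of my DataFrame shuffle
stages. The job is roughly this shape - illustrative names and paths only,
not my actual code:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    // Illustrative sketch: the groupBy forces a shuffle, which by default
    // runs with spark.sql.shuffle.partitions = 200 tasks - the 200 in the UI.
    val df = spark.read.parquet("/data/input")              // placeholder path
    df.groupBy("key").count().write.parquet("/data/output") // placeholder path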
If I look at the task(s) still running, the stdout and stderr always give
the same message:
Error: invalid log directory
/usr/local/spark/spark-2.4.0-bin-hadoop2.7/work/app-20181129113214-0002/0/
<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/error.png>
This always happens on the same node. If I SSH into that node and look in
the app folder, I see that there is a /1/ directory but no /0/.
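To be sure I wasn't misreading the layout, I also listed the app's work
directory on that node from a scala shell (standalone workers keep executor
logs under work/<app-id>/<executor-id>/stdout and stderr). Trivial check,
path as on my install:

    import java.io.File

    // Standalone workers write executor logs under
    //   <SPARK_HOME>/work/<app-id>/<executor-id>/{stdout,stderr}
    val appDir = new File(
      "/usr/local/spark/spark-2.4.0-bin-hadoop2.7/work/app-20181129113214-0002")

    // Print the executor directories that actually exist on this node.
    // This prints "1" but no "0", even though the log links for the stuck
    // task point at .../0/.
    Option(appDir.listFiles).getOrElse(Array.empty[File])
      .filter(_.isDirectory).map(_.getName).sorted.foreach(println)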
Why is it looking for the wrong folder? This is stage 16 of 19, so it isn't
as if the job bombs from the get-go - that executor has completed many
previous tasks. I can't figure out how to troubleshoot any further - the
Spark job never fails, that one task just keeps running...
<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/workers.png>
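The only workaround I've come up with (untried so far) is enabling
speculative execution, so the straggler gets re-launched on another
executor - something like the sketch below, though I have no idea whether
it would actually help with this hang:

    import org.apache.spark.sql.SparkSession

    // Untried idea: speculation re-launches slow tasks on other executors.
    // These confs exist in 2.4.0; the quantile value here is illustrative.
    val spark = SparkSession.builder()
      .config("spark.speculation", "true")
      .config("spark.speculation.quantile", "0.9") // fraction of tasks done before speculating (default 0.75)
      .getOrCreate()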
Thanks!