Restarting a failed Spark Streaming job running on top of a YARN cluster
We have a few Spark Streaming jobs running on a YARN cluster, and from
time to time a job needs to be restarted (for example, because it was
killed for some external reason).
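For context, we resubmit roughly like this (the application ID is the one from the error below; the class name and jar path are placeholders, not our real ones):

```shell
# Kill the leftover application if it is still registered with YARN
# (the application ID here is the one from the error message below).
yarn application -kill application_1537885048149_15382

# Resubmit the streaming job; class and jar are illustrative placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.StreamingJob \
  our-streaming-job.jar
```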
Once we submit the new job, we are faced with the following exception:
ERROR spark.SparkContext: Failed to add
to Spark environment
Of course we know that *application_1537885048149_15382* corresponds to the
previous job that was killed, and that our YARN is cleaning up the usercache
directory frequently to avoid choking the filesystem with unused files.
However, what would you recommend for long-running jobs that have to be
restarted when the previous context is no longer available because of this cleanup?
I hope it is clear what I meant; if you need more information, just ask.