I am running Airflow + Spark + AKS (Azure K8s). Sporadically, when a Spark job completes, my spark-submit process does not notice that the driver has succeeded and keeps tracking the
job as running. Does anyone know how the spark-submit process monitors driver pods on k8s? My expectation was that it monitors them over HTTP, but since we actually deleted the driver pod and the spark-submit process still showed the job as in progress,
I am now questioning this assumption. My end goal is to have spark-submit track driver state more accurately.
I am not using Airflow, but I assume your application is deployed in cluster
mode, and in that case the class you are looking for is LoggingPodStatusWatcher.
If we are talking about the first spark-submit used to start the
application (and not "spark-submit --status"), then it contains a loop where the
application status is logged. This loop stops when the
LoggingPodStatusWatcher reports that the app is completed, or when
"spark.kubernetes.submission.waitAppCompletion" is false.
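For example, if you just want spark-submit to return right after the driver pod is created instead of looping until completion, you can set that property to false (the master URL, image, and jar path below are placeholders, not your exact values):

```shell
# Fire-and-forget: spark-submit exits once the driver pod is submitted,
# instead of watching pod status until LoggingPodStatusWatcher reports
# a terminal state.
spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.submission.waitAppCompletion=false \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```

Note this sidesteps the stuck-tracking symptom rather than fixing it; you would then need something else (e.g. polling pod status) to learn when the job finishes.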
And you are right, the monitoring (pod state watching) is done via REST
(HTTPS), and a terminated watch should be detected via the
"io.fabric8.kubernetes.client.Watcher.onClose()" method, i.e. by the kubernetes client itself.
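To make the mechanism concrete, here is a minimal self-contained sketch of that watcher pattern. The Watcher interface, Pod class, and DriverStatusWatcher below are simplified stand-ins I wrote for illustration (not Spark's or fabric8's actual code): pod changes arrive through eventReceived() over a long-lived HTTPS watch connection, onClose() fires when that connection is terminated, and if the connection drops silently or the terminal event is missed, the submitter keeps believing the app is still running.

```java
public class WatchSketch {
    // Stand-in for io.fabric8.kubernetes.client.Watcher
    interface Watcher<T> {
        void eventReceived(String action, T resource); // ADDED / MODIFIED / DELETED
        void onClose(Exception cause);                 // watch connection closed
    }

    static class Pod {
        final String name, phase;
        Pod(String name, String phase) { this.name = name; this.phase = phase; }
    }

    // Roughly the role LoggingPodStatusWatcher plays: log status changes and
    // flip a flag once the driver reaches a terminal phase or is deleted,
    // which is what lets the spark-submit wait loop exit.
    static class DriverStatusWatcher implements Watcher<Pod> {
        volatile boolean completed = false;

        public void eventReceived(String action, Pod pod) {
            if ("DELETED".equals(action)
                    || "Succeeded".equals(pod.phase) || "Failed".equals(pod.phase)) {
                completed = true;   // submit loop can stop waiting
            } else {
                System.out.println("driver " + pod.name + ": " + pod.phase);
            }
        }

        public void onClose(Exception cause) {
            // If this never fires after a silent connection drop, the
            // submitter has no signal that it stopped receiving events.
            System.out.println("watch closed: "
                    + (cause == null ? "gracefully" : cause.getMessage()));
        }
    }

    public static void main(String[] args) {
        DriverStatusWatcher w = new DriverStatusWatcher();
        w.eventReceived("MODIFIED", new Pod("driver", "Running"));
        w.eventReceived("MODIFIED", new Pod("driver", "Succeeded"));
        System.out.println("completed=" + w.completed); // completed=true
    }
}
```

The sketch also illustrates why your deleted-pod experiment is informative: a DELETED event should flip the flag, so if spark-submit still showed the job as running, either the event or the watch-close notification never reached the client.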
I hope this helps. A few further questions if you need more help:
1. What Spark version are you running?
2. Does it contain SPARK-24266?
3. If yes, can you reproduce the issue without Airflow, and do you have
logs of the issue?