[Spark on Kubernetes] Spark Application dependency management Question.
We are currently upgrading our Spark version from 2.4 to 3.0, but
applications that work on Spark 2.4 keep failing on Spark 3.0. We are
running Spark on Kubernetes in cluster mode.
In spark-submit, we pass "--jars local:///apps-dep/spark-extra-jars/*". This
works fine with the spark 2.4.5 image, but when we submit the same
application with the spark 3.0 image, the driver always fails. First it
warns "WARN DependencyUtils: Local jar /apps-dep/spark-extra-jars/* does
not exist, skipping.". Then the driver fails with the exception "Exception
in thread "main" org.apache.spark.SparkException: Job aborted due to stage
failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task
0.3 in stage 0.0 (TID 6, 10.98.179.228, executor 1):
java.nio.file.NoSuchFileException: /apps-dep/spark-extra-jars/*". I can
confirm that all of our dependency jars do exist under
/apps-dep/spark-extra-jars in the Docker images for both the driver and the
executors, and that the same setup works fine with Spark 2.4.5.
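
For context, the full submission looks roughly like the sketch below. The
master URL, application name, main class, container image, and application
jar path are placeholder values, not our real ones; only the --jars value
is the one quoted above:

```shell
# Sketch of our spark-submit invocation (placeholder values).
spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --deploy-mode cluster \
  --name my-app \
  --class com.example.Main \
  --conf spark.kubernetes.container.image=my-registry/spark:3.0.0 \
  --jars local:///apps-dep/spark-extra-jars/* \
  local:///apps-dep/app/main-app.jar
```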
Could you give me a hint on how to debug this and what is going on here?
Also, I do not understand the following behaviors:
* If I change the --jars parameter value from "local:///" to "file:///",
it works with Spark 3.0.
* If I use "--jars local:///apps-dep/spark-extra-jars/app.jar", the
submission fails with the exception "Exception in thread "main"
org.apache.spark.SparkException: Please specify
spark.kubernetes.file.upload.path property.", which makes sense according
to the Spark 3.0 docs. But if I use "--jars ///apps-dep/spark-extra-jars/*",
both the submission and the application run successfully. Could you help me
understand why "*" works here while the specific jar file does not?
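
My current suspicion, which is only an assumption and not confirmed against
the Spark source, is that the difference comes down to whether the "*" is
glob-expanded into real file names at some point before the driver tries to
open it, or is passed through and eventually treated as a literal file name
(which would explain the NoSuchFileException on "/apps-dep/spark-extra-jars/*").
The short Python sketch below only illustrates that distinction using the
standard glob module; the temporary directory and jar names are hypothetical:

```python
import glob
import os
import tempfile

# Create a hypothetical jar directory containing two jars.
jar_dir = tempfile.mkdtemp()
for name in ("a.jar", "b.jar"):
    open(os.path.join(jar_dir, name), "w").close()

pattern = os.path.join(jar_dir, "*")

# Glob expansion resolves the wildcard into the actual file names.
expanded = sorted(glob.glob(pattern))
print(expanded)  # the two jar paths

# Treating the same string as a literal path fails, analogous to the
# NoSuchFileException we see on the driver: no file is literally named "*".
print(os.path.exists(pattern))  # False
```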