--driver-class-path does not move jars, so its behavior depends on your Spark resource manager (master). It is interpreted literally: if your files do not exist at the location you provide, relative to where the driver is run, they will not be placed on the classpath.
Since the driver is responsible for moving the jars specified in --jars, you cannot rely on a jar specified by --jars being on the driver classpath: the driver has already started, and its classpath is already set, before any jars are moved.
Some distributions may change this behavior, but this is the gist of it.
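To make that concrete, consider a hypothetical submission (the paths, mylib.jar and app.jar are made-up names):

    spark-submit --master yarn --deploy-mode cluster \
      --jars /local/path/mylib.jar \
      --driver-class-path /local/path/mylib.jar \
      app.jar

Here --jars ships /local/path/mylib.jar out to the cluster, but the --driver-class-path entry only takes effect if that exact path already exists on whichever node the driver JVM is launched on; the shipped copy arrives too late to influence the driver's startup classpath.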
As I understand it, Spark expects the jar files to be available on all nodes or, if applicable, in an HDFS directory.
Putting Spark Jar files on HDFS
1) Create an archive of the Spark jars: jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
2) Create a directory on HDFS for the jars, accessible to the application: hdfs dfs -mkdir /jars
3) Upload to HDFS: hdfs dfs -put spark-libs.jar /jars
4) For a large cluster, increase the replication count of the Spark archive so that you reduce the number of times a NodeManager will do a remote copy: hdfs dfs -setrep -w <n> hdfs:///jars/spark-libs.jar (set <n> in proportion to the total number of NodeManagers)
5) In $SPARK_HOME/conf/spark-defaults.conf, set spark.yarn.archive to hdfs://rhes75:<port>/jars/spark-libs.jar, similar to below:
spark.yarn.archive=hdfs://rhes75:<port>/jars/spark-libs.jar
Every Spark node needs to have the same $SPARK_HOME/conf/spark-defaults.conf file.
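A quick sanity check for the steps above (same paths as in the list; the %r format of hdfs dfs -stat prints a file's replication factor):

    hdfs dfs -ls /jars/spark-libs.jar       # archive is in place
    hdfs dfs -stat %r /jars/spark-libs.jar  # replication factor took effect

Once spark.yarn.archive points at this archive, spark-submit can skip packaging and uploading the local $SPARK_HOME/jars on every submission, which is the point of the exercise.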
On Thu, 12 Nov 2020 at 16:35, Russell Spitzer <[hidden email]> wrote:
> Since the driver is responsible for moving the jars specified in --jars, you cannot rely on a jar specified by --jars being on the driver classpath: the driver has already started, and its classpath is already set, before any jars are moved.
Your point is interesting; however, I see a discrepancy with the Spark doc, which says:
""When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included on the driver and executor classpaths. ""
The most interesting part here (for the discussion) is "That list [from --jars] is included on the driver and executor classpaths.".
That seems to contradict your sentence (you state that a jar specified by --jars can't be on the driver classpath).
... hmm, I am still thinking about how to reconcile the two.
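One way to probe this (a sketch; /tmp/extra.jar and app.jar are hypothetical names):

    spark-submit --master yarn --deploy-mode client \
      --jars /tmp/extra.jar \
      app.jar

If the application prints System.getProperty("java.class.path"), /tmp/extra.jar would not be expected to appear there, yet its classes should still be loadable from application code, because Spark adds --jars entries through its own classloader after the driver JVM has started. If that holds, both statements can be true at once: the doc's "included on the driver classpath" refers to the runtime classloader, while the earlier point concerns the JVM startup classpath, which only --driver-class-path controls.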
On Thu, 12 Nov 2020 at 17:34, Russell Spitzer <[hidden email]> wrote:
To be sure, are you really saying that, using the option "spark.yarn.archive", YOU have been able to OVERRIDE the installed Spark JARs with the JAR given via the option "spark.yarn.archive"?
Nothing more than "spark.yarn.archive"?
On Thu, 12 Nov 2020 at 18:01, Mich Talebzadeh <[hidden email]> wrote: