Using spark.jars conf to override jars present in spark default classpath

Using spark.jars conf to override jars present in spark default classpath

nupurshukla
Hello,

How can we use spark.jars to specify conflicting jars (that is, jars that are already present in Spark's default classpath)? Jars specified in this conf get "appended" to the classpath and are therefore consulted after the default classpath. Is the conf not intended for specifying conflicting jars?
Meanwhile, when the spark.driver.extraClassPath conf is specified, its path is "prepended" to the classpath and thus takes precedence over the default classpath.

How can I use both to specify different jars and paths and achieve the precedence order spark.jars path > spark.driver.extraClassPath > Spark default classpath (highest to lowest, left to right)?
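
For concreteness, this is roughly how I am passing the two confs (a sketch using the SparkLauncher API; the app jar path and main class are placeholders, not from my actual job):

    import org.apache.spark.launcher.SparkLauncher

    // Sketch: launch a job with both confs set. Jar paths match the experiment below.
    val app = new SparkLauncher()
      .setAppResource("/path/to/my-app.jar")   // placeholder
      .setMainClass("com.example.MyApp")       // placeholder
      // Distributed and added through Spark's dynamic classloader (appended):
      .setConf("spark.jars", "/home/<user>/JarsConf/sample-project-3.0.0.jar")
      // Prepended to the driver JVM's launch classpath:
      .setConf(SparkLauncher.DRIVER_EXTRA_CLASSPATH, "/home/<user>/ClassPathConf/sample-project-2.0.0.jar")
      .launch()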

Experiment conducted:

I am using sample-project.jar, which contains a single class, SampleProject, with a method that prints the jar's version number. For this experiment I am using three versions of sample-project.jar:
sample-project-1.0.0.jar is present in the Spark default classpath in my test cluster
sample-project-2.0.0.jar is present in folder /home/<user>/ClassPathConf on the driver
sample-project-3.0.0.jar is present in folder /home/<user>/JarsConf on the driver
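
(For reference, SampleProject is essentially of this shape; a sketch, not the exact source, with each jar version hard-coding its own version string:)

    // Each of the three jars ships this class with a different version string.
    object SampleProject {
      def printVersion(): Unit = println("sample-project version 1.0.0")
    }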

[Attached image: results table showing which jar version was picked up for each combination of the two confs; an empty cell means that conf was not specified.]


Thank you,
Nupur



Re: Using spark.jars conf to override jars present in spark default classpath

Russell Spitzer
I believe the main issue here is that spark.jars is a bit "too late" to actually prepend things to the classpath. For most use cases this value is not read until after the JVM has already started and the system classloader has already been set up.

The jars argument gets added via the dynamic classloader, so it necessarily has to come afterwards :/ Driver extra classpath and its friends modify the actual launch command of the driver (or executors), so they can prepend whatever they want.

In general you do not want to have conflicting jars at all if possible, and I would recommend looking into shading if it's really important for your application to use a specific incompatible version of a library. spark.jars (and extraClassPath) are really just for adding additional jars, and I personally would try not to rely on classpath ordering to get the right libraries picked up.
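
If it helps, a quick way to check which copy actually won at runtime (a sketch, assuming the SampleProject class from your experiment):

    // Prints the URL of the jar the class was actually loaded from.
    // (getCodeSource can be null for bootstrap-classloader classes.)
    val source = SampleProject.getClass.getProtectionDomain.getCodeSource
    if (source != null) println(source.getLocation)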




Re: Using spark.jars conf to override jars present in spark default classpath

Jeff Evans
If you can't avoid it, you need to make use of the spark.driver.userClassPathFirst and/or spark.executor.userClassPathFirst properties.
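
For example (a sketch, again via the SparkLauncher API; note both properties are marked experimental in the Spark configuration docs):

    import org.apache.spark.launcher.SparkLauncher

    // Give user-added jars precedence over Spark's own when loading classes.
    val app = new SparkLauncher()
      .setConf("spark.driver.userClassPathFirst", "true")
      .setConf("spark.executor.userClassPathFirst", "true")
      .setConf("spark.jars", "/home/<user>/JarsConf/sample-project-3.0.0.jar")
      .launch()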




Re: Using spark.jars conf to override jars present in spark default classpath

nupurshukla
Thank you Russell and Jeff,

My bad, I wasn't clear before about the conflicting jars. By that I meant my application needs to use newer versions of certain jars than those present in the default classpath. What would be the best way to use the spark.jars and spark.driver.extraClassPath confs together to reorder the classpath so that the updated versions get picked up first? It looks like extraClassPath is the one way to do that here.







Re: Using spark.jars conf to override jars present in spark default classpath

Russell Spitzer
That's what I'm saying you don't want to do :) If you have two versions of a library with different APIs, the safest approach is shading; ordering probably can't be relied on. In my experience, both reflection and classpath priority during class loading can behave in ways you may not like. spark.jars will never be able to reorder, so you'll need to get those jars onto the system classloader using the driver (and executor) extra classpath args (with userClassPathFirst). I will stress again that this would be my last choice for getting it working, and I would try shading first if I really had a conflict.
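
For shading, something like this in build.sbt does it (a sketch assuming the sbt-assembly plugin; the package names are placeholders):

    // Rewrites the library's packages inside the assembled jar so the copy your
    // app uses can never collide with the copy on Spark's default classpath.
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.example.sample.**" -> "myapp.shaded.sample.@1").inAll
    )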
