Spark LOCAL mode and external jar (extraClassPath)


Spark LOCAL mode and external jar (extraClassPath)

jb44
I'm running spark in LOCAL mode and trying to get it to talk to alluxio. I'm
getting the error: java.lang.ClassNotFoundException: Class
alluxio.hadoop.FileSystem not found
The cause of this error is apparently that Spark cannot find the alluxio
client jar in its classpath.

I have looked at the page here:
https://www.alluxio.org/docs/master/en/Debugging-Guide.html#q-why-do-i-see-exceptions-like-javalangruntimeexception-javalangclassnotfoundexception-class-alluxiohadoopfilesystem-not-found

Which details the steps to take in this situation, but I'm not finding
success.

According to the Spark documentation, I can instantiate a local Spark session like so:

SparkSession.builder
  .appName("App")
  .getOrCreate

Then I can add the alluxio client library like so:
sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)

I have verified that the proper jar file exists in the right location on my
local machine with:
logger.error(sparkSession.conf.get("spark.driver.extraClassPath"))
logger.error(sparkSession.conf.get("spark.executor.extraClassPath"))

But I still get the error. Is there anything else I can do to figure out why
Spark is not picking the library up?
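One way to narrow this down is to ask the running JVM directly, rather than reading the Spark config back. The following is a minimal sketch in plain Scala with no Spark dependency; the object and method names are made up for illustration. It assumes the class is looked up through the thread context class loader, and note that `java.class.path` only reflects the classpath the JVM was launched with, so build tools that run code in-process may show something different.

```scala
import java.io.File
import scala.util.Try

object ClasspathCheck {
  /** True if the named class can be loaded via the current thread's context class loader. */
  def isLoadable(className: String): Boolean =
    Try(Class.forName(className, false, Thread.currentThread().getContextClassLoader)).isSuccess

  /** Entries of the classpath this JVM was launched with. */
  def classpathEntries: Seq[String] =
    sys.props.getOrElse("java.class.path", "")
      .split(File.pathSeparator)
      .filter(_.nonEmpty)
      .toSeq

  def main(args: Array[String]): Unit = {
    // The class name comes from the ClassNotFoundException in the post.
    println(s"alluxio.hadoop.FileSystem loadable: ${isLoadable("alluxio.hadoop.FileSystem")}")
    // List any launch-classpath entries that look like an Alluxio jar.
    classpathEntries.filter(_.toLowerCase.contains("alluxio")).foreach(println)
  }
}
```

If the first line prints `false` in the same JVM that creates the SparkSession, the jar is not visible to that JVM, whatever the Spark config values say.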

Please note I am not using spark-submit - I am aware of the methods for
adding the client jar to a spark-submit job. My Spark instance is being
created as local within my application and this is the use case I want to
solve.

As an FYI there is another application in the cluster which is connecting to
my alluxio using the fs client and that all works fine. In that case,
though, the fs client is being packaged as part of the application through
standard sbt dependencies.





--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Spark LOCAL mode and external jar (extraClassPath)

Haoyuan Li

Best regards,

Haoyuan (HY)






Re: Spark LOCAL mode and external jar (extraClassPath)

jb44
Haoyuan -

As I mentioned below, I've been through the documentation already.  It has
not helped me to resolve the issue.

Here is what I have tried so far:

- setting extraClassPath as explained below
- adding fs.alluxio.impl through sparkconf
- adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
this matters in my case)
- compiling the client from source

Do you have any other suggestions on how to get this working?  

Thanks





Re: Spark LOCAL mode and external jar (extraClassPath)

Geoff Von Allmen

I fought with a ClassNotFoundException for quite some time, but it was for kafka.

The final configuration that got everything working was running spark-submit with the following options:

--jars "/path/to/.ivy2/jars/package.jar" \
--driver-class-path "/path/to/.ivy2/jars/package.jar" \
--conf "spark.executor.extraClassPath=/path/to/.ivy2/package.jar" \
--packages org.some.package:package_name:version

While I needed this to run in cluster mode, it works equally well in client mode.

One other note for when you need to supply multiple items to these arguments: --jars and --packages should be comma-separated, while --driver-class-path and extraClassPath should be colon-separated.
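The separator rule can be sketched as two small helpers. This is an illustration only: the object and method names are hypothetical, and it uses `File.pathSeparator` for the classpath-style values, which is `:` on Unix-like systems and `;` on Windows.

```scala
import java.io.File

object JarArgs {
  /** Value for --jars (and, with Maven coordinates, --packages): comma-separated. */
  def jarsArg(paths: Seq[String]): String = paths.mkString(",")

  /** Value for --driver-class-path / extraClassPath: joined with the platform path separator. */
  def classPathArg(paths: Seq[String]): String = paths.mkString(File.pathSeparator)

  def main(args: Array[String]): Unit = {
    // Placeholder paths, in the same style as the options above.
    val jars = Seq("/path/to/.ivy2/jars/package.jar", "/path/to/.ivy2/jars/other.jar")
    println(s"--jars ${jarsArg(jars)}")
    println(s"--driver-class-path ${classPathArg(jars)}")
  }
}
```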

HTH





Re: Spark LOCAL mode and external jar (extraClassPath)

jb44
Hi Geoff -

Appreciate the help here - I do understand what you’re saying below.  And I am able to get this working when I submit a job to a local cluster.

I think part of the issue here is that there’s ambiguity in the terminology.  When I say “LOCAL” spark, I mean an instance of spark that is created by my driver program, and is not a cluster itself.  It means that my master node is “local”, and this mode is primarily used for testing.


While I am able to get alluxio working with spark-submit, I am unable to get it working when using local mode.  The mechanisms for setting class paths during spark-submit are not available in local mode.  My understanding is that all one is able to use is:

spark.conf.set("")

To set any runtime properties of the local instance.  Note that it is possible (and I am more convinced of this as time goes on) that alluxio simply does not work in spark local mode as described above.






Re: Spark LOCAL mode and external jar (extraClassPath)

Geoff Von Allmen
Ok - `LOCAL` makes sense now.

Do you have the option to still use `spark-submit` in this scenario, but using the following options:

```bash
--master "local[*]" \
--deploy-mode "client" \
...
```

In the past I have set up some options using `.config("Option", "value")` when creating the Spark session, and then set other runtime options with `spark.conf.set`, as you describe above. At this point, though, I've just moved everything out into a `spark-submit` script.






Re: Spark LOCAL mode and external jar (extraClassPath)

jb44
I do, and this is what I will fall back to if nobody has a better idea :)

I was just hoping to get this working as it is much more convenient for my testing pipeline.

Thanks again for the help







Re: Spark LOCAL mode and external jar (extraClassPath)

yohann jardin

Hey Jason,

It might be related to what is behind your variable ALLUXIO_SPARK_CLIENT and where the library is located (is it on HDFS, on the node that submits the job, or local to all Spark workers?).
There is a great post on SO about it: https://stackoverflow.com/a/37348234

We should also check that you are providing the jar correctly for its location. I have found this tricky in some cases.
As a debugging step, if the jar is not on HDFS, you can copy it there and then specify the full path in the extraClassPath property.
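For a jar on the local filesystem, the path can be checked directly before it is handed to Spark. The sketch below is plain Scala using only the JDK; the object and method names are hypothetical, and it only covers local paths, not HDFS. It checks that the file exists and that it contains an entry for the class named in the exception.

```scala
import java.nio.file.{Files, Paths}
import java.util.jar.JarFile
import scala.util.Try

object JarInspector {
  /** True if jarPath is a readable local jar containing a .class entry for className. */
  def jarContainsClass(jarPath: String, className: String): Boolean = {
    val path = Paths.get(jarPath)
    if (!Files.isRegularFile(path)) false
    else {
      // Jar entries use '/' separators, e.g. alluxio/hadoop/FileSystem.class
      val entryName = className.replace('.', '/') + ".class"
      Try {
        val jar = new JarFile(path.toFile)
        try jar.getEntry(entryName) != null
        finally jar.close()
      }.getOrElse(false)
    }
  }

  def main(args: Array[String]): Unit = {
    // Pass the value behind ALLUXIO_SPARK_CLIENT as the first argument.
    val jarPath = args.headOption.getOrElse("/path/to/alluxio-client.jar") // placeholder default
    println(jarContainsClass(jarPath, "alluxio.hadoop.FileSystem"))
  }
}
```

A `false` result means either the path is wrong or the jar is not the one that ships `alluxio.hadoop.FileSystem`. A `true` result still says nothing about whether the running JVM has that jar on its classpath.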

Regards,

Yohann Jardin








Re: Spark LOCAL mode and external jar (extraClassPath)

jb44
Thanks - I’ve seen this SO post, it covers spark-submit, which I am not using.

Regarding the ALLUXIO_SPARK_CLIENT variable, the jar it points to is located on the machine that is running the job which spawns the master=local spark.  According to the Spark documentation, this should be possible, but it appears it is not.

Once again - I’m trying to solve the use case for master=local, NOT for a cluster and NOT with spark-submit.  









Re: Spark LOCAL mode and external jar (extraClassPath)

Marcelo Vanzin
In reply to this post by jb44
There are two things you're doing wrong here:

On Thu, Apr 12, 2018 at 6:32 PM, jb44 <[hidden email]> wrote:
> Then I can add the alluxio client library like so:
> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)

First one, you can't modify JVM configuration after it has already
started. So this line does nothing since it can't re-launch your
application with a new JVM.

> sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)

There is a lot of configuration that you cannot set after the
application has already started. For example, after the session is
created, most probably this option will be ignored, since executors
will already have started.

I'm not so sure about what happens when you use dynamic allocation,
but these post-hoc config changes in general are not expected to take
effect.

The documentation could be clearer about this (especially stuff that
only applies to spark-submit), but that's the gist of it.
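The first point can be illustrated without Spark at all. The sketch below is an analogy in plain JVM terms, not Spark's own code, and its names are hypothetical: it appends a jar path to the `java.class.path` system property after startup and shows that the property reads back with the new value while class loading is unaffected. That is much the same situation as logging `spark.driver.extraClassPath` and seeing the expected value.

```scala
import java.io.File
import scala.util.Try

object LateClasspathChange {
  /** True if the class can be loaded by the class loader that loaded this object. */
  def isLoadable(className: String): Boolean =
    Try(Class.forName(className, false, getClass.getClassLoader)).isSuccess

  /** Appends a jar path to the java.class.path system property and returns the new value. */
  def appendToClasspathProperty(jarPath: String): String = {
    val updated = sys.props.getOrElse("java.class.path", "") + File.pathSeparator + jarPath
    System.setProperty("java.class.path", updated)
    updated
  }

  def main(args: Array[String]): Unit = {
    val target = "alluxio.hadoop.FileSystem"
    val before = isLoadable(target)
    appendToClasspathProperty("/path/to/alluxio-client.jar") // placeholder path
    val after = isLoadable(target)
    // The property now mentions the jar, but the class loader was built at JVM startup.
    println(s"property mentions jar: ${sys.props("java.class.path").contains("alluxio-client.jar")}")
    println(s"loadable before=$before, after=$after")
  }
}
```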


--
Marcelo



Re: Spark LOCAL mode and external jar (extraClassPath)

jb44
Ok thanks - I was basing my design on this:


Wherein it says:
Once the SparkSession is instantiated, you can configure Spark’s runtime config properties. 
Apparently the suite of runtime configs you can change does not include classpath.  

So the answer to my original question is basically this:

When using local (pseudo-cluster) mode, there is no way to add external jars to the spark instance.  This means that Alluxio will not work with Spark when Spark is run in master=local mode.

Thanks again - often getting a definitive “no” is almost as good as a yes.  Almost ;)



Re: Spark LOCAL mode and external jar (extraClassPath)

gene.pang
Hi Jason,

Alluxio does work with Spark in master=local mode. This is because both spark-submit and spark-shell have command-line options to set the classpath for the JVM that is being started.

If you are not using spark-submit or spark-shell, you will have to figure out how to configure that JVM instance with the proper properties.
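As a rough sketch of what configuring that JVM can look like when the application is started with a plain `java` command: the client jar has to be part of the `-cp` value at launch. The helper below only builds the argument list and runs nothing. Its names, the paths, and the main class are hypothetical placeholders; a build tool or IDE would express the same thing through its own dependency settings.

```scala
import java.io.File

object LaunchCommand {
  /** Builds a `java -cp <classpath> <mainClass>` argument list; nothing is executed. */
  def javaCommand(appClasspath: Seq[String], extraJars: Seq[String], mainClass: String): Seq[String] = {
    // The extra jars must be on the classpath when the JVM starts, not added afterwards.
    val classpath = (appClasspath ++ extraJars).mkString(File.pathSeparator)
    Seq("java", "-cp", classpath, mainClass)
  }

  def main(args: Array[String]): Unit = {
    val cmd = javaCommand(
      appClasspath = Seq("/path/to/my-app.jar"),       // placeholder
      extraJars = Seq("/path/to/alluxio-client.jar"),  // placeholder
      mainClass = "com.example.MyApp"                  // placeholder
    )
    println(cmd.mkString(" "))
  }
}
```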

Thanks,
Gene




Re: Spark LOCAL mode and external jar (extraClassPath)

jb44
Hi Gene - 

Are you saying that I just need to figure out how to get the Alluxio jar into the classpath of my parent application?  If it shows up in the classpath then Spark will automatically know that it needs to use it when communicating with Alluxio?

Apologies for going back-and-forth on this - I feel like my particular use case is clouding what is already a tricky issue.





Re: Spark LOCAL mode and external jar (extraClassPath)

gene.pang
Yes, I think that is the case. I haven't tried that before, but it should work.

Thanks,
Gene

On Fri, Apr 13, 2018 at 11:32 AM, Jason Boorn <[hidden email]> wrote:
Hi Gene - 

Are you saying that I just need to figure out how to get the Alluxio jar into the classpath of my parent application?  If it shows up in the classpath then Spark will automatically know that it needs to use it when communicating with Alluxio?

Apologies for going back-and-forth on this - I feel like my particular use case is clouding what is already a tricky issue.

On Apr 13, 2018, at 2:26 PM, Gene Pang <[hidden email]> wrote:

Hi Jason,

Alluxio does work with Spark in master=local mode. This is because both spark-submit and spark-shell have command-line options to set the classpath for the JVM that is being started.

If you are not using spark-submit or spark-shell, you will have to figure out how to configure that JVM instance with the proper properties.

Thanks,
Gene
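
[Editor's sketch] For an application that embeds Spark and is started directly with the java launcher, "configuring that JVM instance" could mean putting the Alluxio client jar on the -cp argument at launch time. The sketch below only assembles such a command line; the object name LaunchCommand, the main class com.example.MyApp, and both jar paths are hypothetical placeholders, not values from this thread.

```scala
import java.io.File

object LaunchCommand {
  /** Builds a `java` command line whose classpath contains every given jar. */
  def javaCommand(mainClass: String, jars: Seq[String]): Seq[String] =
    Seq("java", "-cp", jars.mkString(File.pathSeparator), mainClass)

  def main(args: Array[String]): Unit = {
    // Hypothetical main class and jar locations, for illustration only.
    val cmd = javaCommand(
      "com.example.MyApp",
      Seq("/opt/app/my-app.jar", "/opt/alluxio/client/alluxio-client.jar"))
    println(cmd.mkString(" "))
  }
}
```

The same effect can be had by declaring the client as a build dependency so that it is packaged with the application, which is how the other application mentioned in the original post picks it up.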

On Fri, Apr 13, 2018 at 10:47 AM, Jason Boorn <[hidden email]> wrote:
Ok thanks - I was basing my design on this:


Wherein it says:
Once the SparkSession is instantiated, you can configure Spark’s runtime config properties. 
Apparently the suite of runtime configs you can change does not include classpath.  

So the answer to my original question is basically this:

When using local (pseudo-cluster) mode, there is no way to add external jars to the spark instance.  This means that Alluxio will not work with Spark when Spark is run in master=local mode.

Thanks again - often getting a definitive “no” is almost as good as a yes.  Almost ;)

On Apr 13, 2018, at 1:21 PM, Marcelo Vanzin <[hidden email]> wrote:

There are two things you're doing wrong here:

On Thu, Apr 12, 2018 at 6:32 PM, jb44 <[hidden email]> wrote:
Then I can add the alluxio client library like so:
sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)

First, you can't modify the JVM configuration after the JVM has already
started. So this line does nothing, since Spark can't re-launch your
application with a new JVM.

sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)

Second, there is a lot of configuration that you cannot set after the
application has already started. For example, once the session is
created, this option will most probably be ignored, since executors
will already have started.

I'm not so sure about what happens when you use dynamic allocation,
but these post-hoc config changes in general are not expected to take
effect.

The documentation could be clearer about this (especially stuff that
only applies to spark-submit), but that's the gist of it.


--
Marcelo





Re: Spark LOCAL mode and external jar (extraClassPath)

jb44
Ok great I’ll give that a shot -

Thanks for all the help

On Apr 14, 2018, at 12:08 PM, Gene Pang <[hidden email]> wrote:

Yes, I think that is the case. I haven't tried that before, but it should work.

Thanks,
Gene







Re: Spark LOCAL mode and external jar (extraClassPath)

Gourav Sengupta
Hi,

If you start Spark or pyspark from the command line with the --jars option and see that things are working fine, then you will have to add the jar either to the jars directory under SPARK_HOME or modify the spark-env file to include the path pointing to the location where the jar file is stored. This location has to be accessible by all the worker nodes.


Regards,
Gourav Sengupta
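
[Editor's sketch] To confirm whether a jar has actually been placed in the jars directory under SPARK_HOME, a small listing helper can be enough. This is a sketch under the assumption that the installation follows the usual layout with a jars subdirectory; the object name SparkHomeJars is made up for illustration.

```scala
import java.io.File

object SparkHomeJars {
  /** Sorted names of the .jar files directly inside <sparkHome>/jars (empty if absent). */
  def jarNames(sparkHome: File): Seq[String] = {
    val jarsDir = new File(sparkHome, "jars")
    // listFiles() returns null when the directory does not exist.
    Option(jarsDir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .map(_.getName)
      .sorted
  }

  def main(args: Array[String]): Unit =
    sys.env.get("SPARK_HOME") match {
      case Some(home) => jarNames(new File(home)).foreach(println)
      case None       => println("SPARK_HOME is not set")
    }
}
```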

On Sat, Apr 14, 2018 at 6:02 PM, Jason Boorn <[hidden email]> wrote:
Ok great I’ll give that a shot -

Thanks for all the help

