Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

John Zhuge
Hi,

I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is spark-env.sh sourced when starting the Spark AM container or the executor container?


Note: When running Spark on YARN in cluster mode, environment variables need to be set using the spark.yarn.appMasterEnv.[EnvironmentVariableName] property in your conf/spark-defaults.conf file. Environment variables that are set in spark-env.sh will not be reflected in the YARN Application Master process in cluster mode. See the YARN-related Spark Properties for more information.
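
If I read that right, the property goes into conf/spark-defaults.conf like this (MY_VAR is just a placeholder name):

    spark.yarn.appMasterEnv.MY_VAR  some-value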

Does it mean spark-env.sh will not be sourced when starting the AM in cluster mode?
Does this paragraph apply to executors as well?

Thanks,
--
John Zhuge

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

Jacek Laskowski
Hi,

My understanding is that the AM (which hosts the driver in cluster deploy mode) and the executors are plain Java processes whose settings are passed in one by one when a Spark application is submitted and a ContainerLaunchContext is created for launching the YARN containers. See https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala?utf8=%E2%9C%93#L796-L801 for the code that maps the settings to properties.

With that in mind, I don't think conf/spark-defaults.conf gets loaded by those processes on its own.

Why don't you set a property and see if it's available on the driver in cluster deploy mode? That should give you a definitive answer (or at least get you closer).
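
A minimal sketch of such a test (untested; SPARK_TEST_VAR is just an illustrative name):

    spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.appMasterEnv.SPARK_TEST_VAR=hello \
      ...

and then, early in the driver code:

    // prints "hello" if the property made it into the AM environment
    println(sys.env.getOrElse("SPARK_TEST_VAR", "<not set>"))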


On Wed, Jan 3, 2018 at 7:57 AM, John Zhuge <[hidden email]> wrote:
> Hi,
>
> I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is spark-env.sh sourced when starting the Spark AM container or the executor container?
>
> Note: When running Spark on YARN in cluster mode, environment variables need to be set using the spark.yarn.appMasterEnv.[EnvironmentVariableName] property in your conf/spark-defaults.conf file. Environment variables that are set in spark-env.sh will not be reflected in the YARN Application Master process in cluster mode. See the YARN-related Spark Properties for more information.
>
> Does it mean spark-env.sh will not be sourced when starting the AM in cluster mode?
> Does this paragraph apply to executors as well?
>
> Thanks,
> --
> John Zhuge


Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

Marcelo Vanzin
On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge <[hidden email]> wrote:
> I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is
> spark-env.sh sourced when starting the Spark AM container or the executor
> container?

No, it's not.

--
Marcelo



Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

John Zhuge
Thanks Jacek and Marcelo!

Any reason it is not sourced? Any security consideration?


On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin <[hidden email]> wrote:
> On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge <[hidden email]> wrote:
> > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is
> > spark-env.sh sourced when starting the Spark AM container or the executor
> > container?
>
> No, it's not.
>
> --
> Marcelo



--
John

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

Marcelo Vanzin
Because spark-env.sh is something that makes sense only on the gateway
machine (where the app is being submitted from).

On Wed, Jan 3, 2018 at 6:46 PM, John Zhuge <[hidden email]> wrote:

> Thanks Jacek and Marcelo!
>
> Any reason it is not sourced? Any security consideration?
>
>
> On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin <[hidden email]> wrote:
>>
>> On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge <[hidden email]> wrote:
>> > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is
>> > spark-env.sh sourced when starting the Spark AM container or the
>> > executor
>> > container?
>>
>> No, it's not.
>>
>> --
>> Marcelo
>
>
>
>
> --
> John



--
Marcelo



Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

John Zhuge
Sounds good.

Should we add another paragraph after this one in configuration.md to cover the executor environment as well? I'd be happy to upload a simple patch.

Note: When running Spark on YARN in cluster mode, environment variables need to be set using the spark.yarn.appMasterEnv.[EnvironmentVariableName] property in your conf/spark-defaults.conf file. Environment variables that are set in spark-env.sh will not be reflected in the YARN Application Master process in cluster mode. See the YARN-related Spark Properties for more information.

Something like:

Note: When running Spark on YARN, environment variables for the executors need to be set using the spark.yarn.executorEnv.[EnvironmentVariableName] property in your conf/spark-defaults.conf file or on the command line. Environment variables that are set in spark-env.sh will not be reflected in the executor process.
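
For example (FOO is just a made-up variable name):

    spark.yarn.executorEnv.FOO  bar

and a quick sketch to check it from the executors:

    // collect the distinct values of FOO seen across the executor JVMs
    sc.parallelize(1 to 10).map(_ => sys.env.getOrElse("FOO", "<not set>")).collect().distinct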
 


On Wed, Jan 3, 2018 at 7:53 PM, Marcelo Vanzin <[hidden email]> wrote:
> Because spark-env.sh is something that makes sense only on the gateway
> machine (where the app is being submitted from).
>
> On Wed, Jan 3, 2018 at 6:46 PM, John Zhuge <[hidden email]> wrote:
> > Thanks Jacek and Marcelo!
> >
> > Any reason it is not sourced? Any security consideration?
> >
> > On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin <[hidden email]> wrote:
> >>
> >> On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge <[hidden email]> wrote:
> >> > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is
> >> > spark-env.sh sourced when starting the Spark AM container or the
> >> > executor container?
> >>
> >> No, it's not.
> >>
> >> --
> >> Marcelo
> >
> > --
> > John
>
> --
> Marcelo



--
John

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

Marcelo Vanzin
On Wed, Jan 3, 2018 at 8:18 PM, John Zhuge <[hidden email]> wrote:
> Something like:
>
> Note: When running Spark on YARN, environment variables for the executors
> need to be set using the spark.yarn.executorEnv.[EnvironmentVariableName]
> property in your conf/spark-defaults.conf file or on the command line.
> Environment variables that are set in spark-env.sh will not be reflected in
> the executor process.

I'm not against adding docs, but that's probably true for all
backends. No backend I know of sources spark-env.sh before starting
executors.

For example, the standalone worker sources spark-env.sh before
starting the daemon, and those env variables "leak" to the executors.
But you can't customize an individual executor's environment that way
without restarting the service.
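
To illustrate (a sketch; FOO is a made-up name), with this in the worker's conf/spark-env.sh:

    export FOO=bar

every executor launched by that worker inherits it:

    // expect "bar" from each executor; changing FOO means restarting the worker
    sc.parallelize(1 to 4).map(_ => sys.env.getOrElse("FOO", "<not set>")).collect().distinct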

--
Marcelo
