Passing runtime config to workers?


Passing runtime config to workers?

srobertjames
What is a good way to pass config variables to workers?

I've tried setting them in environment variables via spark-env.sh, but, as
far as I can tell, the environment variables set there don't appear in
workers' environments.  If I want to be able to configure all workers,
what's a good way to do it?  For example, I want to tell all workers:
USE_ALGO_A or USE_ALGO_B - but I don't want to recompile.

Re: Passing runtime config to workers?

DB Tsai-2
Since env variables set in the driver are not passed to the workers, the easiest approach is to read the variable on the driver and reference it directly inside the closure that runs on the workers.

For example,

val variableYouWantToUse = System.getenv("something defined in env")

rdd.map { x =>
  // `variableYouWantToUse` is captured by the closure and shipped to the workers
  ...
}



Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai




Re: Passing runtime config to workers?

srobertjames
I see - I didn't realize that scope would work like that.  Are you
saying that any variable that is in scope of the lambda passed to map
will be automagically propagated to all workers? What if it's not
explicitly referenced in the map, only used by it? E.g.:

def main = {
  F.setSettings(...)
  rdd.map(x => F.f(x))
}

object F {
  def f(...) = ...
  var settings: ...
}

F.f accesses F.settings, like a Singleton.  The master sets F.settings
before using F.f in a map.  Will all workers have the same F.settings
as seen by F.f?




Re: Passing runtime config to workers?

DB Tsai-2
When you reference a variable from outside the executor's scope, Spark will automatically serialize it in the driver and send it to the executors, which implies those variables have to implement Serializable.

For the example you mention, Spark will serialize object F, and if it's not serializable, it will raise an exception.
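As a side note, one way to sidestep questions about how object F gets serialized (a sketch not taken from the thread; all names, the `ALGO` variable, and the `sc`/`rdd` values are illustrative assumptions) is to copy the setting into a local val on the driver, or ship it as a broadcast variable, so the value explicitly travels with the closure instead of living in mutable singleton state:

```scala
// Assumes an existing SparkContext `sc` and an RDD[Int] `rdd`.
object F {
  // Take the setting as a parameter instead of reading a mutable field,
  // so the behavior does not depend on per-JVM singleton initialization.
  def f(x: Int, algo: String): Int =
    if (algo == "USE_ALGO_A") x * 2 else x * 3
}

// Resolved once, on the driver; the local val is serialized with the closure.
val algo = sys.env.getOrElse("ALGO", "USE_ALGO_A")
val viaClosure = rdd.map(x => F.f(x, algo))

// For larger config, a broadcast variable ships the value once per executor
// instead of once per task.
val algoBc = sc.broadcast(algo)
val viaBroadcast = rdd.map(x => F.f(x, algoBc.value))
```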


Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

