default parallelism in trunk


default parallelism in trunk

Koert Kuipers
I just managed to upgrade my 0.9-SNAPSHOT from the last Scala 2.9.x version to the latest.


Everything seems good, except that the default parallelism for my jobs is now set to 2 instead of some smart number based on the number of cores (I think that is what it used to do). Is this change on purpose?
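
For context, the symptom looks something like this (a minimal sketch; the RDD below is hypothetical, but any RDD created without an explicit partition count falls back to the default parallelism):

    // An RDD created without an explicit partition count picks up
    // sc.defaultParallelism, which is where the 2 shows up.
    val rdd = sc.parallelize(1 to 1000)
    println(rdd.partitions.size) // now 2, instead of the core count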

I am running Spark standalone.

thx, koert

Re: default parallelism in trunk

Aaron Davidson
Could you give an example of where default parallelism is now set to 2 but wasn't before?

Here is the relevant section for Spark standalone mode: CoarseGrainedSchedulerBackend.scala#L211. If spark.default.parallelism is set, it will override anything else. If it is not set, we will use the total number of cores in the cluster and 2, which is the same logic that has been used since Spark 0.7.

The simplest possibility is that you're setting spark.default.parallelism yourself; otherwise, a bug may have been introduced somewhere that keeps the default from being computed correctly.
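
One quick way to check (a sketch for a 0.9-era spark-shell; the property name is from the message above, the rest is a generic check):

    // See whether the property is set and what the effective default
    // parallelism actually is.
    println(System.getProperty("spark.default.parallelism")) // null if unset
    println(sc.defaultParallelism)                           // effective value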


Re: default parallelism in trunk

Aaron Davidson
Sorry, I meant to say we will use the maximum of (the total number of cores in the cluster) and (2) if spark.default.parallelism is not set. So this should not be causing your problem unless your cluster thinks it has fewer than 2 cores.
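
In other words, roughly this logic (a paraphrase of the behavior described above, not the verbatim CoarseGrainedSchedulerBackend source):

    // If spark.default.parallelism is set, it wins; otherwise fall back
    // to max(total cores in the cluster, 2).
    def defaultParallelism(totalCoreCount: Int): Int =
      sys.props.get("spark.default.parallelism") match {
        case Some(n) => n.toInt
        case None    => math.max(totalCoreCount, 2)
      }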


Re: default parallelism in trunk

Koert Kuipers
After the upgrade, spark-shell still behaved properly. But a Scala program that defined its own SparkContext and did not set spark.default.parallelism was suddenly stuck with a parallelism of 2. I "fixed" it by setting the spark.default.parallelism system property to the value I want for now, and no longer relying on the default.
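
For reference, that workaround sketched as code (the value 16 and the master URL are placeholders, not from this thread):

    import org.apache.spark.SparkContext

    // Pin the parallelism explicitly before the SparkContext is created,
    // instead of relying on the cluster-derived default.
    System.setProperty("spark.default.parallelism", "16")
    val sc = new SparkContext("spark://master:7077", "my-app")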

