how to set SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES optimally

Chen Jin
Hi all,

According to the Spark documentation, we can set the number of workers
per machine with SPARK_WORKER_INSTANCES and the maximum number of cores
each worker can use with SPARK_WORKER_CORES. If I have five 8-core
machines, which of the following would perform better?
(a)
   SPARK_WORKER_INSTANCES = 8
   SPARK_WORKER_CORES = 1

and
(b)
   SPARK_WORKER_INSTANCES = 1
   SPARK_WORKER_CORES = 8

(a) gives us 40 workers with one core each; (b) gives 8 workers with
eight cores each. Any advice on which would lead to better
performance?

Thanks a lot,

-chen

Re: how to set SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES optimally

Archit Thakur
Chen, the first one will launch 8 single-threaded JVMs and the second will launch one 8-threaded JVM.
Performance depends on your data. If your data is small, the second is better because the first pays the launch time of 8 JVMs. Also, if you broadcast anything, the first has to send it to 8 JVMs instead of one.
However, if you have a lot of data to process, the first is better because (i) the JVM launch time becomes negligible, and (ii) you'll have 8 times the memory available for processing.
Assumption: all machines have the same memory and computing power.
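
To make the broadcast point concrete, here is a rough sketch in Python. It assumes the 5-machine cluster from the question, that each worker ends up hosting one JVM holding its own copy of the broadcast value, and a hypothetical 100 MB broadcast; the function name is illustrative and not part of Spark.

```python
def broadcast_footprint(machines, workers_per_machine, broadcast_mb):
    """Return (worker JVMs in the cluster, total MB held by broadcast copies).

    Assumption: each worker JVM keeps its own copy of the broadcast value.
    """
    jvms = machines * workers_per_machine
    return jvms, jvms * broadcast_mb


# Hypothetical 100 MB broadcast on the 5-machine cluster from the question.
for label, instances in (("a", 8), ("b", 1)):
    jvms, total_mb = broadcast_footprint(5, instances, 100)
    print(f"({label}) {jvms} JVMs, {total_mb} MB of broadcast copies")
```

Under these assumptions the broadcast cost grows linearly with the number of worker JVMs, which is why it matters more for layout (a).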

"""(a) gives us 40 workers with each core per worker (b) gives 8 workers
while each worker has eight cores. Any advice on which better would
lead to better performance?"""

No, per machine (a) gives you 8 workers with one core each and (b) gives
1 worker with eight cores.

Let me know if you have any doubts.

Thanks and Regards,
Archit Thakur.



In reply to Chen Jin's message of Sun, Jan 26, 2014 at 5:58 AM.


Re: how to set SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES optimally

Chen Jin
Hi Archit,

Thanks for the detailed explanation. Since my cluster has 5 machines,
each with 8 cores and 48g of memory, I meant to say that for the entire
cluster:

(a) gives us 40 workers with one core each; (b) gives 5 workers with
eight cores each.

A follow-up question, given that each machine has 48g of memory:

(a)
   SPARK_WORKER_INSTANCES = 8
   SPARK_WORKER_CORES = 1
   SPARK_WORKER_MEMORY = 6g

(b)
   SPARK_WORKER_INSTANCES = 1
   SPARK_WORKER_CORES = 8
   SPARK_WORKER_MEMORY = 48g

Will setting (a) help when processing a large dataset, given that, as
you said, each machine now runs 8 JVMs?
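
The arithmetic behind the two layouts can be sketched as follows. This assumes SPARK_WORKER_CORES and SPARK_WORKER_MEMORY apply to each worker instance separately, which is what the 8 x 6g split above implies; the function name is made up for illustration.

```python
def cluster_totals(machines, instances, cores_per_worker, mem_gb_per_worker):
    """Summarize a standalone layout from the per-worker settings.

    Assumption: cores and memory settings are applied per worker instance.
    """
    workers = machines * instances
    return {
        "workers": workers,
        "cores": workers * cores_per_worker,
        "memory_gb": workers * mem_gb_per_worker,
        "memory_gb_per_machine": instances * mem_gb_per_worker,
    }


# 5 machines, 8 cores and 48g each, as described above.
layout_a = cluster_totals(5, instances=8, cores_per_worker=1, mem_gb_per_worker=6)
layout_b = cluster_totals(5, instances=1, cores_per_worker=8, mem_gb_per_worker=48)
print("(a)", layout_a)
print("(b)", layout_b)
```

Under these assumptions both layouts expose the same 40 cores and 240g across the cluster. What changes is how the memory is divided: 6g per JVM in (a) versus 48g in a single JVM per machine in (b).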

Thanks a lot,

-chen

In reply to Archit Thakur's message of Sun, Jan 26, 2014 at 1:53 AM.