Parallelism: behavioural difference in version 1.2 and 2.1!?

jeevan.ks
Hi,

I have two systems: one is built on Spark 1.2 and the other on Spark 2.1. I am
benchmarking both with the same benchmarks (wordcount, grep, sort, etc.)
on the same data set from an S3 bucket (sizes range from 50 MB to 10 GB). The
Spark cluster I used consists of 8 r3.xlarge instances with 4 cores and 28 GB
of RAM each. While running the benchmarks I observed the following strange
behaviour:

- When I ran Spark 1.2 with the default number of partitions
(sc.defaultParallelism), the jobs took forever to complete. So I changed it
to three times the number of cores, i.e., 32 × 3 = 96. This worked like magic
and the jobs completed quickly.

- However, when I used the same magic number on version 2.1, the jobs take
forever. The default parallelism works better, but is still not very
efficient.
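The default-versus-tuned arithmetic above can be restated as a small pure-Python model. This is not Spark code: the function names are hypothetical, and the rule it encodes is a simplification of how Spark's coarse-grained cluster backends resolve sc.defaultParallelism (an explicit spark.default.parallelism wins; otherwise the number of executor cores registered so far, with a floor of 2). The factor of 3 tasks per core is taken from the post.

```python
from typing import Optional


def resolve_default_parallelism(registered_cores: int,
                                explicit_setting: Optional[int] = None) -> int:
    """Simplified model of sc.defaultParallelism on a coarse-grained backend.

    If spark.default.parallelism is set explicitly, that value wins; otherwise
    the backend falls back to the executor cores registered so far, with a
    floor of 2.
    """
    if explicit_setting is not None:
        return explicit_setting
    return max(registered_cores, 2)


def tuned_partitions(instances: int, cores_per_instance: int,
                     tasks_per_core: int = 3) -> int:
    """The hand-tuned value from the post: total cores times tasks per core."""
    return instances * cores_per_instance * tasks_per_core


cluster_cores = 8 * 4  # 8 r3.xlarge instances x 4 cores each = 32
print(resolve_default_parallelism(cluster_cores))                          # 32
print(resolve_default_parallelism(cluster_cores, tuned_partitions(8, 4)))  # 96
print(resolve_default_parallelism(0))  # 2: no executors registered yet
```

Under this model the fallback depends on how many executors have registered when the value is read, so a value read very early can be smaller than the full 32 cores. That is worth checking, but it is an assumption, not a diagnosis of the slowdown.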

I'm having trouble rationalising this and comparing the two systems. My
questions are: what changed between 1.2 and 2.1 with respect to default
parallelism to cause this behaviour? And how can I make both versions behave
similarly on the same software/hardware configuration so that I can compare
them?

I'd really appreciate your help on this!

Cheers,
Jeevan



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Parallelism: behavioural difference in version 1.2 and 2.1!?

Apostolos N. Papadopoulos
Dear Jeevan,

Spark 1.2 is quite old, and if I were you I would go for a newer version.

However, is there a parallelism level (e.g., 20, 30) that works for both
installations?
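One way to act on this suggestion is a small parameter sweep: run each benchmark on both installations at the same set of parallelism levels and keep the level whose slower run is fastest. The sketch assumes you supply one hypothetical runner function per installation (for example, one that wraps spark-submit and times it); `common_parallelism` and `Runner` are illustrative names, not part of Spark.

```python
from typing import Callable, Dict, Iterable, Tuple

# A runner launches one benchmark at a given parallelism level and returns
# its wall-clock time in seconds. How it does that is up to you (hypothetical).
Runner = Callable[[int], float]


def common_parallelism(levels: Iterable[int],
                       runners: Dict[str, Runner]
                       ) -> Tuple[int, Dict[int, Dict[str, float]]]:
    """Return the level whose slowest installation is fastest, plus all timings."""
    timings: Dict[int, Dict[str, float]] = {}
    for level in levels:
        # Run every installation at the same level so the comparison is fair.
        timings[level] = {name: run(level) for name, run in runners.items()}
    # Minimax choice: minimise the worse of the runtimes at each level.
    best = min(timings, key=lambda lvl: max(timings[lvl].values()))
    return best, timings
```

For example, call it with levels such as [20, 30, 32, 64, 96] and runners keyed "1.2" and "2.1". Minimising the worse of the two runtimes favours a level that is reasonable on both versions, rather than one that is ideal for one and poor for the other.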

regards,

Apostolos



On 29/08/2018 04:55 PM, jeevan.ks wrote:

> [...]

--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: [hidden email]
twitter: @papadopoulos_ap
web: http://delab.csd.auth.gr/~apostol


Re: Parallelism: behavioural difference in version 1.2 and 2.1!?

jeevan.ks
Dear Apostolos,

Thanks for the response!

Our version is built on 2.1; the problem is that the state-of-the-art system I'm trying to compare against is built on version 1.2, so I have to deal with it.

If I understand the level of parallelism correctly, --total-executor-cores is set to the number of workers multiplied by the executor cores of each worker, in this case 8 × 4 = 32 as well. I use a similar script in both cases, so it shouldn't change.
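To keep both installations on identical settings, it can help to generate the two spark-submit commands from one place and pin both the core count and the default parallelism explicitly, so that neither version falls back to its own default. `--master`, `--class`, `--total-executor-cores` and `--conf` are standard spark-submit options; `submit_args` is a hypothetical helper, and the master URL, main class, jar name and input path are placeholders. The value 96 comes from the first post.

```python
import shlex
from typing import List


def submit_args(master_url: str, main_class: str, app_jar: str,
                total_cores: int, parallelism: int,
                app_args: List[str]) -> List[str]:
    """Build a spark-submit command that pins cores and default parallelism."""
    return [
        "spark-submit",
        "--master", master_url,
        "--class", main_class,
        # Caps the total cores the application takes on a standalone cluster.
        "--total-executor-cores", str(total_cores),
        # Overrides the version-specific fallback for sc.defaultParallelism.
        "--conf", f"spark.default.parallelism={parallelism}",
        app_jar,
        *app_args,
    ]


workers, cores_per_worker = 8, 4
args = submit_args("spark://master-host:7077", "WordCount", "benchmarks.jar",
                   total_cores=workers * cores_per_worker,  # 32
                   parallelism=96,
                   app_args=["INPUT_PATH"])  # placeholder input location
print(" ".join(shlex.quote(a) for a in args))
```

Note that an explicit partition count passed to an operation such as reduceByKey(func, numPartitions) takes precedence over spark.default.parallelism for that operation.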

Thanks and regards,
Jeevan K. Srivatsa


On Wed, 29 Aug 2018 at 16:07, Apostolos N. Papadopoulos <[hidden email]> wrote:
> [...]