GC Time Error in Client Mode

betaSpark
Hi,

I'm using Spark for my program on a local system. The current configuration is 32 cores and 64 GB of memory. My data size is 15 MB. I'm setting the following configuration parameters:

    spark.driver.memory 2g
    spark.executor.cores 3
    spark.executor.instances 3
    spark.executor.memory 10g
    spark.memory.fraction 0.1
    spark.default.parallelism 10

But I'm getting a GC overhead issue, which is slowing down my computation. I read [this][1] article and tried to tune my GC accordingly, but it's still throwing this problem.
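
For reference, here is roughly how the settings above are applied at submit time; this is a sketch, and the --master value (local[32]) and the script name my_job.py are placeholders, since the actual submit command isn't shown in this thread:

    spark-submit \
      --master local[32] \
      --driver-memory 2g \
      --conf spark.executor.cores=3 \
      --conf spark.executor.instances=3 \
      --conf spark.executor.memory=10g \
      --conf spark.memory.fraction=0.1 \
      --conf spark.default.parallelism=10 \
      my_job.py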



Can anyone please guide me on how to fix it?

Thank you!


  [1]: https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
Re: GC Time Error in Client Mode

yncxcw
Hi,

I think for 4.2 h of task time, GC accounting for less than 1/8 of that (roughly half an hour) is not a big overhead, and it makes sense to me.

Other suggestions to decrease GC overhead:
(1) Increase "spark.executor.memory" according to your node capacity (configure it to use as much of the node as possible without the system starting to swap).
(2) Try another GC algorithm, e.g. replace the default parallel GC with G1 by configuring "spark.executor.extraJavaOptions"; see the sketch after this list.
(3) Decrease "spark.executor.cores" to limit the number of parallel tasks, though this may also hurt performance.
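
A minimal sketch of (2), assuming the options are passed at submit time; the G1 flag values below are illustrative, not tuned for your workload:

    spark-submit \
      --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=200" \
      ...

Note that "spark.executor.extraJavaOptions" only affects the executor JVMs; in client/local mode you may also want G1 on the driver, which you can set with the --driver-java-options flag of spark-submit.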

BTW, you set a pretty small "spark.memory.fraction"; I suspect you also have a lot of I/O overhead, since with only 10% of the heap reserved for execution and storage, Spark will spill to disk more often.
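
If so, a sketch of restoring the documented default (0.6 in Spark 1.6 and later), written in the same property style as the list in your post:

    spark.memory.fraction 0.6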



Hope this helps.


Wei