When does SparkContext.defaultParallelism have the correct value?


Stephen Coy
Hi there,

I have found that if I invoke

sparkContext.defaultParallelism()

too early, it does not return the correct value.

For example, if I write this:

final JavaSparkContext sparkContext = new JavaSparkContext(sparkSession.sparkContext());
final int workerCount = sparkContext.defaultParallelism();

I will get some small number (which I can’t recall right now).

However, if I insert:

sparkContext.parallelize(List.of(1, 2, 3, 4)).collect();

between these two lines, I get the expected value, which is something like node_count * node_core_count.

This seems like a hacky workaround to me. Is there a better way to get this value initialised properly?

FWIW, I need this value to size a connection pool (fs.s3a.connection.maximum) correctly in a cluster-independent way.

Thanks,

Steve C
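
One possible alternative to running a throwaway job is to poll the value until it stops changing. The sketch below is only an illustration under assumptions: ParallelismProbe and awaitStable are hypothetical names, not part of Spark, and "the same reading several times in a row" is a heuristic that can still settle too early if executors are slow to register.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

/** Hypothetical helper (not a Spark API): polls a reading until it stops changing. */
final class ParallelismProbe {

    private ParallelismProbe() {}

    /**
     * Polls {@code reading} until it returns the same value {@code stableReads}
     * times in a row, or until {@code timeout} elapses or the thread is
     * interrupted. Returns the last value read in every case.
     */
    static int awaitStable(IntSupplier reading, int stableReads,
                           Duration pollInterval, Duration timeout) {
        if (stableReads < 1) {
            throw new IllegalArgumentException("stableReads must be >= 1");
        }
        final long deadline = System.nanoTime() + timeout.toNanos();
        int last = reading.getAsInt();
        int streak = 1; // consecutive identical readings seen so far

        while (streak < stableReads && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return last;
            }
            int current = reading.getAsInt();
            streak = (current == last) ? streak + 1 : 1;
            last = current;
        }
        return last;
    }
}
```

With the snippet from this message, the call might look like ParallelismProbe.awaitStable(sparkContext::defaultParallelism, 5, Duration.ofMillis(200), Duration.ofSeconds(30)); the poll count, interval and timeout here are arbitrary illustrative values.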



---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: When does SparkContext.defaultParallelism have the correct value?

Sean Owen
If not set explicitly with spark.default.parallelism, it will default
to the number of cores currently available (minimum 2). At the very
start, some executors haven't finished registering, which I think
explains why the value goes up after a short time. (With dynamic
allocation it will change over time.) You can set it explicitly to
match the number of executors times the cores per executor that you
configured.
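
The arithmetic behind that last suggestion can be sketched as follows. This is a minimal illustration, not Spark code: ConfiguredParallelism is a hypothetical helper, and a plain Map stands in for the application's configuration so the example has no Spark dependency. It assumes spark.executor.instances and spark.executor.cores have both been set explicitly, which may not be the case on every cluster manager.

```java
import java.util.Map;

/** Hypothetical helper: derives parallelism from configured values, not from live executors. */
final class ConfiguredParallelism {

    private ConfiguredParallelism() {}

    /**
     * Returns spark.default.parallelism if present; otherwise
     * executors * cores per executor, with the minimum of 2 mentioned above.
     */
    static int fromSettings(Map<String, String> settings) {
        String explicit = settings.get("spark.default.parallelism");
        if (explicit != null) {
            return Integer.parseInt(explicit.trim());
        }
        int executors = requiredInt(settings, "spark.executor.instances");
        int coresPerExecutor = requiredInt(settings, "spark.executor.cores");
        return Math.max(Math.multiplyExact(executors, coresPerExecutor), 2);
    }

    /** Fails loudly rather than guessing a default for a missing key. */
    private static int requiredInt(Map<String, String> settings, String key) {
        String value = settings.get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing setting: " + key);
        }
        return Integer.parseInt(value.trim());
    }
}
```

For example, 6 executors with 4 cores each give 24. Because the result depends only on configuration, it is the same whether or not the executors have registered yet.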

On Mon, Jul 6, 2020 at 10:35 PM Stephen Coy
<[hidden email]> wrote:

>
> Hi there,
>
> I have found that if I invoke
>
> sparkContext.defaultParallelism()
>
> too early it will not return the correct value;
>
> For example, if I write this:
>
> final JavaSparkContext sparkContext = new JavaSparkContext(sparkSession.sparkContext());
> final int workerCount = sparkContext.defaultParallelism();
>
> I will get some small number (which I can’t recall right now).
>
> However, if I insert:
>
> sparkContext.parallelize(List.of(1, 2, 3, 4)).collect()
>
> between these two lines I get the expected value being something like node_count * node_core_count;
>
> This seems like a hacky work around solution to me. Is there a better way to get this value initialised properly?
>
> FWIW, I need this value to size a connection pool (fs.s3a.connection.maximum) correctly in a cluster independent way.
>
> Thanks,
>
> Steve C
