[Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

Varshney, Vaibhav

Hi Spark Experts,

 

We are trying to deploy Spark on Kubernetes.

According to the documentation at http://spark.apache.org/docs/latest/running-on-kubernetes.html, Kubernetes deployment is still experimental:

"The Kubernetes scheduler is currently experimental."

Does this mean Spark 3.0 does not support production deployment with the K8s scheduler?

What is the plan for full support of the K8s scheduler?

 

Thanks,

Vaibhav V

Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

srowen
I haven't used the K8S scheduler personally, but, just based on that
comment I wouldn't worry too much. It's been around for several
versions and AFAIK works fine in general. We sometimes aren't so great
about removing "experimental" labels. That said I know there are still
some things that could be added to it and more work going on, and
maybe people closer to that work can comment. But yeah you shouldn't
be afraid to try it.

On Thu, Jul 9, 2020 at 3:18 PM Varshney, Vaibhav
<[hidden email]> wrote:

>
> Hi Spark Experts,
>
>
>
> We are trying to deploy Spark on Kubernetes.
>
> According to the documentation at http://spark.apache.org/docs/latest/running-on-kubernetes.html, Kubernetes deployment is still experimental:
>
> "The Kubernetes scheduler is currently experimental."
>
> Does this mean Spark 3.0 does not support production deployment with the K8s scheduler?
>
> What is the plan for full support of the K8s scheduler?
>
>
>
> Thanks,
>
> Vaibhav V

RE: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

Varshney, Vaibhav
Thanks for the response. We have tried it in our dev environment. For production, if Spark 3.0 does not leverage the K8s scheduler, would a Spark cluster on K8s be "static"?
Judging by https://issues.apache.org/jira/browse/SPARK-24432, is this still a blocker for production workloads?

Thanks,
Vaibhav V

-----Original Message-----
From: Sean Owen <[hidden email]>
Sent: Thursday, July 9, 2020 3:20 PM
To: Varshney, Vaibhav (DI SW CAS MP AFC ARC) <[hidden email]>
Cc: [hidden email]; Ramani, Sai (DI SW CAS MP AFC ARC) <[hidden email]>
Subject: Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

I haven't used the K8S scheduler personally, but, just based on that comment I wouldn't worry too much. It's been around for several versions and AFAIK works fine in general. We sometimes aren't so great about removing "experimental" labels. That said I know there are still some things that could be added to it and more work going on, and maybe people closer to that work can comment. But yeah you shouldn't be afraid to try it.

Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

Prashant Sharma
Hi,

Whether it is a blocker is up to you to decide. Spark on K8s does support dynamic allocation, but through a different mechanism that does not use an external shuffle service: https://issues.apache.org/jira/browse/SPARK-27963. Both approaches have pros and cons. The main disadvantage of scaling without an external shuffle service is that when the cluster scales down, or loses executors to some external cause (for example, losing spot instances), the shuffle data on those executors (intermediate data computed as part of a larger computation) is lost with them. This does not normally lead to data loss, because Spark can recompute the lost shuffle data.
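
As a rough illustration of the dynamic allocation described above, here is a minimal sketch in Python that assembles a spark-submit command line with shuffle tracking enabled instead of an external shuffle service. The configuration keys are the ones documented for Spark 3.0. The executor bounds, the API server address, the image name, the application path, and the helper name spark_submit_args are placeholders chosen for this example, not values from this thread.

```python
# Sketch only: dynamic allocation on Kubernetes without an external shuffle
# service, relying on shuffle tracking (SPARK-27963, Spark 3.0).
# The executor bounds and idle timeout below are illustrative assumptions.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "10",
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}


def spark_submit_args(master, image, app, conf=DYNAMIC_ALLOCATION_CONF):
    """Build the argument list for a cluster-mode spark-submit on Kubernetes.

    This is a hypothetical helper; it only formats arguments and does not
    launch anything.
    """
    args = ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
    all_conf = {"spark.kubernetes.container.image": image, **conf}
    for key, value in all_conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # All three values are placeholders; substitute your own cluster details.
    print(" ".join(spark_submit_args(
        master="k8s://https://kubernetes.example.com:6443",
        image="example.com/spark:3.0.0",
        app="local:///opt/spark/examples/src/main/python/pi.py",
    )))
```

With shuffle tracking on, executors that still hold shuffle data needed by active jobs are kept alive, which limits (but does not remove) the recomputation cost when executors are lost.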

Scaling up and down dynamically is helpful when the Spark cluster runs on, for example, spot instances on AWS, or when the size of the data is not known in advance, in other words, when we cannot estimate how many resources will be needed to process it. Dynamic scaling lets the cluster grow based only on the number of pending tasks; this is currently the only metric implemented.
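
To make the pending-task metric concrete, here is a simplified model in Python of how a target executor count can be derived from the task backlog. It is a sketch under stated assumptions, not Spark's actual allocation code: the function name target_executors is hypothetical, and its parameters are meant to mirror spark.executor.cores, spark.task.cpus, spark.dynamicAllocation.minExecutors, spark.dynamicAllocation.maxExecutors, and spark.dynamicAllocation.executorAllocationRatio.

```python
import math


def target_executors(pending_tasks, running_tasks, executor_cores=1,
                     task_cpus=1, min_executors=0, max_executors=10,
                     allocation_ratio=1.0):
    """Estimate how many executors are needed to run the current backlog.

    Simplified model: each executor runs executor_cores // task_cpus tasks
    at once, and the result is clamped to the configured bounds.
    """
    tasks_per_executor = max(1, executor_cores // task_cpus)
    needed = math.ceil(
        (pending_tasks + running_tasks) * allocation_ratio / tasks_per_executor
    )
    return max(min_executors, min(needed, max_executors))


if __name__ == "__main__":
    # 6 pending + 2 running tasks on 4-core executors fit on 2 executors.
    print(target_executors(6, 2, executor_cores=4))
    # A large backlog is capped at max_executors.
    print(target_executors(100, 0, executor_cores=4, max_executors=10))
    # With no work left, the cluster shrinks to min_executors.
    print(target_executors(0, 0, min_executors=1))
```

The point of the sketch is that the only input is the number of tasks waiting or running; CPU load, memory pressure, and data size do not enter the calculation.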

I don't think it is a blocker for my production use cases.

Thanks,
Prashant

On Fri, Jul 10, 2020 at 2:06 AM Varshney, Vaibhav <[hidden email]> wrote:
Thanks for the response. We have tried it in our dev environment. For production, if Spark 3.0 does not leverage the K8s scheduler, would a Spark cluster on K8s be "static"?
Judging by https://issues.apache.org/jira/browse/SPARK-24432, is this still a blocker for production workloads?

Thanks,
Vaibhav V

RE: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

Varshney, Vaibhav

Hi Prashant,

 

That sounds encouraging. When the cluster scales down, a few Spark jobs will probably be affected by recomputation of shuffle data. That is not critical for us right now.

Is there a reference deployment architecture for Spark on K8s that is HA, scalable, and has dynamic allocation enabled? Any suggested GitHub repo or link?

 

Thanks,

Vaibhav V

 

 

From: Prashant Sharma <[hidden email]>
Sent: Friday, July 10, 2020 12:57 AM
To: [hidden email]
Cc: Sean Owen <[hidden email]>; Ramani, Sai (DI SW CAS MP AFC ARC) <[hidden email]>; Varshney, Vaibhav (DI SW CAS MP AFC ARC) <[hidden email]>
Subject: Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

 

Hi,

 

Whether it is a blocker is up to you to decide. Spark on K8s does support dynamic allocation, but through a different mechanism that does not use an external shuffle service: https://issues.apache.org/jira/browse/SPARK-27963. Both approaches have pros and cons. The main disadvantage of scaling without an external shuffle service is that when the cluster scales down, or loses executors to some external cause (for example, losing spot instances), the shuffle data on those executors (intermediate data computed as part of a larger computation) is lost with them. This does not normally lead to data loss, because Spark can recompute the lost shuffle data.

Scaling up and down dynamically is helpful when the Spark cluster runs on, for example, spot instances on AWS, or when the size of the data is not known in advance, in other words, when we cannot estimate how many resources will be needed to process it. Dynamic scaling lets the cluster grow based only on the number of pending tasks; this is currently the only metric implemented.

 

I don't think it is a blocker for my production use cases.

 

Thanks,

Prashant

 

Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

Prashant Sharma
Driver HA is not yet available in K8s mode. It could be a good area to work on, and I want to take a look at it. For reference, I personally rely on the official Spark documentation.
Thanks,



On Fri, Jul 10, 2020, 9:30 PM Varshney, Vaibhav <[hidden email]> wrote:

Hi Prashant,

 

That sounds encouraging. When the cluster scales down, a few Spark jobs will probably be affected by recomputation of shuffle data. That is not critical for us right now.

Is there a reference deployment architecture for Spark on K8s that is HA, scalable, and has dynamic allocation enabled? Any suggested GitHub repo or link?

 

Thanks,

Vaibhav V

 

 
